"Human motion is not the answer; it is a clue that still has to survive embodiment."
A Retargeting Review Session
Learning from humans gives humanoids a data advantage, but only when retargeting preserves task intent while respecting contact, reach, and torque limits.
A common retargeting objective is $\min_q \|\phi_{\mathrm{human}} - \phi_{\mathrm{robot}}(q)\|_W^2 + \lambda_c C(q) + \lambda_l L(q)$, where $q$ is the robot configuration, $\phi$ encodes task-relevant pose features, $W$ weights those features by importance, $C(q)$ penalizes contact inconsistency, $L(q)$ penalizes joint or balance-limit violations, and $\lambda_c$ and $\lambda_l$ set the strength of the two penalties. The critical idea is that not every human detail matters equally: end-effector intent and contact timing often matter more than the exact elbow angle.
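To make the objective concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption for illustration: a hypothetical 2-link planar arm stands in for the robot, the feature $\phi$ is just the hand position, $C(q)$ penalizes the hand dropping below a surface, $L(q)$ penalizes leaving the joint range, and a coarse grid search replaces the gradient-based solver a real pipeline would use.

```python
import math

# Hypothetical 2-link planar arm standing in for the robot (all numbers assumed).
LINK_1, LINK_2 = 0.30, 0.25                 # link lengths in metres
JOINT_LIMITS = [(-1.5, 1.5), (0.0, 2.4)]    # shoulder and elbow ranges in radians
SEARCH_RANGE = (-3.0, 3.0)                  # grid deliberately wider than the limits

def phi_robot(q):
    """Task feature phi_robot(q): planar hand position from forward kinematics."""
    x = LINK_1 * math.cos(q[0]) + LINK_2 * math.cos(q[0] + q[1])
    y = LINK_1 * math.sin(q[0]) + LINK_2 * math.sin(q[0] + q[1])
    return (x, y)

def limit_penalty(q):
    """L(q): squared distance outside the joint range, zero inside it."""
    return sum(max(lo - qi, 0.0) ** 2 + max(qi - hi, 0.0) ** 2
               for qi, (lo, hi) in zip(q, JOINT_LIMITS))

def contact_penalty(q, surface_y=0.0):
    """C(q): squared penetration of the hand below a surface at surface_y."""
    return max(surface_y - phi_robot(q)[1], 0.0) ** 2

def objective(q, phi_human, weights, lam_c=10.0, lam_l=10.0):
    """Weighted feature tracking error plus the two penalty terms."""
    track = sum(w * (h - r) ** 2
                for w, h, r in zip(weights, phi_human, phi_robot(q)))
    return track + lam_c * contact_penalty(q) + lam_l * limit_penalty(q)

def solve(phi_human, weights, steps=120):
    """Coarse grid search over both joints; returns (best_q, best_cost)."""
    lo, hi = SEARCH_RANGE
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    best_q, best_cost = None, float("inf")
    for q0 in grid:
        for q1 in grid:
            cost = objective([q0, q1], phi_human, weights)
            if cost < best_cost:
                best_q, best_cost = [q0, q1], cost
    return best_q, best_cost

# Human hand target at (0.40, 0.20) m, both coordinates weighted equally.
q_star, cost_star = solve(phi_human=(0.40, 0.20), weights=(1.0, 1.0))
print(q_star, cost_star)
```

The target is reachable with the elbow bent either way, but only one of the two solutions lies inside the assumed elbow range, so the limit penalty is what selects it. That is the sense in which retargeting is inference under constraints and not a pose copy.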
HumanPlus, HOVER, OmniH2O-style work, and related motion-retargeting pipelines all confront the same embodiment gap: the human demonstrator and the humanoid do not share mass distribution, joint ranges, or contact mechanics. Retargeting is therefore an inference problem, not a copy problem.
Good retargeting preserves what the human was trying to accomplish, not every raw joint angle from the original motion.
Theory
The right retargeting features depend on the task. For locomotion, center-of-mass timing and foot contacts matter. For manipulation, hand pose, gaze, and object-relative trajectories matter. For loco-manipulation, all of them matter together.
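One way to encode this task dependence is a per-task weight table over the features named above. The sketch below is illustrative only: the weight values are invented, and the per-feature errors are assumed to be normalized to comparable, unitless scales before weighting.

```python
# Illustrative weights only (assumed values); tune per robot and per task.
FEATURE_WEIGHTS = {
    "locomotion": {
        "com_timing": 1.0, "foot_contacts": 1.0,
        "hand_pose": 0.1, "gaze": 0.0, "object_relative_traj": 0.0,
    },
    "manipulation": {
        "com_timing": 0.1, "foot_contacts": 0.1,
        "hand_pose": 1.0, "gaze": 0.5, "object_relative_traj": 1.0,
    },
    "loco_manipulation": {
        "com_timing": 1.0, "foot_contacts": 1.0,
        "hand_pose": 1.0, "gaze": 1.0, "object_relative_traj": 1.0,
    },
}

def weighted_error(task, errors):
    """Sum of squared, normalized feature errors weighted for the given task."""
    weights = FEATURE_WEIGHTS[task]
    return sum(w * errors[name] ** 2 for name, w in weights.items())

# The same retargeted motion, scored under each task's weighting.
errors = {"com_timing": 0.1, "foot_contacts": 0.0, "hand_pose": 0.5,
          "gaze": 0.2, "object_relative_traj": 0.3}
for task in FEATURE_WEIGHTS:
    print(task, round(weighted_error(task, errors), 3))
```

A motion with a loose hand pose scores as nearly acceptable for locomotion and as clearly poor for manipulation, even though the raw errors are identical.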
This is why pose-only motion datasets are not enough. A good dataset carries timing, contact, object state, and sometimes force cues so the retargeter can distinguish stylistic variation from essential task structure.
Evaluation should therefore include both geometric metrics and executable metrics: pose similarity, contact timing agreement, balance margin, torque peaks, and actual task completion.
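A small report structure can hold both kinds of metric side by side. This is a sketch under assumptions: the field names, the thresholds, and the rule that contact events are matched in order are all hypothetical choices, not a standard.

```python
from dataclasses import dataclass

@dataclass
class RetargetReport:
    pose_rmse_deg: float          # geometric: joint-angle RMSE against the human
    contact_timing_err_s: float   # worst offset between matched contact events
    balance_margin_m: float       # smallest distance from CoM projection to support edge
    peak_torque_ratio: float      # peak joint torque divided by the actuator limit
    task_completed: bool          # did the replay achieve the task goal

def contact_timing_error(human_events_s, robot_events_s):
    """Largest absolute offset between contact events, matched in order."""
    if len(human_events_s) != len(robot_events_s):
        return float("inf")  # a missing or extra contact counts as a failure
    offsets = (abs(h - r) for h, r in zip(human_events_s, robot_events_s))
    return max(offsets, default=0.0)

def is_executable(report, max_timing_err_s=0.05, min_margin_m=0.02):
    """Executable metrics only; pose similarity is deliberately not consulted."""
    return (
        report.contact_timing_err_s <= max_timing_err_s
        and report.balance_margin_m >= min_margin_m
        and report.peak_torque_ratio <= 1.0
        and report.task_completed
    )

# A motion that looks right geometrically but places one contact far too late.
looks_right = RetargetReport(
    pose_rmse_deg=2.1,
    contact_timing_err_s=contact_timing_error([0.40, 1.10], [0.42, 1.31]),
    balance_margin_m=0.03,
    peak_torque_ratio=0.8,
    task_completed=False,
)
print(is_executable(looks_right))
```

Keeping `pose_rmse_deg` in the report but out of `is_executable` makes the distinction explicit: geometric similarity is recorded for diagnosis, while the pass or fail decision rests on contact, balance, torque, and task outcome.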
- Capture human motion and task context, including objects and contact timing if possible.
- Choose task-relevant features rather than copying all joints equally.
- Solve the retargeting objective under joint, balance, and contact constraints.
- Replay on the robot or simulator and log feasibility violations and timing drift.
- If the motion is not executable, revise the feature set before blaming the controller.
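The replay-and-log step in the list above can be sketched as a small function. The event names, joint names, and limits here are hypothetical, and a real log would come from a simulator or robot and not from hand-written dictionaries.

```python
def replay_log(ref_events, exec_events, joint_traj, joint_limits):
    """Summarize timing drift, missing contact events, and joint-limit violations.

    ref_events / exec_events: {event_name: time_s} from the demo and the replay.
    joint_traj: list of {joint_name: angle_rad}, one dict per replay step.
    joint_limits: {joint_name: (lower_rad, upper_rad)}.
    """
    drift = {name: round(exec_events[name] - t, 3)
             for name, t in ref_events.items() if name in exec_events}
    missing = sorted(set(ref_events) - set(exec_events))
    violations = []
    for step, q in enumerate(joint_traj):
        for joint, value in q.items():
            lower, upper = joint_limits[joint]
            if not lower <= value <= upper:
                violations.append((step, joint, value))
    return {"drift_s": drift, "missing_events": missing,
            "limit_violations": violations}

log = replay_log(
    ref_events={"right_foot_down": 0.40, "left_hand_on_box": 1.10},
    exec_events={"right_foot_down": 0.45},
    joint_traj=[{"hip_pitch": 0.2}, {"hip_pitch": 1.4}],
    joint_limits={"hip_pitch": (-1.0, 1.2)},
)
print(log)
```

A missing event or a limit violation in this log points at the feature set or the embodiment before it points at the controller, which matches the order of blame suggested in the last step.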
Worked Example
A small retargeting ledger can already separate good task-intent preservation from geometric overfitting.
# Task features measured on the human demonstration and on the robot replay.
human_features = {"left_hand_to_box_cm": 4.0, "right_foot_contact": 1, "torso_yaw_deg": 18}
robot_trial = {"left_hand_to_box_cm": 5.3, "right_foot_contact": 1, "torso_yaw_deg": 15}

# Per-feature errors: absolute differences for distance and angle, a 0/1 match for contact.
errors = {
    "hand_error_cm": round(abs(human_features["left_hand_to_box_cm"] - robot_trial["left_hand_to_box_cm"]), 1),
    "contact_match": int(human_features["right_foot_contact"] == robot_trial["right_foot_contact"]),
    "yaw_error_deg": abs(human_features["torso_yaw_deg"] - robot_trial["torso_yaw_deg"]),
}
print(errors)
Expected output interpretation. The ledger reports a hand error of 1.3 cm, a torso yaw error of 3 degrees, and a contact match of 1. The hand and torso errors are small while the contact event is preserved. That suggests the retargeting kept task intent and support timing, which matters more than exact whole-body imitation for many tasks.
Use motion-retargeting pipelines, whole-body simulators, and robot-data stacks such as LeRobot to keep demonstration and execution artifacts synchronized.
Practical Recipe
- Select the task features that actually matter before collecting imitation data.
- Record contact timing and object state whenever possible.
- Retarget with explicit feasibility penalties.
- Evaluate on execution metrics, not only geometric similarity.
- Keep failed motions as diagnostics because they reveal missing embodiment features.
A visually plausible retargeted motion can still be dynamically impossible, unsafe, or task-irrelevant for the robot body.
A human can lean and twist to place a box on a shelf while compensating with subtle ankle control. A humanoid with different hip or ankle limits may need a step adjustment rather than a direct pose imitation.
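The lean-versus-step decision can be illustrated with a deliberately crude one-dimensional static balance check. The function name, the margin, and the numbers are assumptions. A real planner would reason about the full support polygon and dynamics.

```python
def plan_reach(required_com_shift_m, support_front_m, safety_margin_m=0.02):
    """Choose between leaning in place and stepping forward.

    required_com_shift_m: forward centre-of-mass shift needed to copy the human lean.
    support_front_m: distance from the current CoM projection to the front foot edge.
    """
    max_lean_m = support_front_m - safety_margin_m
    if required_com_shift_m <= max_lean_m:
        return {"action": "lean", "step_length_m": 0.0}
    # Step just far enough that the remaining shift fits inside the margin.
    return {"action": "step",
            "step_length_m": round(required_com_shift_m - max_lean_m, 3)}

print(plan_reach(0.05, 0.10))  # shift fits inside the support margin
print(plan_reach(0.15, 0.10))  # shift exceeds it, so the plan adds a step
```

The second call is the shelf-placement case from the text: copying the human lean would push the centre of mass past the foot edge, so the retargeter changes the footing and not just the pose.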
The robot is not a puppet. It is an organism with different bones, muscles, and excuses.
Recent work pushes from motion tracking toward video-driven, object-aware whole-body learning and motion priors that fill gaps between sparse demonstrations. The open problem is preserving intent under large embodiment mismatch.
"HumanPlus: Humanoid Shadowing and Imitation from Humans" (Fu et al., RSS 2024) demonstrates whole-body humanoid imitation from egocentric video. More than 40 skills are trained from approximately 40 hours of human demonstration data. The key contribution is a shadowing pipeline that maps egocentric human motion into real-time humanoid control without motion-capture suits, making large-scale human demonstration collection practical for dexterous manipulation and loco-manipulation tasks.
Which feature would you preserve first for a carry task: hand trajectory, foot contacts, torso orientation, or joint angles, and why?
This section is useful for teaching the distinction between imitation and embodiment. Students often begin by assuming the goal is faithful visual copying. The real goal is executable task transfer.
It is also a natural place to introduce data contracts. Demonstration data becomes much more valuable when it records task context and contact semantics rather than only pose streams.
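A data contract can be as simple as a typed frame plus a validator that lists what is missing. The schema below is a hypothetical sketch and is not the format of LeRobot or any other library.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DemoFrame:
    t: float                              # seconds since the start of the demo
    joint_angles: dict                    # pose stream, e.g. {"l_elbow": 0.40}
    contacts: Optional[dict] = None       # e.g. {"right_foot": True}
    object_pose: Optional[tuple] = None   # e.g. (x, y, z) of the manipulated object

def contract_violations(frames, require_object=True):
    """Return human-readable problems; an empty list means the contract holds."""
    problems = []
    for i, frame in enumerate(frames):
        if i > 0 and frame.t <= frames[i - 1].t:
            problems.append(f"frame {i}: timestamp not increasing")
        if frame.contacts is None:
            problems.append(f"frame {i}: missing contact labels")
        if require_object and frame.object_pose is None:
            problems.append(f"frame {i}: missing object state")
    return problems

frames = [
    DemoFrame(0.00, {"l_elbow": 0.40}, {"right_foot": True}, (0.5, 0.0, 0.9)),
    DemoFrame(0.02, {"l_elbow": 0.42}),  # pose only: violates the contract
]
print(contract_violations(frames))
```

Running the validator at collection time, before retargeting, surfaces pose-only frames while the demonstration can still be re-recorded.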
| Tool or Library | Role in the Topic | Builder Advice |
|---|---|---|
| LeRobot-style data tooling | Store demonstrations with synchronized metadata | Keep contact and object state beside pose data. |
| Whole-body simulators | Check executability before hardware rollout | Reject motions that only look right in kinematics space. |
| Retargeting pipelines | Map human features into robot features | Tune feature weighting by task, not by generic motion similarity. |
This section connects to robot datasets, teleoperation, and cross-embodiment learning.
Retarget one short human demonstration into a humanoid simulation, then compare raw pose error against task-feature error and balance feasibility.
When retargeting fails, ask whether the missing piece is feature choice, contact semantics, embodiment mismatch, or controller feasibility. Different failures imply different dataset improvements.
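The four questions can be turned into a first-pass triage helper. The ordering below is an assumed heuristic, not an established procedure: it checks the body and contacts first and the controller last, on the view that the feature set should be questioned before the controller is blamed.

```python
def triage(within_limits, contacts_match, task_features_ok, controller_tracks):
    """Return the first failing category in an assumed priority order."""
    checks = [
        (within_limits, "embodiment mismatch"),
        (contacts_match, "contact semantics"),
        (task_features_ok, "feature choice"),
        (controller_tracks, "controller feasibility"),
    ]
    for passed, category in checks:
        if not passed:
            return category
    return "no failure detected"

# Replay stayed within limits and matched contacts, but missed the task features.
print(triage(within_limits=True, contacts_match=True,
             task_features_ok=False, controller_tracks=False))
```

Because the function returns only the first failing category, a single replay yields one hypothesis to test at a time, which keeps dataset changes attributable.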
Section References
HumanPlus project page. https://humanplus.github.io/
Primary current source for human-motion-driven humanoid control.
HOVER project page. https://www.hover-policy.org/
Current reference for versatile neural whole-body control.
LeRobot documentation. https://huggingface.co/docs/lerobot/en/index
Practical stack for storing and training from robot demonstrations.
The purpose of human data is not mimicry. It is executable task transfer under a different body.
Define a retargeting evaluation for a shelf-placement task. Include one geometric metric, one contact metric, one balance metric, and one task-completion metric.