"A new robot is not a new prompt; it is a new contract with gravity, timing, and hardware limits."
An Adaptation Engineer
When a robot foundation model meets a new platform, the first question is not "which prompt should I use?" It is "what changed in observations, actions, rates, and safety limits?" Prompting helps only after those interface differences are under control.
What Changes When The Robot Changes
Moving to a new robot can alter camera topology, frame conventions, control frequency, gripper state, actuator delay, and safety envelopes. Language conditioning may stay almost unchanged while the motor contract changes completely. That asymmetry is why adaptation pipelines usually begin with calibration and action remapping before they touch model weights.
The common mistake is to read "prompting and conditioning" as though embodiment transfer were mostly a semantic problem. In reality, prompting is only one conditioning channel among several: embodiment tokens, calibration metadata, action adapters, and low-rank fine-tuning each absorb a different part of the shift.
If the new robot uses different frames, units, or control limits, prompt engineering is the wrong first tool. Start with the contract mismatch.
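One way to make "start with the contract mismatch" concrete is to write both robots' interfaces down and diff them before touching the model. The sketch below is illustrative only: `EmbodimentContract`, its fields, and all numeric values are hypothetical placeholders chosen to mirror the list above (camera topology, frames, control frequency, gripper state, actuator delay, safety limits), not a schema from any library.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EmbodimentContract:
    """Hypothetical summary of what a policy and a robot must agree on."""
    camera_names: tuple        # camera topology
    base_frame: str            # frame convention for commanded poses
    control_hz: float          # control frequency
    gripper_type: str          # e.g. binary vs. continuous gripper state
    actuator_delay_ms: float   # measured command-to-motion delay
    max_joint_speed: float     # safety envelope, rad/s


def contract_diff(source, target):
    """Return {field: (source_value, target_value)} for every field that differs."""
    return {
        f.name: (getattr(source, f.name), getattr(target, f.name))
        for f in fields(source)
        if getattr(source, f.name) != getattr(target, f.name)
    }


# Made-up values for a tabletop arm and a mobile manipulator.
tabletop = EmbodimentContract(("wrist", "overhead"), "arm_base", 50.0, "binary", 20.0, 2.0)
mobile = EmbodimentContract(("wrist", "head"), "mobile_base", 20.0, "binary", 45.0, 1.0)

mismatches = contract_diff(tabletop, mobile)
for name, (old, new) in mismatches.items():
    print(f"{name}: {old} -> {new}")
```

Every field that shows up in the diff is something a prompt cannot see, which is why an empty diff is the precondition for treating the problem as a semantic one.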
A Minimal Adaptation Equation
One useful decomposition is
$$a_t^{(r)} = A_r\big(\pi_{\theta + \Delta\theta}(E(o_t, m_r), q_t, e_r)\big),$$
where $m_r$ is embodiment metadata, $e_r$ is an embodiment token or descriptor, $A_r$ is the robot-specific action adapter, and $\Delta\theta$ is any fine-tuning update. The builder's job is to keep each term honest: metadata handles known configuration, embodiment tokens capture coarse robot identity, adapters handle command semantics, and weight updates are reserved for the part that genuinely needs learning.
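Read as code, the decomposition is a plain function composition. The following unnumbered sketch uses toy stand-ins for every term: `encode`, `policy`, `make_adapter`, and `act` are hypothetical names, the policy is a two-line placeholder rather than a real model, and $q_t$ is passed through as an opaque input. It is meant only to show where each term sits, not how any real system implements it.

```python
def encode(o_t, m_r):
    """E(o_t, m_r): pair the raw observation with known embodiment metadata."""
    return {"obs": o_t, "meta": m_r}


def policy(features, q_t, e_r, delta_theta=0.0):
    """Toy stand-in for pi_{theta + delta_theta}; returns a normalized 2-D action.

    q_t and e_r are accepted but unused here; a real policy would condition on them.
    """
    base = 0.5 if features["obs"]["object_visible"] else 0.0
    return [base + delta_theta, -base]


def make_adapter(scale, limit):
    """A_r: scale normalized actions into robot units, then clip to +/- limit."""
    def adapter(action):
        return [max(-limit, min(limit, scale * a)) for a in action]
    return adapter


def act(o_t, q_t, m_r, e_r, adapter, delta_theta=0.0):
    """a_t^(r) = A_r(pi(E(o_t, m_r), q_t, e_r))."""
    return adapter(policy(encode(o_t, m_r), q_t, e_r, delta_theta))


obs = {"object_visible": True}
q_t = None  # opaque conditioning input, passed through unchanged

# Same policy output, two hypothetical robots with different command semantics.
arm_a = act(obs, q_t, {"camera": "wrist"}, "arm_a", make_adapter(scale=0.1, limit=0.04))
arm_b = act(obs, q_t, {"camera": "head"}, "arm_b", make_adapter(scale=0.02, limit=0.04))
print(arm_a)
print(arm_b)
```

The policy emits the same normalized action in both calls; only the adapter differs. With these made-up numbers the first robot's command is clipped at its limit while the second stays well inside it, a difference that lives entirely outside the model weights.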
Code Fragment 1 turns that decomposition into a practical decision rule.
```python
# Choose the cheapest adaptation path that matches the interface mismatch.
cases = [
    {"robot": "same_arm_new_task", "camera_changed": False, "action_changed": False, "new_language": True},
    {"robot": "new_gripper_same_arm", "camera_changed": False, "action_changed": True, "new_language": False},
    {"robot": "mobile_manipulator", "camera_changed": True, "action_changed": True, "new_language": True},
]

for case in cases:
    if not case["camera_changed"] and not case["action_changed"]:
        # Hardware interface unchanged: only task semantics moved.
        decision = "prompt or small data fine-tune"
    elif case["action_changed"] and not case["camera_changed"]:
        # Same sensors, different command semantics.
        decision = "action adapter plus validation"
    else:
        # Any camera change falls through to the full path.
        decision = "adapter, calibration, and post-training data"
    print(f"{case['robot']}: {decision}")
```

```text
same_arm_new_task: prompt or small data fine-tune
new_gripper_same_arm: action adapter plus validation
mobile_manipulator: adapter, calibration, and post-training data
```
The output is a routing table that picks the lightest adaptation lever compatible with the actual source of mismatch. Prompt-only adaptation is reserved for semantic drift on familiar hardware; an action-only change routes to an adapter plus validation, and any camera change falls through to the full path of adapter, calibration, and post-training data.
The manual routing logic is tiny. In real adaptation workflows, projects such as OpenVLA, LeRobot, and openpi give you a maintained place to store calibration, embodiment metadata, fine-tuning configs, and evaluation results. Relying on them reduces glue code while keeping the adaptation manifest reproducible.
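To show what an adaptation manifest might hold, here is a minimal sketch using only the standard library. `AdaptationManifest`, its field names, the file path, and the config values are all hypothetical; this is not the configuration schema of OpenVLA, LeRobot, or openpi, each of which has its own formats.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class AdaptationManifest:
    """Hypothetical record of everything needed to reproduce one adaptation run."""
    robot: str
    calibration: dict
    embodiment_metadata: dict
    finetune_config: dict
    evaluation_results: list = field(default_factory=list)


manifest = AdaptationManifest(
    robot="mobile_manipulator",
    calibration={"head_camera_to_base": "calib/head_to_base.json"},  # placeholder path
    embodiment_metadata={"control_hz": 20.0, "cameras": ["wrist", "head"]},
    finetune_config={"method": "lora", "rank": 8},  # illustrative values only
)

# Results are appended as evaluations run; none are recorded yet.
manifest.evaluation_results.append({"task": "pick_place", "episodes": 0, "success_rate": None})

# Serialize deterministically so the manifest can be diffed and version-controlled.
text = json.dumps(asdict(manifest), indent=2, sort_keys=True)
restored = AdaptationManifest(**json.loads(text))
print(restored == manifest)
```

Sorting keys keeps the serialized form stable across runs, so a change in calibration or fine-tuning settings shows up as a small, reviewable diff.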
Prompting, Embodiment Tokens, And LoRA Are Not Interchangeable
| Lever | Best for | Weakness |
|---|---|---|
| Prompting | Task phrasing and semantic emphasis on familiar hardware | Cannot repair wrong action scales or stale calibration |
| Embodiment token or descriptor | Coarse robot identity inside a shared policy | Too weak if the command semantics are fundamentally different |
| Action adapter | Frame, unit, and actuator differences | Still needs validation under latency and saturation |
| LoRA or other fine-tuning | Persistent task or embodiment gaps after interface alignment | Easy to overfit if the evaluation panel is small |
If a new robot fails because the gripper saturates late or the camera frame is misregistered, no prompt will fix it. Treat prompting as a semantic tool, not as a substitute for calibration and control hygiene.
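The action-adapter row of the table can be sketched as three steps: convert units, change frame, then enforce limits while reporting saturation. Everything here is an assumption made for illustration: `make_action_adapter`, the planar (x, y) action, the yaw-only frame change, and the numeric limits are hypothetical, and a real adapter would handle full poses, gripper commands, and timing.

```python
import math


def make_action_adapter(yaw_rad, meters_per_unit, max_step_m):
    """Build a hypothetical adapter from policy-frame planar deltas to robot commands."""
    cos_y, sin_y = math.cos(yaw_rad), math.sin(yaw_rad)

    def adapt(dx, dy):
        # 1. Units: policy outputs are in arbitrary units; the robot expects meters.
        x_m, y_m = dx * meters_per_unit, dy * meters_per_unit
        # 2. Frame: rotate the policy-frame vector by the yaw offset between frames.
        rx = cos_y * x_m - sin_y * y_m
        ry = sin_y * x_m + cos_y * y_m
        # 3. Limits: clip each axis and record whether clipping changed the command.
        cx = max(-max_step_m, min(max_step_m, rx))
        cy = max(-max_step_m, min(max_step_m, ry))
        saturated = (cx != rx) or (cy != ry)
        return (cx, cy), saturated

    return adapt


# Illustrative numbers: frames differ by 90 degrees, 1 unit = 1 cm, 2 cm step limit.
adapt = make_action_adapter(yaw_rad=math.pi / 2, meters_per_unit=0.01, max_step_m=0.02)

small_cmd, small_sat = adapt(1.0, 0.0)   # within limits
large_cmd, large_sat = adapt(5.0, 0.0)   # exceeds the step limit and is clipped
print(small_cmd, small_sat)
print(large_cmd, large_sat)
```

Returning the saturation flag alongside the command is a deliberate choice: logging how often it fires during validation is one simple way to detect the kind of failure that no prompt can repair.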
A lab adapting an open VLA from a tabletop arm to a mobile manipulator might keep the language head, add embodiment metadata for the new camera tree, build an action adapter for base-plus-arm commands, then use a small amount of post-training data to recover task-specific precision. That staged pipeline is far cheaper than relearning the whole stack from scratch.
Prompting a miscalibrated robot to "please be accurate" is like adding manners to a broken ruler.
For a new robot with unchanged task language but a new gripper and slower control loop, which adaptation lever comes first: prompting, embodiment token, action adapter, or weight update? Explain why in one sentence.
Recent frontier systems increasingly advertise embodiment adaptation from modest numbers of demonstrations, but they vary in where the adaptation enters: motion-transfer layers, embodiment descriptors, action tokenizers, or lightweight fine-tuning. The unresolved question is which combination gives the strongest transfer without sacrificing interpretability or safety auditing.
Adapting to a new robot is a sequencing problem. First align the embodiment contract, then choose the lightest learning mechanism that closes the remaining gap.
Write an adaptation plan for moving an open VLA from a fixed tabletop arm to a wheeled mobile manipulator. Separate what you would solve with metadata, an action adapter, prompting, and parameter updates.
What's Next?
Section 35.6 looks at the less glamorous side of the story: data scale, compute budgets, and the trade-offs between open and closed stacks when a lab has to choose where to invest effort.
Physical Intelligence. "openpi" repository.
The main open reference for pi-zero family models and an important source for adaptation interfaces.
Useful for fine-tuning and inference interfaces around open VLA backbones.
A practical source for dataset, policy, and evaluation workflows on accessible hardware.