Section 35.5: Adapting to new robots; prompting and conditioning | Building Embodied AI: From Perception to Autonomous Action

"A new robot is not a new prompt; it is a new contract with gravity, timing, and hardware limits."
An Adaptation Engineer

**Figure 35.5A:** Adapting to a new robot is mostly about building the right adapter plate between a shared prior and a new embodiment-specific contract. (Illustration: one shared policy spine snaps into several robot-specific adapter plates, each labeled with camera, action, and safety differences rather than with a new identity.)

Big Picture

When a robot foundation model meets a new platform, the first question is not "which prompt should I use?" It is "what changed in observations, actions, rates, and safety limits?" Prompting helps only after those interface differences are under control.

What Changes When The Robot Changes

Moving to a new robot can alter camera topology, frame conventions, control frequency, gripper state, actuator delay, and safety envelopes. Language conditioning may stay almost unchanged while the motor contract changes completely. That asymmetry is why adaptation pipelines usually begin with calibration and action remapping before they touch model weights.
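One way to make "what changed" concrete is to write the contract down and diff it. The sketch below is a minimal illustration, not an interface from any library: the `EmbodimentContract` class, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EmbodimentContract:
    """Hypothetical summary of what a policy assumes about one robot."""
    cameras: tuple        # camera names, in the order the policy expects
    base_frame: str       # frame in which actions are expressed
    control_hz: float     # command rate in Hz
    gripper: str          # gripper type label
    max_joint_vel: float  # safety limit in rad/s


def contract_diff(old, new):
    """Return {field: (old_value, new_value)} for every field that differs."""
    return {
        f.name: (getattr(old, f.name), getattr(new, f.name))
        for f in fields(old)
        if getattr(old, f.name) != getattr(new, f.name)
    }


# Illustrative values only; real numbers come from the robots' specifications.
tabletop = EmbodimentContract(("wrist", "overhead"), "arm_base", 50.0, "parallel_binary", 2.0)
mobile = EmbodimentContract(("wrist", "head"), "mobile_base", 20.0, "parallel_binary", 1.5)

for name, (before, after) in contract_diff(tabletop, mobile).items():
    print(f"{name}: {before} -> {after}")
```

An empty diff suggests the shift is mostly semantic; any non-empty entry names an interface difference to resolve before prompts or weights are touched.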

The common mistake is to treat "prompting and conditioning" as if embodiment transfer were mostly a semantic problem. In reality, prompting is only one conditioning channel among many: embodiment tokens, calibration metadata, action adapters, and low-rank fine-tuning all compete to absorb different parts of the shift.

Adapt The Interface Before The Weights

If the new robot uses different frames, units, or control limits, prompt engineering is the wrong first tool. Start with the contract mismatch.

A Minimal Adaptation Equation

One useful decomposition is

$$a_t^{(r)} = A_r\big(\pi_{\theta + \Delta\theta}(E(o_t, m_r), q_t, e_r)\big),$$

where $o_t$ is the observation at time $t$, $E$ is the observation encoder, $q_t$ is the task or language conditioning input, $\pi_\theta$ is the shared policy, $m_r$ is embodiment metadata, $e_r$ is an embodiment token or descriptor, $A_r$ is the robot-specific action adapter, and $\Delta\theta$ is any fine-tuning update. The builder's job is to keep each term honest: metadata handles known configuration, embodiment tokens capture coarse robot identity, adapters handle command semantics, and weight updates are reserved for the part that genuinely needs learning.
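To see how the terms nest, here is a toy sketch of the composition in plain Python. Every function is a hypothetical stand-in: the "policy" is just an average, it ignores the query and embodiment token, and $q_t$ is assumed to be the language instruction. The only point is where each term sits in the call chain.

```python
def encode(obs, metadata):
    """E(o_t, m_r): toy encoder that rescales observations using known metadata."""
    return [p * metadata["pixel_scale"] for p in obs]


def policy(features, query, embodiment_token, delta=0.0):
    """pi_{theta + delta}: toy stand-in that returns a normalized action in [-1, 1].

    The query and embodiment token are accepted but unused in this toy version.
    """
    raw = sum(features) / len(features) + delta
    return max(-1.0, min(1.0, raw))


def make_action_adapter(scale, limit):
    """A_r: map a normalized action to robot units, then clip to the robot's limit."""
    def adapter(action):
        return max(-limit, min(limit, action * scale))
    return adapter


obs = [0.2, 0.4, 0.6]                            # o_t (illustrative values)
m_r = {"pixel_scale": 1.0}                       # embodiment metadata
A_r = make_action_adapter(scale=0.5, limit=0.3)  # robot-specific adapter

a = A_r(policy(encode(obs, m_r), query="pick up the cup", embodiment_token="arm_v2"))
print(a)
```

Swapping robots in this sketch means changing `m_r` and `A_r` first; `delta` stays at zero until the interface terms are known to be right.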

Code Fragment 1 turns that decomposition into a practical decision rule.

```python
# Choose the cheapest adaptation path that matches the interface mismatch.
cases = [
    {"robot": "same_arm_new_task", "camera_changed": False, "action_changed": False, "new_language": True},
    {"robot": "new_gripper_same_arm", "camera_changed": False, "action_changed": True, "new_language": False},
    {"robot": "mobile_manipulator", "camera_changed": True, "action_changed": True, "new_language": True},
]

for case in cases:
    if not case["camera_changed"] and not case["action_changed"]:
        decision = "prompt or small data fine-tune"
    elif case["action_changed"] and not case["camera_changed"]:
        decision = "action adapter plus validation"
    else:
        decision = "adapter, calibration, and post-training data"
    print(f"{case['robot']}: {decision}")
```

```
same_arm_new_task: prompt or small data fine-tune
new_gripper_same_arm: action adapter plus validation
mobile_manipulator: adapter, calibration, and post-training data
```

The expected output is a routing table that chooses the lightest adaptation lever compatible with the actual source of mismatch. The important interpretation is that prompt-only adaptation is reserved for semantic drift on familiar hardware, while embodiment shifts route immediately toward adapters, calibration, or parameter updates.

Code Fragment 1: The decision tree separates semantic novelty from embodiment novelty. A new task on the same hardware may be a prompting problem, while a new gripper or mobile base usually forces an explicit interface adaptation step.

Library Shortcut

The manual routing logic is tiny, but real adaptation workflows in OpenVLA, LeRobot, and openpi give you a maintained place to store calibration, embodiment metadata, fine-tuning configs, and evaluation results. The library path removes glue-code overhead while keeping the adaptation manifest reproducible.
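Independent of any particular library, the idea of a reproducible adaptation manifest can be sketched with the standard library alone. The section names, file paths, and values below are hypothetical placeholders and do not reflect the actual configuration schema of OpenVLA, LeRobot, or openpi.

```python
import json

# Hypothetical top-level sections an adaptation manifest might require.
REQUIRED_KEYS = ("robot", "calibration", "embodiment_metadata", "finetune_config", "evaluation")


def missing_keys(manifest):
    """List the required top-level sections that the manifest lacks."""
    return [key for key in REQUIRED_KEYS if key not in manifest]


manifest = {
    "robot": "mobile_manipulator",
    "calibration": {"camera_extrinsics_file": "calib/head_cam.yaml"},  # placeholder path
    "embodiment_metadata": {"control_hz": 20.0, "base_frame": "mobile_base"},
    "finetune_config": {"method": "lora", "rank": 8},
    "evaluation": {"episodes": 50, "tasks": ["pick", "place"]},
}

# Serialize with sorted keys so two runs with the same settings produce identical text.
text = json.dumps(manifest, indent=2, sort_keys=True)
restored = json.loads(text)
print(missing_keys(restored))
```

Storing the manifest next to the evaluation results makes it possible to tell later which calibration and which fine-tuning settings produced a given number.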

Prompting, Embodiment Tokens, And LoRA Are Not Interchangeable

Which Adaptation Lever Solves Which Problem

| Lever | Best for | Weakness |
| --- | --- | --- |
| Prompting | Task phrasing and semantic emphasis on familiar hardware | Cannot repair wrong action scales or stale calibration |
| Embodiment token or descriptor | Coarse robot identity inside a shared policy | Too weak if the command semantics are fundamentally different |
| Action adapter | Frame, unit, and actuator differences | Still needs validation under latency and saturation |
| LoRA or other fine-tuning | Persistent task or embodiment gaps after interface alignment | Easy to overfit if the evaluation panel is small |
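The action-adapter row can be illustrated with a small sketch that converts units and reports saturation, so that clipping is visible during validation rather than silent. The function name, the choice of deg/s inputs, and the limit values are assumptions made for illustration.

```python
import math


def adapt_action(cmd_deg_per_s, limits_rad_per_s):
    """Convert joint velocities from deg/s to rad/s and clip to per-joint limits.

    Returns (commands, saturated), where saturated[i] is True if joint i was clipped.
    """
    if len(cmd_deg_per_s) != len(limits_rad_per_s):
        raise ValueError("command and limit vectors must have the same length")
    commands, saturated = [], []
    for v_deg, limit in zip(cmd_deg_per_s, limits_rad_per_s):
        v = math.radians(v_deg)                # unit conversion
        clipped = max(-limit, min(limit, v))   # enforce the robot's limit
        commands.append(clipped)
        saturated.append(clipped != v)         # record whether clipping occurred
    return commands, saturated


# Illustrative command from the policy (deg/s) and hypothetical limits (rad/s).
cmds, sat = adapt_action([30.0, -120.0, 90.0], [1.0, 1.5, 1.0])
print(cmds, sat)
```

Logging the saturation flags across an evaluation run is one cheap way to tell whether a failure comes from the adapter or from the policy.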

Prompting Does Not Cancel Physics

If a new robot fails because the gripper saturates late or the camera frame is misregistered, no prompt will fix it. Treat prompting as a semantic tool, not as a substitute for calibration and control hygiene.

Practical Example

A lab adapting an open VLA from a tabletop arm to a mobile manipulator might keep the language head, add embodiment metadata for the new camera tree, build an action adapter for base-plus-arm commands, then use a small amount of post-training data to recover task-specific precision. That staged pipeline is far cheaper than relearning the whole stack from scratch.

Memory Hook

Prompting a miscalibrated robot to "please be accurate" is like adding manners to a broken ruler.

Self Check

For a new robot with unchanged task language but a new gripper and slower control loop, which adaptation lever comes first: prompting, embodiment token, action adapter, or weight update? Explain why in one sentence.

Research Frontier

Recent frontier systems increasingly advertise embodiment adaptation from modest numbers of demonstrations, but they vary in where the adaptation enters: motion-transfer layers, embodiment descriptors, action tokenizers, or lightweight fine-tuning. The unresolved question is which combination gives the strongest transfer without sacrificing interpretability or safety auditing.

Key Takeaway

Adapting to a new robot is a sequencing problem. First align the embodiment contract, then choose the lightest learning mechanism that closes the remaining gap.

Exercise 35.5

Write an adaptation plan for moving an open VLA from a fixed tabletop arm to a wheeled mobile manipulator. Separate what you would solve with metadata, an action adapter, prompting, and parameter updates.

What's Next?

Section 35.6 looks at the less glamorous side of the story: data scale, compute budgets, and the trade-offs between open and closed stacks when a lab has to choose where to invest effort.

Bibliography and Further Reading

Adaptation And Tooling

Physical Intelligence. "openpi" repository.

The main open reference for pi-zero family models and an important source for adaptation interfaces.


OpenVLA repository.

Useful for fine-tuning and inference interfaces around open VLA backbones.


LeRobot documentation.

A practical source for dataset, policy, and evaluation workflows on accessible hardware.
