Section 20.3: Domain randomization, system identification, adaptation (RMA) | Building Embodied AI: From Perception to Autonomous Action

A Careful Control Loop

[Figure: a robot policy trained across many simulated dynamics, then narrowed by system identification and adjusted online by an adaptation module.]
**Figure 20.3A**: Domain randomization widens the training world, system identification centers it on the real robot, and adaptation handles what still changes during deployment.

Big Picture

Domain randomization, system identification, residual randomization, and rapid motor adaptation are complementary answers to the same problem: the simulator parameters are wrong, but wrong in ways we can structure. Randomization trains robustness, system identification estimates the real parameters, residual randomization samples uncertainty around that estimate, and adaptation infers unobserved dynamics online.

For domain randomization, system identification, and adaptation (RMA), a sim-to-real transfer report should record the randomized variables, the simulator assumptions, the real-world measurements, and the handoff to demonstration learning in a single transfer ledger.

This section develops the technical contract for dynamics randomization. The parameters of interest include mass, inertia, center of mass, joint damping, motor strength, actuator delay, contact friction, restitution, sensor noise, and terrain geometry. Each parameter should have a reason for its range.

The key question is practical: should we make the training distribution wider, identify the real parameter more carefully, or build an adaptation module that tracks the parameter during rollout?

Action Is The Test

Blindly widening every randomization range is not robustness. It can teach a conservative policy that survives everything by doing too little. The useful range is wide enough to cover plausible reality and narrow enough to preserve the task's control structure.

Theory

Let $\theta$ denote simulator dynamics parameters such as mass, friction, damping, and delay. Domain randomization trains a policy over $\theta \sim p_{\text{train}}(\theta)$ rather than one fixed simulator. System identification estimates $\hat{\theta}$ from hardware traces. Residual randomization then trains or evaluates over $\theta \sim p(\theta\mid \hat{\theta}, \Sigma)$, where $\Sigma$ represents what the measurement still cannot pin down.
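The two distributions can be contrasted with a small sampling sketch. Everything numeric here is a hypothetical placeholder: the broad ranges, the identified centers, and the residual standard deviations. The sketch also assumes a diagonal $\Sigma$ (independent parameters) and a Gaussian residual clipped to the broad range, which is one reasonable choice rather than the only one.

```python
import random

# Hypothetical broad training ranges (illustrative values, not measurements).
BROAD_RANGES = {"friction": (0.2, 1.2), "motor_scale": (0.7, 1.3)}

# Hypothetical identified centers (theta_hat) and residual std devs (diagonal Sigma).
THETA_HAT = {"friction": 0.62, "motor_scale": 0.91}
SIGMA = {"friction": 0.04, "motor_scale": 0.025}


def sample_broad(rng):
    """Draw theta ~ p_train(theta): independent uniforms over the broad ranges."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BROAD_RANGES.items()}


def sample_residual(rng):
    """Draw theta ~ p(theta | theta_hat, Sigma): Gaussian around the estimate,
    clipped to the broad range so samples stay physically plausible."""
    theta = {}
    for k, center in THETA_HAT.items():
        lo, hi = BROAD_RANGES[k]
        theta[k] = min(hi, max(lo, rng.gauss(center, SIGMA[k])))
    return theta


def spread(samples, key):
    """Range covered by the sampled values of one parameter."""
    values = [s[key] for s in samples]
    return max(values) - min(values)


rng = random.Random(0)
broad = [sample_broad(rng) for _ in range(1000)]
residual = [sample_residual(rng) for _ in range(1000)]

for key in THETA_HAT:
    print(f"{key}: broad spread {spread(broad, key):.2f}, "
          f"residual spread {spread(residual, key):.2f}")
```

The residual samples cover a much narrower band than the broad ones, which is the point: identification decides where the band sits, and $\Sigma$ decides how wide it stays.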

Rapid motor adaptation adds an online inference loop. A base policy receives observations and an adaptation vector $z_t$ inferred from recent state-action history. If the floor becomes slippery or the payload changes, $z_t$ should move before the robot falls, giving the same base policy a different dynamics context.
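The loop structure can be illustrated on a toy one-dimensional system. This is a sketch under strong assumptions, not the RMA architecture: the hidden dynamics parameter is a single motor gain, the "adaptation module" is a hand-written least-squares estimator standing in for a learned network, and the base policy is a one-line controller conditioned on $z_t$. All names and values are hypothetical.

```python
from collections import deque

TRUE_MOTOR_SCALE = 0.6   # hidden from the policy; hypothetical value
HISTORY_LEN = 10


def step(velocity, action, motor_scale):
    """Toy dynamics: the commanded action is scaled by a hidden motor gain."""
    return velocity + motor_scale * action


def adapt(history, prior=1.0):
    """Stand-in for the adaptation module: least-squares estimate of the
    hidden gain from recent (action, velocity change) pairs."""
    num = sum(a * dv for a, dv in history)
    den = sum(a * a for a, _ in history)
    return num / den if den > 1e-9 else prior


def base_policy(velocity, target, z):
    """Base policy conditioned on z: divide the desired change by the gain."""
    return (target - velocity) / max(z, 0.1)


history = deque(maxlen=HISTORY_LEN)
velocity, target = 0.0, 1.0
z_trace = []
for t in range(5):
    z = adapt(history)                       # infer context from history
    action = base_policy(velocity, target, z)
    new_velocity = step(velocity, action, TRUE_MOTOR_SCALE)
    history.append((action, new_velocity - velocity))
    velocity = new_velocity
    z_trace.append(z)

print([round(z, 2) for z in z_trace], round(velocity, 3))
```

The base policy never changes; only $z_t$ does. After one informative transition the estimate moves from the prior toward the hidden gain, and the same controller then reaches the target.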

Mechanism

The mechanism has three layers. Training variation teaches the policy not to depend on a single parameter setting. Identification pulls the distribution toward the measured robot. Adaptation handles remaining changes, such as battery voltage, surface friction, payload, damage, and temperature.
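The identification layer can be sketched for two of the parameters above, actuator delay and motor strength. The trace below is synthetic: a random excitation command and a response that is simply the command scaled and delayed. The logging period, true delay, and true gain are hypothetical, and real hardware traces would add noise and unmodeled dynamics that this sketch ignores.

```python
import random

DT_MS = 2                # assumed logging period (500 Hz); hypothetical
TRUE_DELAY_STEPS = 17    # 34 ms at 2 ms per sample
TRUE_MOTOR_SCALE = 0.91

# Synthetic stand-in for a hardware trace: zero-mean excitation command.
rng = random.Random(0)
command = [rng.uniform(-1.0, 1.0) for _ in range(500)]
# The "measured" response is the command, scaled and delayed.
response = [0.0] * TRUE_DELAY_STEPS + [
    TRUE_MOTOR_SCALE * c for c in command[:-TRUE_DELAY_STEPS]
]


def estimate_delay_steps(command, response, max_lag):
    """Return the lag (in samples) that best aligns response with command,
    scored by the mean product over the overlapping samples."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        pairs = list(zip(command, response[lag:]))
        score = sum(c * r for c, r in pairs) / len(pairs)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def estimate_scale(command, response, lag):
    """Least-squares gain between the command and the lag-aligned response."""
    aligned = list(zip(command, response[lag:]))
    num = sum(c * r for c, r in aligned)
    den = sum(c * c for c, _ in aligned)
    return num / den


lag = estimate_delay_steps(command, response, max_lag=40)
delay_ms = lag * DT_MS
motor_scale = estimate_scale(command, response, lag)
print(f"delay_ms: {delay_ms}, motor_scale: {motor_scale:.2f}")
```

Estimating the delay first matters: a gain fitted against a misaligned response would be biased toward zero, and that error would then be baked into the center of the randomization range.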

Worked Example

Code Fragment 20.3.1 computes residual randomization ranges after a simple identification pass. The measured robot centers the range, while uncertainty keeps the policy from becoming brittle.

# Center randomization around identified hardware parameters.
# Residual width records uncertainty instead of pretending calibration is exact.
identified = {"friction": 0.62, "motor_scale": 0.91, "delay_ms": 34}
residual_width = {"friction": 0.08, "motor_scale": 0.05, "delay_ms": 8}

for parameter, center in identified.items():
    width = residual_width[parameter]
    low = center - width
    high = center + width
    print(f"{parameter}: sample from [{low:.2f}, {high:.2f}]")

friction: sample from [0.54, 0.70]
motor_scale: sample from [0.86, 0.96]
delay_ms: sample from [26.00, 42.00]

Code Fragment 20.3.1 builds residual randomization ranges for friction, motor_scale, and delay_ms. The ranges are centered on identified hardware values, which is more targeted than sampling every parameter from a broad hand-written interval.

Expected output: each randomized dynamics parameter has a center and a residual width. If the width is not justified by measurement noise, hardware variation, or unmodeled effects, the randomization range is a guess rather than an experimental design choice.
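One way to justify a width by measurement noise is to repeat the identification pass and use the spread of the estimates. The run values and the coverage factor of 2 below are hypothetical; hardware variation and unmodeled effects would need their own evidence and are not captured by this sketch.

```python
import statistics

# Hypothetical friction estimates from five repeated identification runs.
friction_runs = [0.60, 0.64, 0.61, 0.65, 0.60]

center = statistics.mean(friction_runs)
run_spread = statistics.stdev(friction_runs)   # sample standard deviation
COVERAGE = 2.0                                  # assumed coverage factor
width = COVERAGE * run_spread

print(f"friction: center {center:.3f}, residual width {width:.3f}")
print(f"friction: sample from [{center - width:.2f}, {center + width:.2f}]")
```

A width derived this way can be defended in a review: anyone who doubts it can rerun the identification and check the spread.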

Library Shortcut

In practical systems, Isaac Lab and MuJoCo make dynamics randomization explicit, while Drake is useful for system identification and model-based checks. RSL-RL and rl_games can train policies across many randomized environments, but the critical artifact is the parameter manifest: what was randomized, why, over what range, and with what real-robot evidence.

Practical Recipe

Start with a parameter manifest: mass, inertia, friction, damping, motor strength, actuator delay, sensor noise, and terrain properties.
Mark each parameter as measured, estimated, randomized, adapted online, or held fixed.
Use system identification to center parameters that can be measured from hardware traces.
Apply residual randomization only to the uncertainty left after identification.
Evaluate policies on held-out dynamics combinations, then run a small real-robot gate before scaling hardware trials.
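The held-out evaluation step in the recipe can be sketched as a split over a grid of dynamics combinations. The grid levels reuse the low, center, and high values from Code Fragment 20.3.1; the 20 percent holdout fraction and the fixed seed are assumptions chosen for illustration.

```python
import itertools
import random

# Low / center / high levels taken from the residual ranges above.
levels = {
    "friction": [0.54, 0.62, 0.70],
    "motor_scale": [0.86, 0.91, 0.96],
    "delay_ms": [26, 34, 42],
}

# Every combination of levels is one candidate dynamics configuration.
combos = [
    dict(zip(levels, values))
    for values in itertools.product(*levels.values())
]

rng = random.Random(0)           # fixed seed so the split is reproducible
rng.shuffle(combos)
n_holdout = len(combos) // 5     # assumed holdout fraction of about 20 percent
holdout, train = combos[:n_holdout], combos[n_holdout:]

print(f"{len(train)} training combinations, {len(holdout)} held out")
```

The holdout list should be saved alongside the manifest. If it is regenerated with a different seed after training, the "held-out" combinations may include ones the policy already saw.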

Common Failure Mode

The common mistake is "randomize everything" without a parameter audit. Overly broad randomization can produce a policy that avoids useful contact, moves slowly, or learns a behavior tuned to the average of impossible robots.

Practical Example

A quadruped team may identify motor strength and actuator delay on the real robot, randomize friction and terrain height because they vary by deployment site, and adapt online to payload shifts. Reporting those choices is more useful than saying the policy used domain randomization.

Memory Hook

Treat domain randomization, system identification, and RMA like a control-room label. If the label does not tell a future debugger what was sampled, what was measured, and what was adapted, it is decoration rather than engineering knowledge.

Research Frontier

The frontier is moving from broad hand-tuned randomization toward measured and adaptive transfer. Important open problems include learning which parameters matter for a task, estimating them from short hardware traces, and preventing adaptation modules from compensating for unsafe policy behavior.

Self Check

For a robot policy you know, list five dynamics parameters. Which are measured, which are randomized, which are adapted online, and which are unsafe to randomize without a safety gate?

The idea in this section becomes useful when it is tied to a parameter contract. The contract names each dynamics parameter, its unit, its source, its training distribution, its hardware estimate, and the reason its range is safe. Without that contract, a robust policy can be impossible to reproduce or diagnose.

The graduate-level habit is to separate three claims. The robustness claim says training variation helps the policy tolerate dynamics change. The identification claim says hardware traces support a parameter estimate. The adaptation claim says recent rollout history contains enough information to infer the remaining latent dynamics.

Practical Tool Choices For This Section

Tool or Library	Role in the Topic	Builder Advice
Isaac Lab	Large-scale dynamics randomization	Use it to train across parameter panels while keeping the randomization manifest explicit.
MuJoCo	Transparent dynamics parameters	Use it when mass, inertia, joint damping, contact friction, and actuator settings must be inspected directly.
Drake	System identification	Use it when measured trajectories should update a physics model rather than only widen randomization.
RSL-RL	Locomotion policy training	Use it when many randomized environments are needed for robust legged control experiments.
ROS 2 bags	Identification traces	Use them to capture time-aligned command, state, contact, and actuator data from hardware.

A robust implementation starts with the parameter manifest, not the training command. The manifest states which parameters are randomized, which are identified from hardware, which have residual uncertainty, and which are inferred online by the adaptation module.
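A manifest of this kind can be kept as plain data with a small audit function. The field names, the status vocabulary, and the example entries below are hypothetical; the third entry is deliberately left without evidence to show what the audit reports.

```python
from dataclasses import dataclass

ALLOWED_STATUS = {
    "measured", "estimated", "randomized", "adapted_online", "held_fixed",
}


@dataclass(frozen=True)
class ParameterEntry:
    name: str
    unit: str
    status: str
    low: float
    high: float
    evidence: str    # where the range comes from

    def problems(self):
        """Return a list of audit findings for this entry (empty if clean)."""
        issues = []
        if self.status not in ALLOWED_STATUS:
            issues.append(f"{self.name}: unknown status {self.status!r}")
        if self.low > self.high:
            issues.append(f"{self.name}: low {self.low} exceeds high {self.high}")
        if not self.evidence.strip():
            issues.append(f"{self.name}: range has no evidence source")
        return issues


# Hypothetical manifest entries; evidence strings are placeholders.
manifest = [
    ParameterEntry("friction", "dimensionless", "randomized", 0.54, 0.70,
                   "surface survey of deployment sites"),
    ParameterEntry("motor_scale", "dimensionless", "measured", 0.86, 0.96,
                   "torque-step traces on hardware"),
    ParameterEntry("delay_ms", "ms", "measured", 26.0, 42.0, ""),
]

all_problems = [issue for entry in manifest for issue in entry.problems()]
for issue in all_problems:
    print(issue)
```

Running the audit before launching training keeps the manifest ahead of the training command rather than reconstructed after it.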

Write parameter ranges with units and evidence source.
Run system identification for measurable quantities before widening ranges by hand.
Train with held-out dynamics combinations that were not used for policy updates.
Evaluate residual randomization separately from broad randomization.
For RMA-style policies, log the adaptation vector and correlate it with measured terrain, payload, or actuator changes.
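The last item, correlating the logged adaptation vector with a measured change, can be sketched with a plain Pearson correlation. The payload values and the two logged components are made-up numbers, and Pearson correlation only detects linear relationships, so a low value here does not prove a component ignores payload.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


# Hypothetical log: measured payload per trial and the mean value of two
# adaptation-vector components over that trial.
payload_kg = [0.0, 0.5, 1.0, 1.5, 2.0]
z_log = {
    "z0": [0.10, 0.22, 0.29, 0.41, 0.52],
    "z1": [0.31, 0.29, 0.33, 0.30, 0.32],
}

correlations = {name: pearson(payload_kg, values) for name, values in z_log.items()}
for name, r in correlations.items():
    print(f"{name} vs payload: r = {r:.2f}")
```

In this made-up log, `z0` tracks payload closely and `z1` does not. On real data, a vector with no component that responds to a known payload change is a sign that the adaptation input lacks the needed information.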

When a randomized policy fails, first check whether the failing parameter was outside the training range, inside the range but underrepresented, measured incorrectly, or hidden from the adaptation module. Each answer leads to a different repair: widen, resample, identify, or change the adaptation input.
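The triage above can be written as a small decision function. The check order, the neighborhood size, and the minimum-fraction threshold are assumptions; the measurement check comes first here because an untrusted measurement makes the failing value itself unreliable.

```python
def diagnose(failing_value, train_samples, measurement_trusted,
             in_adaptation_input, neighborhood=0.03, min_fraction=0.05):
    """Map a failing parameter value to one of the four repairs.

    train_samples: values of this parameter actually drawn during training.
    neighborhood / min_fraction: hypothetical thresholds for deciding that a
    region of the range was underrepresented.
    """
    if not measurement_trusted:
        return "identify"
    low, high = min(train_samples), max(train_samples)
    if failing_value < low or failing_value > high:
        return "widen"
    near = sum(1 for s in train_samples if abs(s - failing_value) <= neighborhood)
    if near / len(train_samples) < min_fraction:
        return "resample"
    if not in_adaptation_input:
        return "change adaptation input"
    return "unexplained"


# Hypothetical friction samples: training mass concentrated at 0.6.
samples = [0.6] * 95 + [0.30, 0.35, 0.40, 0.45, 0.50]

print(diagnose(0.90, samples, True, True))    # outside the range
print(diagnose(0.35, samples, True, True))    # inside but rarely sampled
print(diagnose(0.60, samples, False, True))   # measurement not trusted
print(diagnose(0.60, samples, True, False))   # hidden from adaptation
```

The "unexplained" branch is worth keeping: a failure that passes all four checks usually points at a parameter missing from the manifest.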

Evaluation Recipe

For dynamics randomization studies, compare metrics only when they measure the same quantity and were computed in the same run on the same configuration: the same policy checkpoint, parameter panel, identified hardware center, residual widths, seed set, and real-robot gate. Save the parameter manifest with the rollout traces so every robustness claim can be traced to a single run.
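One lightweight way to enforce this is to fingerprint the evaluation configuration and refuse to compare results whose fingerprints differ. The configuration keys and values below are hypothetical placeholders for the items listed above.

```python
import hashlib
import json


def run_fingerprint(config):
    """Short, order-independent hash of an evaluation configuration."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def comparable(config_a, config_b):
    """Two results may be compared only if their configurations match."""
    return run_fingerprint(config_a) == run_fingerprint(config_b)


# Hypothetical evaluation configuration.
config = {
    "policy_checkpoint": "ckpt_001",
    "parameter_panel": "panel_v1",
    "identified_center": {"friction": 0.62, "motor_scale": 0.91, "delay_ms": 34},
    "residual_width": {"friction": 0.08, "motor_scale": 0.05, "delay_ms": 8},
    "seeds": [0, 1, 2],
    "real_robot_gate": "gate_v1",
}

# Same content with a different seed set: not comparable.
other = dict(config, seeds=[3, 4, 5])

print(run_fingerprint(config), comparable(config, other))
```

Storing the fingerprint next to each metric makes a mismatched comparison visible in the results table itself instead of in a later debugging session.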

Key Takeaway

Domain randomization builds robustness, system identification centers the simulator, residual randomization preserves uncertainty, and adaptation tracks what changes after deployment.

Exercise 20.3.1

Create a five-row dynamics manifest for a robot task. For each parameter, state whether it is measured, randomized, residual-randomized, adapted online, or held fixed, then justify the choice.

What's Next?

This section turned domain randomization, system identification, and adaptation (RMA) into a testable embodied-learning contract: define the loop, choose the tool, save one comparable artifact, and diagnose failure by interface. Next, continue with Section 20.4, which carries the same evaluation habit forward.

References & Further Reading

Foundational Papers, Tools, and Practice References

Tobin, J. et al. (2017). Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. IROS.

Demonstrates that training an object detector on simulated images with randomized textures, lighting, and camera placement yields features that are insensitive to simulator appearance, enabling transfer to a physical robot without fine-tuning on real images. Read to understand the gap between visual sim-to-real and dynamics sim-to-real; this paper covers the visual side.

Paper

Peng, X. B. et al. (2018). Sim-to-Real Transfer of Robotic Control with Dynamics Randomization. ICRA.

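Trains a recurrent policy for a robot-arm pushing task under randomized simulator dynamics and deploys it on hardware without any training on the physical system. Read as the dynamics-side counterpart to visual domain randomization.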

Paper

Kumar, A. et al. (2021). RMA: Rapid Motor Adaptation for Legged Robots. RSS.

Introduces RMA, which pairs a base policy trained in simulation with privileged environment information and a lightweight adaptation module trained in simulation, by supervised learning, to predict the privileged latent from recent proprioceptive state-action history. Only the adaptation module's inference runs online on the robot. Read Section 3 for the two-phase training procedure; RMA is one of the clearest demonstrations that explicit adaptation at inference time outperforms domain randomization alone for legged locomotion.

Paper

Tan, J. et al. (2018). Sim-to-Real: Learning Agile Locomotion for Quadruped Robots. RSS.

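Transfers trotting and galloping gaits to a quadruped by combining an identified actuator model and simulated latency with dynamics randomization. It is a clear example of system identification and randomization used together rather than as alternatives.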

Paper

NVIDIA Isaac Lab documentation.

NVIDIA's GPU-accelerated robot learning framework that runs thousands of parallel environments on a single GPU. Read the documentation for task configuration, domain randomization APIs, and the sim-to-real export path; massively parallel training with Isaac Lab is how locomotion and dexterous manipulation policies achieve the sample counts needed for sim-to-real transfer.

Tool

Drake documentation.

Drake is relevant when transfer work needs explicit dynamics, constraints, and system identification.

Tool