Section 20.1: The reality gap revisited | Building Embodied AI: From Perception to Autonomous Action

A Careful Control Loop

[Illustration: a simulated robot plan meets a physical robot run; sensor noise, contact slip, and delayed actuation make the same action behave differently.]

**Figure 20.1A**: The reality gap is not one crack in the simulator. It is the combined effect of perception, dynamics, timing, contact, and evaluation mismatches showing up in the same closed loop.

Big Picture

The reality gap is the difference between the Markov decision process used for training and the physical process that receives the deployed policy. In reinforcement learning, this gap matters because a policy exploits whatever regularities make reward easy in simulation, including regularities the real robot will never provide.

A sim-to-real transfer effort should record the randomized variables, the simulator assumptions, the real-world measurements, and the demonstration-learning handoff in a single transfer ledger.

This section turns the reality gap from a slogan into a set of measurable mismatches. The most common gaps are observation mismatch, transition mismatch, actuator mismatch, timing mismatch, contact mismatch, and evaluation mismatch. Each one creates a different debugging question.

The key question is practical: when a policy succeeds in simulation and fails on hardware, which interface changed enough to invalidate the learned action?

Action Is The Test

A sim-to-real policy fails for a reason that can usually be localized. Treat "the reality gap" as a failure label only temporarily, then split it into sensor, dynamics, actuator, timing, contact, and metric gaps.

Theory

A useful formalization compares the simulator transition model $P_{\text{sim}}(s_{t+1}\mid s_t,a_t)$ with the hardware transition model $P_{\text{real}}(s_{t+1}\mid s_t,a_t)$. The policy never sees these distributions directly. It experiences them as different next observations, different rewards, and different safety margins after the same nominal action.

The gap is load-bearing when it changes the policy ranking: action $a_1$ looks better than action $a_2$ in simulation, but the ordering reverses on the robot. Small parameter errors matter most when they push the policy across a contact threshold, actuator limit, sensor blind spot, or termination condition.
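The ranking reversal described above can be sketched with a toy pushing model. Everything here is an assumption made for illustration: the function `push_error_cm`, the static friction values, the gain, and the target distance are hypothetical and do not come from any real simulator or robot. The sketch only shows how a modest friction error can cross a contact threshold and flip which action looks better.

```python
# Toy model (hypothetical): an object moves only when the push force exceeds
# static friction, and the error is the distance from a 4 cm target.
def push_error_cm(force_n, static_friction_n, gain_cm_per_n=2.0, target_cm=4.0):
    net_force_n = max(0.0, force_n - static_friction_n)
    displacement_cm = gain_cm_per_n * net_force_n
    return abs(target_cm - displacement_cm)

SIM_FRICTION_N = 3.0   # assumed simulator value
REAL_FRICTION_N = 5.0  # assumed hardware value

a1_force_n, a2_force_n = 5.0, 7.0  # two candidate actions

sim_a1 = push_error_cm(a1_force_n, SIM_FRICTION_N)
sim_a2 = push_error_cm(a2_force_n, SIM_FRICTION_N)
real_a1 = push_error_cm(a1_force_n, REAL_FRICTION_N)
real_a2 = push_error_cm(a2_force_n, REAL_FRICTION_N)

# The gap is load-bearing when the preferred action differs between domains.
ranking_flips = (sim_a1 < sim_a2) != (real_a1 < real_a2)

print(f"sim errors:  a1={sim_a1:.1f} cm, a2={sim_a2:.1f} cm")
print(f"real errors: a1={real_a1:.1f} cm, a2={real_a2:.1f} cm")
print(f"ranking_flips={ranking_flips}")
```

In this toy setting the gentler push reaches the target in simulation but never breaks static friction on hardware, so the ordering of the two actions reverses.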

Mechanism

The mechanism is a mismatch cascade. A camera pose estimate is late by two frames, the policy commands torque for the old pose, the motor clips the command, the contact model overestimates friction, and the evaluator counts a brief touch as success. The robot does not see five small errors. It sees one failed rollout.
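A minimal sketch of the first three links in that cascade follows. The numbers, the proportional controller, and the names `OBS_DELAY_FRAMES`, `KP_NM_PER_CM`, and `TORQUE_LIMIT_NM` are hypothetical; the point is only that logging each stage separately shows where the delivered command departs from the intended one.

```python
# Hypothetical numbers chosen only to illustrate the cascade in the text.
object_positions_cm = [0.0, 1.0, 2.0, 3.0, 4.0]  # true object position per frame
OBS_DELAY_FRAMES = 2     # the pose estimate arrives two frames late
KP_NM_PER_CM = 3.0       # gain of a toy proportional controller
TORQUE_LIMIT_NM = 5.0    # the motor clips commands above this magnitude

def clip(value, limit):
    """Saturate a command symmetrically at +/- limit."""
    return max(-limit, min(limit, value))

t = 4
true_pos_cm = object_positions_cm[t]
stale_pos_cm = object_positions_cm[max(0, t - OBS_DELAY_FRAMES)]

intended_torque_nm = KP_NM_PER_CM * true_pos_cm    # what a perfect loop would send
commanded_torque_nm = KP_NM_PER_CM * stale_pos_cm  # what the policy sends for the old pose
applied_torque_nm = clip(commanded_torque_nm, TORQUE_LIMIT_NM)  # what the motor delivers

for name, value in [
    ("intended_torque_nm", intended_torque_nm),
    ("commanded_torque_nm", commanded_torque_nm),
    ("applied_torque_nm", applied_torque_nm),
]:
    print(f"{name:<22}{value:>6.1f}")
```

A trace that stores only the final outcome would hide the fact that the observation delay removed half of the intended command before the motor limit removed more.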

Worked Example

Code Fragment 20.1.1 below shows a tiny diagnostic for a pushing policy. It compares the same commanded push in simulation and on hardware, then labels the dominant gap instead of hiding the failure behind one success rate.

# Compare a simulated push with a hardware push using the same command.
# The gap label points to the interface that changed the rollout outcome.
sim_trace = {"slip_cm": 0.4, "settle_ms": 110, "success": True}
real_trace = {"slip_cm": 2.1, "settle_ms": 190, "success": False}

# Differences between hardware and simulation for each trace field.
slip_gap_cm = real_trace["slip_cm"] - sim_trace["slip_cm"]
settle_gap_ms = real_trace["settle_ms"] - sim_trace["settle_ms"]

# Checks run in priority order: contact first, then actuation.
# Here both thresholds are exceeded, so the contact label wins.
if slip_gap_cm > 1.0:
    gap = "contact and friction"
elif settle_gap_ms > 50:
    gap = "actuator delay"
else:
    gap = "evaluation or observation"

print(f"sim_success={sim_trace['success']}, real_success={real_trace['success']}")
print(f"dominant_gap={gap}")

sim_success=True, real_success=False
dominant_gap=contact and friction

Code Fragment 20.1.1 turns a failed transfer into a diagnosis by comparing slip_cm, settle_ms, and success under the same command. The important move is not the threshold itself, but the habit of storing enough trace fields to name the failure mechanism.

Expected output: a useful reality-gap diagnostic reports the simulator outcome, the hardware outcome, and the suspected mismatch category. If the trace contains only final reward, the team cannot tell whether to fix sensing, dynamics, actuation, timing, or evaluation.

Library Shortcut

Use Gymnasium or Isaac Lab to enforce a common rollout schema, MuJoCo or Drake when explicit dynamics and contact assumptions must be inspected, and ROS 2 bags for hardware traces. The library shortcut is not "train and trust." It is "log the same fields in sim and real so the gap can be localized."

Practical Recipe

Write the simulated MDP assumptions: state variables, observation noise, transition parameters, actuator model, contact model, and termination rule.
Record the hardware interface with the same fields: sensor timestamps, command timestamps, controller status, measured motion, safety events, and success label.
Run paired rollouts with the same initial condition family and the same commanded policy checkpoint.
Label failures by the first interface that diverges enough to change the action outcome.
Repair the narrowest mismatch first, then rerun the paired diagnostic before changing the policy architecture.
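The labeling step in the recipe above can be sketched as a search for the first field that diverges beyond a tolerance. The field names, tolerances, and trace values below are hypothetical, and the per-field tolerances are assumptions a team would have to calibrate for its own robot.

```python
# Hypothetical per-field settings: (gap label, tolerance in the field's units).
FIELDS = {
    "obs_age_ms": ("timing", 20.0),
    "applied_torque_nm": ("actuator", 0.5),
    "slip_cm": ("contact", 1.0),
}

def first_divergence(sim_steps, real_steps, fields):
    """Return (step, field, label) for the earliest out-of-tolerance field, or None."""
    for step, (sim, real) in enumerate(zip(sim_steps, real_steps)):
        for name, (label, tolerance) in fields.items():
            if abs(real[name] - sim[name]) > tolerance:
                return step, name, label
    return None

# Paired traces from the same command sequence (illustrative values only).
sim_steps = [
    {"obs_age_ms": 10.0, "applied_torque_nm": 1.0, "slip_cm": 0.0},
    {"obs_age_ms": 10.0, "applied_torque_nm": 2.0, "slip_cm": 0.2},
    {"obs_age_ms": 10.0, "applied_torque_nm": 2.0, "slip_cm": 0.4},
]
real_steps = [
    {"obs_age_ms": 12.0, "applied_torque_nm": 1.0, "slip_cm": 0.0},
    {"obs_age_ms": 12.0, "applied_torque_nm": 1.2, "slip_cm": 0.3},
    {"obs_age_ms": 12.0, "applied_torque_nm": 1.2, "slip_cm": 2.1},
]

print(first_divergence(sim_steps, real_steps, FIELDS))
```

In these illustrative traces the slip at the final step is the largest visible difference, but the torque shortfall appears one step earlier, so the sketch points at the actuator as the narrowest mismatch to repair first.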

Common Failure Mode

The common mistake is to treat sim-to-real as a single scalar transfer score. A high simulator reward can coexist with a wrong contact model, a delayed motor response, and an evaluator that rewards a state the hardware cannot safely reach.

Practical Example

A mobile manipulator that opens a drawer may fail because the simulated hinge friction is too low. The fix is not automatically more domain randomization. The first fix is a paired trace that shows whether the gripper slipped, the wrist saturated, the drawer contact stuck, or the success detector fired too early.

Memory Hook

When the reality gap feels abstract, ask what would be different in the next frame of video, the next robot state, or the next safety margin.

Research Frontier

Research systems increasingly treat the reality gap as a measurement problem rather than only a robustness problem. The strongest transfer reports include videos, state logs, perturbation panels, real-robot failures, and ablations that show which simulated assumptions mattered.

Self Check

Pick one robot task and name the most likely observation gap, transition gap, actuator gap, timing gap, and evaluation gap. Which one would you test first, and what trace field would prove it?

The idea in this section becomes useful when it is tied to a closed-loop contract. For reality-gap work, the contract names the simulator assumptions, the hardware measurements, and the alignment rule that says two rollouts are comparable. Without that contract, a model can look capable in a notebook while failing the first time a sensor drops a frame or a controller saturates.

The graduate-level habit is to separate three claims. The modeling claim explains which part of $P_{\text{sim}}$ approximates $P_{\text{real}}$. The systems claim explains which observation, action, or timing interface exposes the approximation error. The evidence claim records which paired rollout would convince a skeptical builder.

Practical Tool Choices For This Section

Tool or Library	Role in the Topic	Builder Advice
Gymnasium	Common rollout API	Use it to keep reset, step, reward, and termination semantics consistent across diagnostic environments.
Isaac Lab	Robot-learning simulation	Use it when the gap involves sensors, randomized assets, parallel rollout collection, or GPU-scale task panels.
ROS 2 bags	Hardware trace capture	Use them to align observations, commands, controller states, and safety events with simulator logs.
MuJoCo	Inspectable contact and dynamics	Use it when contact parameters, inertia, actuator limits, or control latency need explicit auditing.
Drake	System modeling and identification	Use it when the transfer question depends on calibrated dynamics, constraints, and state estimation.

A robust implementation starts with a paired rollout schema. The schema should log inputs, outputs, units, timestamps, controller limits, termination reasons, and one failure label. The simulator and the robot must produce the same artifact shape, otherwise the comparison becomes a story assembled from separate experiments.
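One way to enforce a shared artifact shape is a small schema check that both the simulator logger and the hardware logger must pass. The field names and the `schema_problems` helper below are hypothetical; they put units in the field names and are only a sketch of the idea, not a prescribed format.

```python
# Hypothetical paired rollout schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {
    "source": str,                            # "sim" or "real"
    "checkpoint": str,
    "seed": int,
    "obs_timestamps_s": list,
    "cmd_timestamps_s": list,
    "commanded_torque_nm": list,
    "torque_limit_nm": float,
    "termination_reason": str,
    "failure_label": (str, type(None)),       # None when the rollout succeeded
}

def schema_problems(record):
    """List every missing or mistyped field so the two logs stay comparable."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"wrong type: {name}")
    return problems

sim_record = {
    "source": "sim",
    "checkpoint": "policy_0420",
    "seed": 7,
    "obs_timestamps_s": [0.00, 0.02, 0.04],
    "cmd_timestamps_s": [0.01, 0.03, 0.05],
    "commanded_torque_nm": [1.0, 2.0, 2.0],
    "torque_limit_nm": 5.0,
    "termination_reason": "success",
    "failure_label": None,
}

# A hardware log that forgot to record command timestamps.
real_record = {k: v for k, v in sim_record.items() if k != "cmd_timestamps_s"}
real_record["source"] = "real"

print(schema_problems(sim_record))
print(schema_problems(real_record))
```

Rejecting the hardware record at logging time is cheaper than discovering, after a failed transfer, that the timing comparison cannot be made.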

Write a one-paragraph reality-gap contract with simulator assumptions and hardware measurements.
Choose paired trace fields that can be captured in both places without manual interpretation.
Run one deterministic smoke test and one perturbation test before scaling policy training.
Save a single artifact containing configuration, seed, metrics, videos or state logs, timing traces, and failure labels.
Compare repairs only when one script evaluates them on the same task panel and hardware protocol.

When a transfer attempt fails, avoid labeling the whole policy as weak. First assign the failure to observation, transition dynamics, contact, actuator delay, controller saturation, data coverage, or evaluation. Then rerun one controlled perturbation that isolates the suspected cause. This pattern turns a disappointing rollout into a reusable diagnostic asset.
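The controlled perturbation step can be sketched as a one-factor-at-a-time sweep in simulation. The `toy_rollout` function, its coefficients, and the perturbation values are all hypothetical stand-ins for a real simulator run; the sketch only shows the bookkeeping.

```python
# Hypothetical stand-in for a simulator rollout under given parameters.
def toy_rollout(params):
    slip_cm = 4.0 * (1.0 - params["friction_scale"]) + 0.4
    settle_ms = 110 + 40 * params["delay_frames"]
    return {
        "slip_cm": slip_cm,
        "settle_ms": settle_ms,
        "success": slip_cm < 1.0 and settle_ms < 200,
    }

NOMINAL = {"friction_scale": 1.0, "delay_frames": 0}
# Candidate explanations for the hardware failure, each tested in isolation.
PERTURBATIONS = {"friction_scale": 0.5, "delay_frames": 2}

results = {}
for name, value in PERTURBATIONS.items():
    params = dict(NOMINAL)
    params[name] = value          # change exactly one factor
    results[name] = toy_rollout(params)["success"]

# A factor is a suspect if perturbing it alone reproduces the failure.
suspects = [name for name, success in results.items() if not success]

print(f"nominal_success={toy_rollout(NOMINAL)['success']}")
print(f"suspects={suspects}")
```

Changing one factor per run keeps the attribution clean; perturbing friction and delay together would reproduce the failure without saying which one caused it.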

Evaluation Recipe

For reality-gap studies, compare only metrics that measure the same construct and were computed in one pass under one configuration: the same policy checkpoint, initial-condition panel, perturbation suite, hardware protocol, and success definition. Save the result as one artifact with traces, summary statistics, videos or state logs, timing measurements, and failure labels so every number in a later table is backed by the same run.
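A comparison guard makes this rule hard to skip. The sketch below refuses to compute a transfer gap unless the two runs agree on every comparability key; the key names, the run dictionaries, and the helper functions are hypothetical.

```python
# Hypothetical configuration keys that must match before two runs are compared.
COMPARABILITY_KEYS = (
    "checkpoint",
    "initial_condition_panel",
    "perturbation_suite",
    "hardware_protocol",
    "success_definition",
)

def config_mismatches(run_a, run_b):
    """Return the configuration keys on which the two runs disagree."""
    return [k for k in COMPARABILITY_KEYS if run_a["config"][k] != run_b["config"][k]]

def success_rate(run):
    return sum(run["successes"]) / len(run["successes"])

def transfer_gap(sim_run, real_run):
    """Sim minus real success rate, only for runs with matching configurations."""
    mismatches = config_mismatches(sim_run, real_run)
    if mismatches:
        raise ValueError(f"runs are not comparable, mismatched: {mismatches}")
    return success_rate(sim_run) - success_rate(real_run)

shared_config = {
    "checkpoint": "policy_0420",
    "initial_condition_panel": "panel_a",
    "perturbation_suite": "suite_v1",
    "hardware_protocol": "protocol_v2",
    "success_definition": "object within 1 cm for 0.5 s",
}
sim_run = {"config": dict(shared_config), "successes": [True, True, True, False]}
real_run = {"config": dict(shared_config), "successes": [True, False, False, False]}

# Same rollouts scored with a looser success rule: not comparable.
relaxed_config = dict(shared_config, success_definition="any contact")
real_run_relaxed = {"config": relaxed_config, "successes": [True, True, True, False]}

print(transfer_gap(sim_run, real_run))
print(config_mismatches(sim_run, real_run_relaxed))
```

With the guard in place, a hardware run scored under a looser success rule raises an error instead of silently shrinking the reported gap.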

Key Takeaway

The reality gap becomes useful engineering knowledge only after it is decomposed into measurable mismatches that a team can test, repair, and retest.

Exercise 20.1.1

Choose a real robot task and write a paired trace schema with at least one observation field, one action field, one timing field, one safety field, and one suspected gap label.

What's Next?

This section turned the reality gap into a testable embodied-learning contract: define the loop, choose the tool, save one comparable artifact, and diagnose failure by interface. Next, continue with Section 20.2, where the same evaluation habit carries into the next reinforcement-learning decision.

References & Further Reading

Foundational Papers, Tools, and Practice References

Tobin, J. et al. (2017). Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. IROS.

Demonstrates that training a vision model on simulated images with randomized rendering parameters, such as textures, lighting, and camera pose, pushes it to learn features that are invariant to simulator appearance, enabling transfer to a physical robot without fine-tuning on real images. Read it to understand the difference between visual sim-to-real and dynamics sim-to-real; this paper focuses on the visual side.

Paper

Peng, X. B. et al. (2018). Sim-to-Real Transfer of Robotic Control with Dynamics Randomization. ICRA.

This paper shows dynamics randomization for transferring learned control policies.

Paper

Kumar, A. et al. (2021). RMA: Rapid Motor Adaptation for Legged Robots. RSS.

Introduces RMA, which separates a base policy trained in simulation with access to privileged environment information from a lightweight adaptation module that is also trained in simulation to estimate that information from recent proprioceptive history, then runs online on the robot. Read Section 3 for the two-phase training procedure; RMA is one of the clearest demonstrations that explicit adaptation at inference time can outperform domain randomization alone for legged locomotion.

Paper

Tan, J. et al. (2018). Sim-to-Real: Learning Agile Locomotion for Quadruped Robots. RSS.

This work is a clear example of transferring locomotion policies from simulation to hardware.

Paper

NVIDIA Isaac Lab documentation.

NVIDIA's GPU-accelerated robot learning framework that runs thousands of parallel environments on a single GPU. Read the documentation for task configuration, domain randomization APIs, and the sim-to-real export path; massively parallel training with Isaac Lab is how locomotion and dexterous manipulation policies achieve the sample counts needed for sim-to-real transfer.

Tool

Drake documentation.

Drake is relevant when transfer work needs explicit dynamics, constraints, and system identification.

Tool