Section 9.3: Fidelity: physical, visual, behavioral

"Realism is not a volume knob. It is a mixing board with labels you should read before touching anything."

A Calibration-Minded AI Agent
Figure 9.3A: A fidelity spectrum from coarse rigid-body physics to full photorealistic rendering, with each level annotated by its rendering engine, physics solver, and the policy type that benefits most.
Big Picture

Fidelity means agreement between simulation and the world along a specific axis. Physical fidelity, visual fidelity, behavioral fidelity, sensor fidelity, and task fidelity answer different questions.

Throughout this section, tie the agent-environment boundary, the dynamics assumptions, and the transfer checks to the simulator artifact actually used in the experiment.

Fidelity Is Axis-Specific

Physical fidelity concerns dynamics: mass, inertia, friction, contacts, compliance, and actuation. Visual fidelity concerns rendered images, lighting, materials, camera models, and occlusions. Behavioral fidelity concerns whether the environment responds in ways that matter for the task: doors open, objects move, receptacles contain, fluids pour, and failure states persist.

A visually simple MuJoCo model can be enough for torque-control research. A perception policy may need Omniverse Replicator or BlenderProc imagery with camera artifacts. A household agent may need Habitat, AI2-THOR, ProcTHOR, BEHAVIOR, or OmniGibson-style object semantics because the claim depends on scene interaction, not just contact physics.

Realistic For What?

High visual fidelity cannot rescue wrong contact physics, and accurate contact physics cannot rescue a sensor model that gives the policy information the real robot never observes.
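
The sensor-model half of this warning can be checked mechanically by comparing what the simulator exposes with what the robot actually publishes. The sketch below is a minimal illustration; every observation key in it is hypothetical and stands in for whatever names your simulator and robot use.

```python
# Hypothetical observation keys; real names depend on the simulator and robot.
sim_observation_keys = {"rgb", "depth", "joint_pos", "object_pose_gt", "contact_force_gt"}
real_robot_keys = {"rgb", "depth", "joint_pos"}

# Signals the simulator provides that the real robot never observes.
privileged = sorted(sim_observation_keys - real_robot_keys)
# Signals the real robot observes that the simulator does not model.
unmodeled = sorted(real_robot_keys - sim_observation_keys)


def deployable_view(observation: dict) -> dict:
    """Keep only the entries the real robot can also observe."""
    return {key: value for key, value in observation.items() if key in real_robot_keys}


print("privileged:", privileged)
print("unmodeled:", unmodeled)
```

Training on `deployable_view(observation)` rather than the raw simulator dictionary is one way to keep privileged signals out of the policy input.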

Fidelity Axes In Embodied AI
Axis | What It Models | Where It Matters
Physical | Mass, contacts, friction, actuation, delay | Manipulation, locomotion, grasp stability, pushing
Visual | Lighting, textures, camera intrinsics, occlusion | Vision policies, detection, segmentation, pose estimation
Sensor | Noise, dropout, blur, calibration, frame timing | State estimation, navigation, visual servoing
Behavioral | Object affordances, state changes, task semantics | Household tasks, long-horizon planning, language grounding

Worked Miniature: A Fidelity Match

Code Fragment 9.3.1 is a simple fidelity checklist. It maps task needs to simulator capabilities so the team can defend why a tool is sufficient for a particular claim.

# Match simulator capabilities to the task's transfer risks.
# The output exposes unsupported fidelity axes before training begins.
task_needs = {"contact", "depth_noise", "object_state"}
simulators = {
    "MuJoCo": {"contact", "actuation"},
    "Isaac Lab": {"contact", "depth_noise", "camera_rendering"},
    "ProcTHOR": {"object_state", "layout_diversity", "camera_rendering"},
}

for name, capabilities in simulators.items():
    missing = sorted(task_needs - capabilities)
    status = "ready" if not missing else f"missing {missing}"
    print(name, status)

Output:
MuJoCo missing ['depth_noise', 'object_state']
Isaac Lab missing ['object_state']
ProcTHOR missing ['contact', 'depth_noise']
Code Fragment 9.3.1: This checklist compares task-specific fidelity needs against simulator capabilities instead of asking which simulator is most realistic overall. The missing list becomes the experiment's transfer-risk ledger.

Reading the output: none of the three simulators covers all three task needs, so each line lists the capabilities missing for the task contract. A missing list is not a rejection of the simulator. It is the list of reality-gap assumptions that the experiment must measure, randomize, or exclude from the claim.

Library Shortcut

The checklist is about 12 lines. In practice, simulator choice should become a versioned artifact beside the experiment config, using Isaac Lab, MuJoCo, ManiSkill, robosuite, Habitat, or ProcTHOR documentation to record supported physics, sensors, assets, and task semantics. The hand checklist is useful because it prevents tool choice by reputation alone.
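
One possible shape for such a versioned artifact is a small JSON record with a content-derived id, sketched below. The field names and the `UNPINNED` placeholder are assumptions made for illustration, not a format defined by any of the listed tools. The capability values mirror the MuJoCo row of Code Fragment 9.3.1.

```python
import hashlib
import json

# Values mirror the MuJoCo row of Code Fragment 9.3.1; field names are illustrative.
record = {
    "simulator": "MuJoCo",
    "simulator_version": "UNPINNED",  # placeholder: record the version actually used
    "task_needs": ["contact", "depth_noise", "object_state"],
    "capabilities": ["actuation", "contact"],
}
record["missing"] = sorted(set(record["task_needs"]) - set(record["capabilities"]))

# Canonical JSON (sorted keys) so identical records always produce the same id.
canonical = json.dumps(record, sort_keys=True)
record_id = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

print(record_id, record["missing"])
```

Storing `record_id` in the experiment config makes it visible when the simulator choice or its assumed capabilities change between runs.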

Choosing The Necessary Fidelity

For a legged locomotion policy, the most transfer-critical mismatches may be ground contact, actuator delay, IMU noise, terrain variation, and controller frequency. For a vision-based grasp detector, the critical mismatches may be depth holes, reflective materials, occlusions, camera calibration, and object shape. For a household agent, object state and task semantics can matter more than photorealistic walls.

  1. Write the decision the agent must make.
  2. Name the simulator mismatch that could change that decision.
  3. Choose the fidelity axis that controls the mismatch.
  4. Measure the mismatch directly, or record why it is outside the current claim.
  5. Keep the claim as narrow as the supported fidelity.
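
The five steps above can be sketched as a small record type. This is a minimal illustration under assumed names: `MismatchEntry` and its fields are hypothetical, and the locomotion entries are invented examples rather than measured results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MismatchEntry:
    decision: str                           # step 1: the decision the agent makes
    mismatch: str                           # step 2: the mismatch that could change it
    axis: str                               # step 3: the fidelity axis that controls it
    measurement: Optional[str] = None       # step 4: how the mismatch was measured
    exclusion_reason: Optional[str] = None  # step 4: or why it is outside the claim

    def is_resolved(self) -> bool:
        return bool(self.measurement) or bool(self.exclusion_reason)


# Hypothetical entries for a legged locomotion policy.
entries = [
    MismatchEntry("when to shift weight to the next foot", "actuator delay", "physical",
                  measurement="step-response latency logged on hardware"),
    MismatchEntry("when to shift weight to the next foot", "IMU noise", "sensor"),
]

unresolved = [entry.mismatch for entry in entries if not entry.is_resolved()]
# Step 5: the claim covers only the axes whose mismatch was actually measured.
supported_axes = sorted({entry.axis for entry in entries if entry.measurement})

print("unresolved:", unresolved)
print("claim limited to axes:", supported_axes)
```
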
Simulation Hypothesis Ledger

A simulator run becomes evidence about fidelity only after four things are written down: the falsifiable hypothesis, the held-out seeds, the perturbation panel, and the real-world assumption that remains untested.
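
A ledger like this can be kept as a small structure that refuses to count a run as evidence while an item is missing. The sketch below is one assumed layout; the class name, field names, and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HypothesisLedger:
    hypothesis: str
    training_seeds: set
    held_out_seeds: set
    perturbation_panel: dict   # hypothetical layout: parameter name -> (low, high)
    untested_assumption: str

    def problems(self) -> list:
        """Return the reasons this run does not yet count as evidence."""
        issues = []
        if not self.hypothesis.strip():
            issues.append("missing falsifiable hypothesis")
        if not self.held_out_seeds:
            issues.append("no held-out seeds")
        leaked = self.training_seeds & self.held_out_seeds
        if leaked:
            issues.append(f"seeds used in both splits: {sorted(leaked)}")
        if not self.perturbation_panel:
            issues.append("empty perturbation panel")
        if not self.untested_assumption.strip():
            issues.append("untested real-world assumption not recorded")
        return issues


ledger = HypothesisLedger(
    hypothesis="Grasp success stays above 80% when friction varies within the panel",
    training_seeds={0, 1, 2, 3},
    held_out_seeds={3, 100, 101},
    perturbation_panel={"friction": (0.4, 1.2)},
    untested_assumption="Real gripper pads wear like the simulated material",
)
print(ledger.problems())
```

Here the check flags seed 3, which appears in both splits and so is not held out.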

Vague Realism Trap

The phrase "high fidelity" is incomplete unless it names the axis. A benchmark can be visually rich and physically weak, or physically precise and behaviorally too simple for a household claim.

Practical Example

For a mobile manipulator in a kitchen, a team might use ProcTHOR or Habitat-style scenes to study navigation and object layout, then MuJoCo or Isaac Lab for contact-rich grasping. The split is defensible only if the evaluation artifact states which construct each simulator measures.

Memory Hook

A beautiful simulation with the wrong friction is a glossy brochure for a skill the robot does not have.

Research Frontier

OpenUSD pipelines and synthetic-data tools are making it easier to combine rendered sensor fidelity with robot-learning environments. The frontier challenge is keeping physics, semantics, and evaluation labels synchronized as assets move across tools.

Self Check

Name the fidelity axis that matters most for your task. If you cannot choose one, write the decision that the policy must make, then ask which simulated mismatch would change that decision.

A fidelity claim becomes useful when it is tied to a closed-loop contract. In this chapter, Why Simulation Is Central, the contract names the observation stream, the state estimate, the action representation, the timing budget, and the evaluation artifact. Without that contract, a model can look capable in a notebook while failing the first time a sensor drops a frame or a controller saturates.
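
A contract of this kind can be written down as data and checked against a logged trace. The sketch below assumes a scalar action and a fixed control period; the class, the helper functions, and all values are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClosedLoopContract:
    observation_stream: str
    state_estimate: str
    action_representation: str
    control_period_s: float   # timing budget per control step
    action_limit: float       # symmetric saturation bound on the command
    evaluation_artifact: str


contract = ClosedLoopContract(
    observation_stream="rgb + joint positions",
    state_estimate="end-effector pose",
    action_representation="joint velocity command",
    control_period_s=0.05,
    action_limit=1.0,
    evaluation_artifact="episode_trace.json",
)


def dropped_frames(timestamps, period, tolerance=1.5):
    """Indices whose gap to the previous frame exceeds tolerance * period."""
    return [i for i in range(1, len(timestamps))
            if timestamps[i] - timestamps[i - 1] > tolerance * period]


def saturated_steps(actions, limit):
    """Indices where the commanded action reaches or exceeds the limit."""
    return [i for i, action in enumerate(actions) if abs(action) >= limit]


# Hypothetical logged trace: one late frame and two saturated commands.
timestamps = [0.00, 0.05, 0.10, 0.20, 0.25]
actions = [0.2, 0.9, 1.0, -1.3, 0.1]

print("dropped frames at:", dropped_frames(timestamps, contract.control_period_s))
print("saturated steps at:", saturated_steps(actions, contract.action_limit))
```
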

When making a fidelity argument, separate the conceptual claim, the systems claim, and the evidence claim. A plausible mechanism, a clean interface, and a closed-loop result are different claims, and each needs its own evidence.

Practical Tool Choices For This Section
Tool or Library | Role in the Topic | Builder Advice
Gymnasium | Single-agent environment interface (reset, step, seeding) | Use it when the experiment needs a maintained implementation rather than custom glue.
PettingZoo | Multi-agent environment interface | Use it when the experiment needs a maintained implementation rather than custom glue.
ROS 2 | Robot middleware connecting sensors, controllers, and planners | Use it when the experiment needs a maintained implementation rather than custom glue.
MuJoCo | Physics engine for contact dynamics and actuation | Use it when the experiment needs a maintained implementation rather than custom glue.
LeRobot | Robot-learning datasets and policies | Use it when the experiment needs a maintained implementation rather than custom glue.

Start with a small baseline that logs inputs, outputs, units, timestamps, and termination conditions before moving to Gymnasium or PettingZoo. The library run should keep the same artifact schema so that the comparison remains a same-task evaluation.
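
A lightweight way to hold the schema fixed is to compare the field names of a baseline log record with those of the library run. The sketch below is only an illustration; the field names, including their unit suffixes, are hypothetical.

```python
# Hypothetical per-step log records; field names carry their units on purpose.
baseline_step = {
    "t_s": 0.00,
    "obs_joint_pos_rad": [0.10, -0.20],
    "action_torque_nm": [0.50, 0.00],
    "terminated": False,
    "termination_reason": "",
}
library_step = {
    "t_s": 0.00,
    "obs_joint_pos_rad": [0.10, -0.20],
    "action": [0.50, 0.00],   # unit suffix lost when porting to the library wrapper
    "terminated": False,
    "termination_reason": "",
}


def schema_diff(reference: dict, candidate: dict) -> dict:
    """Report fields that disappeared or appeared relative to the reference."""
    return {
        "missing": sorted(set(reference) - set(candidate)),
        "unexpected": sorted(set(candidate) - set(reference)),
    }


diff = schema_diff(baseline_step, library_step)
print(diff)
```

An empty diff is a precondition for treating the two runs as the same task; it does not by itself show that the values are comparable.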

  1. Write a one-paragraph task contract with observation, action, success, and failure fields.
  2. Start with the smallest simulator, dataset, or wrapper that exposes the task contract faithfully.
  3. Run one deterministic smoke test and one perturbation test before scaling.
  4. Save a single result artifact containing configuration, seed, metrics, videos or traces, and failure labels.
  5. Compare methods only when one script evaluates them on the same task panel.
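
Steps 3 and 4 can be rehearsed on a toy system before any real simulator is involved. The sketch below uses a hand-written 1-D point mass, not a simulator API; the dynamics, gain, noise level, and friction values are all assumptions chosen for illustration.

```python
import json
import random


def rollout(seed: int, friction: float, steps: int = 200) -> dict:
    """Toy 1-D point mass pushed toward x = 1.0 by a proportional controller."""
    rng = random.Random(seed)   # local generator, so the seed fully fixes the run
    x, v, dt = 0.0, 0.0, 0.02
    for _ in range(steps):
        observed_x = x + rng.gauss(0.0, 0.005)   # noisy position sensor
        force = 4.0 * (1.0 - observed_x)         # proportional control law
        v += (force - friction * v) * dt
        x += v * dt
    return {"seed": seed, "friction": friction, "final_error": abs(x - 1.0)}


# Smoke test: the same seed and parameters must reproduce the same result.
nominal = rollout(seed=0, friction=2.0)
deterministic = nominal == rollout(seed=0, friction=2.0)

# Perturbation test: change one parameter and hold everything else fixed.
perturbed = rollout(seed=0, friction=0.2)

# Single result artifact: configuration, seeds, metrics, and failure labels.
artifact = {
    "config": {"steps": 200, "dt": 0.02, "gain": 4.0},
    "smoke_test_deterministic": deterministic,
    "runs": [nominal, perturbed],
    "failure_labels": [],   # filled in by hand after reviewing traces
}
artifact_json = json.dumps(artifact, indent=2)
print(artifact_json)
```

Because only `friction` differs between the two runs, any change in `final_error` can be attributed to that one parameter.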

When a fidelity experiment fails, avoid labeling the whole method as weak. First assign the failure to perception, state estimation, planning, control, timing, data coverage, or evaluation. Then rerun one controlled perturbation that isolates the suspected cause. This pattern turns a disappointing rollout into a reusable diagnostic asset.
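
The assign-then-isolate pattern can be kept honest with a small tally. In the sketch below the cause categories come from the paragraph above, while the episode labels and the suggested perturbations are hypothetical.

```python
from collections import Counter

# Failure categories taken from the paragraph above.
CAUSES = ("perception", "state estimation", "planning", "control",
          "timing", "data coverage", "evaluation")

# Hypothetical labels assigned while reviewing failed rollouts.
failed_rollouts = [
    {"episode": 3, "cause": "timing"},
    {"episode": 7, "cause": "perception"},
    {"episode": 9, "cause": "timing"},
    {"episode": 12, "cause": "control"},
]

# Hypothetical single-factor perturbations, one per suspected cause.
PERTURBATION = {
    "timing": "add a fixed observation delay",
    "perception": "swap lighting while holding dynamics fixed",
    "control": "tighten the actuator saturation limit",
}

# Reject labels outside the taxonomy so the tally stays comparable across runs.
unknown = [r for r in failed_rollouts if r["cause"] not in CAUSES]
if unknown:
    raise ValueError(f"labels outside the taxonomy: {unknown}")

counts = Counter(r["cause"] for r in failed_rollouts)
suspected, count = counts.most_common(1)[0]
next_run = PERTURBATION.get(suspected, "no perturbation defined yet")
print(f"{suspected}: {count} failures -> rerun with: {next_run}")
```
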

Key Takeaway

Fidelity is meaningful only when tied to a task decision and a measurable mismatch.

Exercise 9.3.1

Create a fidelity matrix for a drone landing task. Include physical, visual, sensor, and task fidelity, then mark which mismatches would invalidate a success claim.

What's Next?

Section 9.4 turns simulator mismatch into a paired sim-real measurement.

Bibliography and Further Reading
Foundational Papers

Todorov, E., Erez, T., and Tassa, Y. (2012). "MuJoCo: A physics engine for model-based control." IROS.

This paper anchors the simulator design lineage behind much modern robot learning. It is useful here because it explains why fast, controllable simulation became central to model-based control and policy testing.

Brockman, G. et al. (2016). "OpenAI Gym." arXiv.

The Gym paper explains the environment API that shaped modern reinforcement-learning experimentation. Readers should use it to understand why reset, step, render, and reward contracts became standard research infrastructure.

Tools And Libraries

Farama Foundation. "Gymnasium Documentation."

Gymnasium is the maintained successor interface for single-agent reinforcement-learning environments. It matters in this chapter because simulation evidence depends on reproducible environment boundaries and seed handling.

NVIDIA. "Isaac Lab Documentation."

Isaac Lab documents a modern robot-learning workflow on top of Isaac Sim. Practitioners should read it when simulation must include vectorized tasks, assets, sensors, and learning-library integration.

Foundational Papers

Peng, X. B., Andrychowicz, M., Zaremba, W., and Abbeel, P. (2018). "Sim-to-Real Transfer of Robotic Control with Dynamics Randomization." ICRA.

This work shows how randomized dynamics can train policies that tolerate physical mismatch. It is a useful bridge from this chapter into later transfer and domain randomization chapters.