Part III: Simulation, Tooling, and the Modern Stack | Building Embodied AI: From Perception to Autonomous Action

Part Overview

This part covers the simulators, environments, benchmarks, and synthetic-data practices used to build embodied systems today. It connects formal ideas with the tools and labs needed to build working systems.

Chapters: 5. Each chapter includes theory, recipes, practical code, a library shortcut, and exercises.

Why This Part Matters

Simulation, Tooling, and the Modern Stack gives the reader a working layer of the embodied AI stack. Later chapters assume this layer when agents must perceive, plan, act, and recover from mistakes.

Part III Production Spine

Part III turns the foundations from Part II into the tool stack used by modern embodied AI experiments. The through-line is deliberate: simulation economics, environment interfaces, physics simulators, benchmark design, and synthetic variation.

Part III Evidence Rule

A result belongs in a paper-facing table only when the compared numbers are co-computed in one pass on one configuration. Diagnostic runs can explore freely, but benchmark claims need shared panels, seeds, splits, and metric definitions.

Chapter 9 Why Simulation Is Central

This chapter explains why simulation changes the cost, safety, and diagnosis profile of embodied learning before hardware trials begin.

9.1 Why real-world learning is slow, costly, and risky
9.2 Simulation as data generator, testbed, and curriculum
9.3 Fidelity: physical, visual, behavioral
9.4 The reality gap as a measurable quantity
9.5 The landscape of benchmark environments

Chapter 10 Environments with Gymnasium (and PettingZoo)

This chapter teaches the Gymnasium and PettingZoo contracts that make observations, actions, rewards, termination, wrappers, and multi-agent turns reproducible.

10.1 Gym is dead; Gymnasium is the standard
10.2 Observation and action spaces
10.3 Reward design and termination
10.4 Vectorized environments; wrappers
10.5 Rendering, logging, and debugging
10.6 Evaluation protocol and seeding
10.7 PettingZoo for multi-agent

Chapter 11 Physics Simulators: MuJoCo, MJX, Isaac Lab, Genesis

This chapter compares MuJoCo, MJX, Isaac Lab, Genesis, Newton, Drake, SAPIEN, and Gazebo by task fit rather than popularity.

11.1 What physics simulators model (bodies, joints, contacts, friction)
11.2 MuJoCo and the MJCF/URDF model formats
11.3 MuJoCo MJX and MuJoCo Warp: massively parallel and differentiable
11.4 NVIDIA Isaac Sim + Isaac Lab; the Isaac Gym -> Isaac Lab migration
11.5 The Newton physics engine and OpenUSD scene interchange
11.6 Genesis and generative multi-physics
11.7 Drake, SAPIEN, ROS 2, and Gazebo
11.8 Choosing a simulator

Chapter 12 Benchmarks and Task Suites

This chapter shows how task suites, leaderboards, construct-matched metrics, and failure labels turn simulator runs into defensible evidence.

12.1 Why standardized benchmarks matter
12.2 Manipulation: ManiSkill3, robosuite, RoboCasa, robomimic, RLBench
12.3 Lifelong and language-conditioned: LIBERO, CALVIN, Meta-World
12.4 Household and long-horizon: BEHAVIOR-1K / OmniGibson
12.5 Navigation and social: Habitat 3.0, AI2-THOR / ProcTHOR
12.6 Reading a leaderboard without fooling yourself

Chapter 13 Domain Randomization and Synthetic Data

This chapter designs synthetic variation, domain randomization, real2sim2real loops, and transfer audits as one measurable workflow.

13.1 Why synthetic variation matters
13.2 Visual, physics, sensor, and task randomization
13.3 Curriculum and automatic randomization
13.4 Photoreal rendering and tiled cameras
13.5 real2sim2real and asset/scene reconstruction
13.6 Randomization vs. realism; measuring transfer readiness

What's Next?

After this part, Part IV: Reinforcement Learning for Embodied Agents extends the stack.