Table of Contents

A hands-on science guide to embodied agents, robot learning, simulation, world models, and autonomous action.

Second Edition · 2026

12 parts · 60 chapters · 379 sections, plus front matter and 9 appendices. Every chapter and section linked below is generated and live.

Front Matter · Opening Material

9 entries
  1. F1
    Foreword · Front matter for Building Embodied AI: From Perception to Autonomous Action.
    front-matter/foreword.html
  2. F2
    About the Authors · Front matter for Building Embodied AI: From Perception to Autonomous Action.
    front-matter/about-authors.html
  3. F3
    About the Hands-On AI Science Series · The series promise and why Embodied AI is the fifth volume.
    front-matter/about-the-series.html
  4. F4
    Who Should Read This Book · Front matter for Building Embodied AI: From Perception to Autonomous Action.
    front-matter/fm-who-should-read.html
  5. F5
    How to Use This Book · Front matter for Building Embodied AI: From Perception to Autonomous Action.
    front-matter/fm-how-to-use.html
  6. F6
    What This Book Covers · Front matter for Building Embodied AI: From Perception to Autonomous Action.
    front-matter/fm-what-this-book-covers.html
  7. F7
    Look Inside Preview · Front matter for Building Embodied AI: From Perception to Autonomous Action.
    front-matter/look-inside-preview.html
  8. F8
    Application Reader Pathways · Application-specific pathways through the book.
    front-matter/application-reader-pathways.html
  9. F9
    Copyright and Legal · Front matter for Building Embodied AI: From Perception to Autonomous Action.
    front-matter/copyright.html

Part I · Foundations of Embodied AI

3 chapters · 24 sections

The conceptual vocabulary of agents, environments, embodiment, and closed-loop intelligence.

  1. 1
    From Static AI to Embodied AI · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 1.1 Static prediction vs. embodied interaction
    2. 1.2 Why intelligence needs a world; the perception-action loop
    3. 1.3 Agents, environments, observations, actions, rewards, constraints
    4. 1.4 Physical vs. simulated embodiment
    5. 1.5 The "Physical AI" framing and why 2023-2026 changed the field
    6. 1.6 Examples: vacuum, drone, autonomous vehicle, manipulator, humanoid, game agent
    7. 1.7 Why embodied AI is hard (partial observability, long horizons, safety, data cost)
    8. 1.8 Map of the book
    part-1-foundations-of-embodied-ai/module-01-from-static-ai-to-embodied-ai/
  2. 2
    The Agent-Environment Interface · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 2.1 Agents and environments formally
    2. 2.2 State, observation, hidden variables, partial observability
    3. 2.3 Action types: discrete, continuous, symbolic, motor-level, chunked
    4. 2.4 Rewards, goals, costs, constraints
    5. 2.5 Episodes, horizons, trajectories, discounting
    6. 2.6 Markov decision processes; Bellman equations
    7. 2.7 Partially observable MDPs; belief states
    8. 2.8 Why embodiment is usually partially observable
    part-1-foundations-of-embodied-ai/module-02-the-agent-environment-interface/
  3. 3
    Embodied System Architectures · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 3.1 The canonical stack: sense, perceive, estimate, predict, plan, control, act
    2. 3.2 Classical modular robotics pipeline
    3. 3.3 End-to-end learned policy pipeline
    4. 3.4 Hybrid and hierarchical architectures
    5. 3.5 Reactive vs. deliberative agents
    6. 3.6 Dual-system (System 1 / System 2) designs and where they come from
    7. 3.7 Where LLMs, VLMs, and VLAs sit in the stack
    8. 3.8 Failure modes of each architecture
    part-1-foundations-of-embodied-ai/module-03-embodied-system-architectures/

Part II · Mathematical, Robotics, and Control Foundations

5 chapters · 36 sections

The geometry, kinematics, dynamics, control, and sensing that make physical agents intelligible.

  1. 4
    Spatial Representation and Coordinate Frames · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 4.1 Why space is the substrate of embodiment
    2. 4.2 Points, vectors, poses, frames
    3. 4.3 Rotations: matrices, Euler angles, axis-angle, quaternions; pitfalls
    4. 4.4 Rigid transforms, homogeneous coordinates, SE(3)
    5. 4.5 2D and 3D transformations; transform trees (tf in ROS)
    6. 4.6 Camera, body, and world frames
    7. 4.7 Common frame mistakes and how to debug them
    part-2-mathematical-robotics-and-control-foundations/module-04-spatial-representation-and-coordinate-frames/
  2. 5
    Kinematics and Robot Motion · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 5.1 Position, velocity, acceleration; twists
    2. 5.2 Holonomic vs. non-holonomic motion
    3. 5.3 Differential-drive and car-like robots
    4. 5.4 Robot arms, joints, the kinematic chain
    5. 5.5 Forward kinematics
    6. 5.6 Inverse kinematics: analytic, numerical (Jacobian), and learned
    7. 5.7 Jacobians, singularities, manipulability
    8. 5.8 Motion constraints
    part-2-mathematical-robotics-and-control-foundations/module-05-kinematics-and-robot-motion/
  3. 6
    Dynamics and Simulation Math · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 6.1 From kinematics to dynamics: forces, torques, inertia
    2. 6.2 Rigid-body dynamics; the manipulator equation
    3. 6.3 Contact, friction, and why contact-rich sim is hard
    4. 6.4 Numerical integration and stability
    5. 6.5 Differentiable physics: what it buys you
    6. 6.6 Why GPU-parallel simulation changed robot learning
    part-2-mathematical-robotics-and-control-foundations/module-06-dynamics-and-simulation-math/
  4. 7
    Control for AI Practitioners · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 7.1 Open-loop vs. closed-loop control
    2. 7.2 Feedback, error, stability, overshoot, oscillation
    3. 7.3 PID control, intuition and tuning
    4. 7.4 State-space control, LQR
    5. 7.5 Model predictive control (MPC) as receding-horizon optimization
    6. 7.6 Operational-space and whole-body control (preview for humanoids)
    7. 7.7 Controllers vs. policies; when learning helps and when it makes control unsafe
    part-2-mathematical-robotics-and-control-foundations/module-07-control-for-ai-practitioners/
  5. 8
    Sensors, Perception Hardware, and State Estimation · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 8.1 What sensors provide and what they cost
    2. 8.2 Cameras, depth (stereo/structured light/ToF), LiDAR
    3. 8.3 IMU, wheel odometry, joint encoders, proprioception
    4. 8.4 Tactile and force/torque sensing (GelSight, DIGIT): preview
    5. 8.5 Sensor noise and uncertainty models
    6. 8.6 Bayesian filtering: Kalman, EKF, particle filters
    7. 8.7 Sensor fusion intuition and practice
    8. 8.8 Perception as an imperfect window into the world
    part-2-mathematical-robotics-and-control-foundations/module-08-sensors-perception-hardware-and-state-estimation/

Part III · Simulation, Tooling, and the Modern Stack

5 chapters · 32 sections

The simulators, environments, benchmarks, and synthetic-data practices used to build embodied systems today.

  1. 9
    Why Simulation Is Central · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 9.1 Why real-world learning is slow, costly, and risky
    2. 9.2 Simulation as data generator, testbed, and curriculum
    3. 9.3 Fidelity: physical, visual, behavioral
    4. 9.4 The reality gap as a measurable quantity
    5. 9.5 The landscape of benchmark environments
    part-3-simulation-tooling-and-the-modern-stack/module-09-why-simulation-is-central/
  2. 10
    Environments with Gymnasium (and PettingZoo) · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 10.1 Gym is dead; Gymnasium is the standard
    2. 10.2 Observation and action spaces
    3. 10.3 Reward design and termination
    4. 10.4 Vectorized environments; wrappers
    5. 10.5 Rendering, logging, and debugging
    6. 10.6 Evaluation protocol and seeding
    7. 10.7 PettingZoo for multi-agent
    part-3-simulation-tooling-and-the-modern-stack/module-10-environments-with-gymnasium-and-pettingzoo/
  3. 11
    Physics Simulators: MuJoCo, MJX, Isaac Lab, Genesis · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 11.1 What physics simulators model (bodies, joints, contacts, friction)
    2. 11.2 MuJoCo and the MJCF/URDF model formats
    3. 11.3 MuJoCo MJX and MuJoCo Warp: massively parallel and differentiable
    4. 11.4 NVIDIA Isaac Sim + Isaac Lab; the Isaac Gym -> Isaac Lab migration
    5. 11.5 The Newton physics engine and OpenUSD scene interchange
    6. 11.6 Genesis and generative multi-physics
    7. 11.7 Drake, SAPIEN, ROS 2 + Gazebo; where each fits
    8. 11.8 Choosing a simulator: a decision guide and recency table
    part-3-simulation-tooling-and-the-modern-stack/module-11-physics-simulators-mujoco-mjx-isaac-lab-genesis/
  4. 12
    Benchmarks and Task Suites · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 12.1 Why standardized benchmarks matter
    2. 12.2 Manipulation: ManiSkill3, robosuite, RoboCasa, robomimic, RLBench
    3. 12.3 Lifelong and language-conditioned: LIBERO, CALVIN, Meta-World
    4. 12.4 Household and long-horizon: BEHAVIOR-1K / OmniGibson
    5. 12.5 Navigation and social: Habitat 3.0, AI2-THOR / ProcTHOR
    6. 12.6 Reading a leaderboard without fooling yourself
    part-3-simulation-tooling-and-the-modern-stack/module-12-benchmarks-and-task-suites/
  5. 13
    Domain Randomization and Synthetic Data · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 13.1 Why synthetic variation matters
    2. 13.2 Visual, physics, sensor, and task randomization
    3. 13.3 Curriculum and automatic randomization
    4. 13.4 Photoreal rendering and tiled cameras
    5. 13.5 real2sim2real and asset/scene reconstruction
    6. 13.6 Randomization vs. realism; measuring transfer readiness
    part-3-simulation-tooling-and-the-modern-stack/module-13-domain-randomization-and-synthetic-data/

Part IV · Reinforcement Learning for Embodied Agents

7 chapters · 37 sections

Interaction-driven learning, from policy gradients and off-policy methods to safe exploration and sim-to-real transfer.

  1. 14
    Reinforcement Learning Refresher · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 14.1 Learning from interaction; return and discounting
    2. 14.2 Policies and value functions
    3. 14.3 Exploration vs. exploitation
    4. 14.4 Model-free vs. model-based; on- vs. off-policy
    5. 14.5 Why RL is hard in embodied systems (sample cost, reward, safety)
    part-4-reinforcement-learning-for-embodied-agents/module-14-reinforcement-learning-refresher/
  2. 15
    Policy Gradient Methods and PPO · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 15.1 Direct policy optimization; stochastic policies
    2. 15.2 The policy gradient theorem; REINFORCE
    3. 15.3 Actor-critic and advantage estimation (GAE)
    4. 15.4 Trust regions; TRPO to PPO
    5. 15.5 PPO in practice: the implementation details that matter
    6. 15.6 Reward shaping and its hazards
    part-4-reinforcement-learning-for-embodied-agents/module-15-policy-gradient-methods-and-ppo/
  3. 16
    Value-Based and Off-Policy Methods · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 16.1 Q-learning; deep Q-networks
    2. 16.2 Replay buffers and target networks
    3. 16.3 Continuous control: DDPG, TD3, SAC
    4. 16.4 Maximum-entropy RL
    5. 16.5 Sample efficiency and off-policy failure modes
    part-4-reinforcement-learning-for-embodied-agents/module-16-value-based-and-off-policy-methods/
  4. 17
    Massively Parallel and GPU RL · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 17.1 Why thousands of parallel envs changed the field
    2. 17.2 Learning to walk in minutes: the parallel-RL recipe
    3. 17.3 Isaac Lab with SKRL / rl_games / RSL-RL
    4. 17.4 MJX/Brax-training and JAX RL
    5. 17.5 Teacher-student and privileged-information distillation
    6. 17.6 Throughput, wall-clock, and cost engineering
    part-4-reinforcement-learning-for-embodied-agents/module-17-massively-parallel-and-gpu-rl/
  5. 18
    Reward Design and Goal Specification · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 18.1 Why rewards are dangerous
    2. 18.2 Sparse vs. dense; shaping done right
    3. 18.3 Goal-conditioned policies; hindsight experience replay
    4. 18.4 Reward hacking, with case studies
    5. 18.5 Human preferences and learned reward models (RLHF for control)
    6. 18.6 Safety-aware and constrained rewards
    part-4-reinforcement-learning-for-embodied-agents/module-18-reward-design-and-goal-specification/
  6. 19
    Exploration in Embodied Worlds · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 19.1 Why embodied exploration is expensive and risky
    2. 19.2 Intrinsic motivation, curiosity, count-based and novelty methods
    3. 19.3 Safe exploration
    4. 19.4 Exploration under partial observability
    part-4-reinforcement-learning-for-embodied-agents/module-19-exploration-in-embodied-worlds/
  7. 20
    Sim-to-Real Transfer (RL focus) · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 20.1 The reality gap revisited
    2. 20.2 What transfers and what does not
    3. 20.3 Domain randomization, system identification, adaptation (RMA)
    4. 20.4 Fine-tuning on hardware; safe real-world RL
    5. 20.5 Measuring transfer performance
    part-4-reinforcement-learning-for-embodied-agents/module-20-sim-to-real-transfer-rl-focus/

Part V · Learning from Demonstration and Robot Data

6 chapters · 33 sections

Imitation learning, action chunking and diffusion policies, teleoperation, robot datasets, offline RL, and skill hierarchies.

  1. 21
    Imitation Learning · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 21.1 Why learning from demonstration matters for robots
    2. 21.2 Behavior cloning; the distribution-shift problem
    3. 21.3 DAgger and dataset aggregation
    4. 21.4 Inverse reinforcement learning
    5. 21.5 Sources of demonstrations: humans, planners, foundation models
    part-5-learning-from-demonstration-and-robot-data/module-21-imitation-learning/
  2. 22
    Action Chunking and Diffusion Policies · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 22.1 Why single-step prediction fails on real manipulation
    2. 22.2 ACT (Action Chunking Transformer) and the cVAE formulation
    3. 22.3 ALOHA, ALOHA 2, and Mobile ALOHA
    4. 22.4 Diffusion Policy: action generation by denoising
    5. 22.5 Flow matching for actions
    6. 22.6 VQ-BeT and discretized behavior modeling
    7. 22.7 Choosing an action representation: a decision guide
    part-5-learning-from-demonstration-and-robot-data/module-22-action-chunking-and-diffusion-policies/
  3. 23
    Teleoperation and Data Collection · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 23.1 Why data is the bottleneck
    2. 23.2 Leader-follower teleoperation (ALOHA, GELLO)
    3. 23.3 Handheld and in-the-wild collection (UMI)
    4. 23.4 Immersive/VR teleoperation (Open-TeleVision)
    5. 23.5 Data quality, diversity, and labeling
    6. 23.6 The LeRobotDataset format and pipeline
    part-5-learning-from-demonstration-and-robot-data/module-23-teleoperation-and-data-collection/
  4. 24
    Robot Datasets and Data Scaling Laws · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 24.1 The major datasets: Open X-Embodiment, DROID, BridgeData V2, RH20T, RoboMIND, AgiBot World
    2. 24.2 Dataset structure, embodiment metadata, and licensing
    3. 24.3 Cross-embodiment pooling
    4. 24.4 Empirical data scaling laws in imitation learning
    5. 24.5 Curating and mixing data
    part-5-learning-from-demonstration-and-robot-data/module-24-robot-datasets-and-data-scaling-laws/
  5. 25
    Offline RL and Dataset-Based Robot Learning · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 25.1 Learning without online interaction
    2. 25.2 Distribution shift and extrapolation error
    3. 25.3 Conservative methods (CQL, IQL) and their intuition
    4. 25.4 Offline-to-online fine-tuning
    5. 25.5 Evaluating offline policies rigorously
    part-5-learning-from-demonstration-and-robot-data/module-25-offline-rl-and-dataset-based-robot-learning/
  6. 26
    Skills, Hierarchy, and Task Decomposition · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 26.1 What a skill is; low- vs. high-level actions
    2. 26.2 The options framework
    3. 26.3 Skill discovery and hierarchical RL
    4. 26.4 Language as a high-level controller
    5. 26.5 Skill libraries for embodied agents
    part-5-learning-from-demonstration-and-robot-data/module-26-skills-hierarchy-and-task-decomposition/

Part VI · Embodied Perception

4 chapters · 29 sections

Vision, 3D understanding, localization, mapping, and navigation as perception for action.

  1. 27
    Visual Perception for Action · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 27.1 Seeing to classify vs. seeing to act
    2. 27.2 Detection, segmentation, and the Segment Anything family
    3. 27.3 Depth estimation and metric scale
    4. 27.4 Optical flow and motion cues
    5. 27.5 Affordances and graspable regions
    6. 27.6 Active and embodied perception
    7. 27.7 When perception failures become action failures
    part-6-embodied-perception/module-27-visual-perception-for-action/
  2. 28
    3D Perception and Neural Scene Representations · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 28.1 Why 3D matters for manipulation and navigation
    2. 28.2 Point clouds and depth maps
    3. 28.3 3D detection and scene reconstruction
    4. 28.4 Occupancy grids and voxel maps
    5. 28.5 NeRF: implicit radiance fields
    6. 28.6 3D Gaussian Splatting: explicit, editable, real-time
    7. 28.7 Scene representations for robotics: SLAM, real2sim, manipulation
    part-6-embodied-perception/module-28-3d-perception-and-neural-scene-representations/
  3. 29
    Localization and Mapping (SLAM) · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 29.1 Where am I and what does the world look like
    2. 29.2 Odometry and dead reckoning
    3. 29.3 Localization (Monte Carlo / particle filters)
    4. 29.4 Mapping and occupancy grids
    5. 29.5 SLAM: graph-based and visual SLAM
    6. 29.6 Neural and Gaussian-splat SLAM
    7. 29.7 Map uncertainty
    8. 29.8 Modern SLAM systems and failure modes
    part-6-embodied-perception/module-29-localization-and-mapping-slam/
  4. 30
    Navigation and Path Planning · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 30.1 Navigation as embodied intelligence
    2. 30.2 Graph search: BFS, Dijkstra, A*
    3. 30.3 Sampling-based planning: RRT, RRT*, PRM
    4. 30.4 Local planning and obstacle avoidance (DWA, potential fields)
    5. 30.5 Learned navigation policies
    6. 30.6 Language- and image-goal navigation
    7. 30.7 Field navigation under degraded sensing
    part-6-embodied-perception/module-30-navigation-and-path-planning/

Part VII · Language, Vision, and Action

5 chapters · 37 sections

Language-guided agents, VLMs, LLM planners, VLAs, and cross-embodiment foundation models.

  1. 31
    Language-Guided Embodied Agents · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 31.1 Why language matters in embodied AI
    2. 31.2 Instructions, goals, constraints
    3. 31.3 Grounding language in perception; referring expressions
    4. 31.4 Object- and region-centric grounding
    5. 31.5 Task planning from language; ambiguity and clarification
    6. 31.6 Human-agent interaction
    part-7-language-vision-and-action/module-31-language-guided-embodied-agents/
  2. 32
    Vision-Language Models for Embodiment · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 32.1 From image-text models to embodied perception
    2. 32.2 CLIP, SigLIP, DINOv2 representations
    3. 32.3 Vision-language encoders and open-vocabulary detection
    4. 32.4 Visual question answering and scene description in environments
    5. 32.5 Multimodal memory
    6. 32.6 Limits of static VLMs in dynamic worlds
    part-7-language-vision-and-action/module-32-vision-language-models-for-embodiment/
  3. 33
    LLMs as Planners and Controllers · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 33.1 What LLMs can and cannot do in embodied tasks
    2. 33.2 SayCan: affordance-grounded planning
    3. 33.3 Code as Policies: LLMs that write robot code
    4. 33.4 VoxPoser: composing 3D value maps
    5. 33.5 ReKep: relational keypoint constraints
    6. 33.6 Tool use, action APIs, plan verification, replanning
    7. 33.7 Memory, state tracking, and hallucination in physical tasks
    8. 33.8 Safe LLM-agent interfaces
    part-7-language-vision-and-action/module-33-llms-as-planners-and-controllers/
  4. 34
    Vision-Language-Action Models · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 34.1 From VLMs to VLAs: the core idea
    2. 34.2 The lineage: RT-1, RT-2, RT-X / Open X-Embodiment
    3. 34.3 Open generalist policies: Octo, OpenVLA
    4. 34.4 Diffusion/flow VLAs: RDT-1B, π0, π0-FAST, π0.5
    5. 34.5 Action tokenization vs. continuous heads; the FAST tokenizer
    6. 34.6 Co-training with web data for semantic generalization
    7. 34.7 Prompting and conditioning embodied policies
    8. 34.8 Evaluating VLA behavior; limitations and open problems
    9. 34.9 Action representations in VLA systems
    part-7-language-vision-and-action/module-34-vision-language-action-models/
  5. 35
    Robot Foundation Models and Cross-Embodiment Learning · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 35.1 Why foundation models matter for robotics
    2. 35.2 Cross-embodiment training and transfer
    3. 35.3 Dual-system architectures: GR00T N1.5, Helix, Gemini Robotics (with Frontier-Watch caveats)
    4. 35.4 Large behavior models and rigorous evaluation
    5. 35.5 Adapting to new robots; prompting and conditioning
    6. 35.6 Data scale, compute, and the open-vs-closed divide
    7. 35.7 Limitations and open questions
    8. 35.8 Serving, fine-tuning, and evaluating open robot foundation models
    part-7-language-vision-and-action/module-35-robot-foundation-models-and-cross-embodiment-learning/

Part VIII · World Models and Model-Based Embodied AI

6 chapters · 32 sections

Prediction, latent dynamics, model-based control, generative worlds, and diffusion planning.

  1. 36
    Predicting the Future · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 36.1 Why agents need to predict
    2. 36.2 Forward/dynamics models; state vs. observation prediction
    3. 36.3 Error accumulation and horizon
    4. 36.4 Uncertainty in prediction
    5. 36.5 Planning with predicted futures
    part-8-world-models-and-model-based-embodied-ai/module-36-predicting-the-future/
  2. 37
    Model-Based RL and MPC · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 37.1 Model-free vs. model-based trade-offs
    2. 37.2 Learning dynamics models; ensembles and uncertainty
    3. 37.3 Planning with learned models; MPC and CEM/MPPI
    4. 37.4 Imagination rollouts
    5. 37.5 Sample-efficiency advantages and failure modes
    part-8-world-models-and-model-based-embodied-ai/module-37-model-based-rl-and-mpc/
  3. 38
    Latent World Models · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 38.1 Why predict in latent space
    2. 38.2 Autoencoders and recurrent state-space models (RSSM)
    3. 38.3 Dreamer to DreamerV3
    4. 38.4 Transformer world models (IRIS)
    5. 38.5 TD-MPC2: latent MPC at scale
    6. 38.6 World models for visual control
    part-8-world-models-and-model-based-embodied-ai/module-38-latent-world-models/
  4. 39
    Generative and Video World Models · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 39.1 Generative models as learned simulators
    2. 39.2 Genie 1-3: interactive, playable world models
    3. 39.3 Video generation as world simulation: Sora and successors
    4. 39.4 NVIDIA Cosmos: world foundation models for physical AI
    5. 39.5 GameNGen and Oasis: neural game engines
    6. 39.6 Using generative world models for data and evaluation (e.g., humanoid pipelines)
    7. 39.7 Evaluating consistency, controllability, and horizon
    part-8-world-models-and-model-based-embodied-ai/module-39-generative-and-video-world-models/
  5. 40
    Predictive Representations and Self-Supervised World Models · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 40.1 Predict in representation space, not pixels: the JEPA idea
    2. 40.2 I-JEPA and V-JEPA
    3. 40.3 V-JEPA 2 and action-conditioned latent planning
    4. 40.4 Self-supervised pretraining for control
    part-8-world-models-and-model-based-embodied-ai/module-40-predictive-representations-and-self-supervised-world-models/
  6. 41
    Diffusion and Generative Planning · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 41.1 Diffusion models as planners
    2. 41.2 Diffuser and Decision Diffuser
    3. 41.3 Generative trajectory planning and scoring
    4. 41.4 Generating scenes and synthetic experience
    5. 41.5 Risks of generated experience
    part-8-world-models-and-model-based-embodied-ai/module-41-diffusion-and-generative-planning/

Part IX · Manipulation, Locomotion, and Embodied Skills

7 chapters · 48 sections

Hands, legs, humanoids, drones, vehicles, and the skills that let agents move through the world.

  1. 42
    Robotic Manipulation · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 42.1 What manipulation is; reaching and pushing
    2. 42.2 Pick-and-place pipelines
    3. 42.3 Contact-rich interaction
    4. 42.4 Perception for manipulation
    5. 42.5 Learning manipulation policies (IL, RL, VLA)
    6. 42.6 Failure detection and recovery
    7. 42.7 Mobile manipulation: base, arm, perception, and recovery
    part-9-manipulation-locomotion-and-embodied-skills/module-42-robotic-manipulation/
  2. 43
    Grasping and Dexterous Manipulation · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 43.1 Grasp synthesis: analytic and learned (Dex-Net lineage)
    2. 43.2 Parallel-jaw vs. multi-finger hands
    3. 43.3 In-hand manipulation and reorientation
    4. 43.4 Dexterous RL with demonstrations
    5. 43.5 Sim-to-real for dexterity
    part-9-manipulation-locomotion-and-embodied-skills/module-43-grasping-and-dexterous-manipulation/
  3. 44
    Tactile and Visuo-Tactile Learning · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 44.1 Why touch matters for contact-rich tasks
    2. 44.2 Vision-based tactile sensors (GelSight, DIGIT)
    3. 44.3 Simulating touch (e.g., tactile sim in Isaac)
    4. 44.4 Visuo-tactile pretraining and policies
    5. 44.5 Combining vision and touch
    part-9-manipulation-locomotion-and-embodied-skills/module-44-tactile-and-visuo-tactile-learning/
  4. 45
    Locomotion and Mobility · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 45.1 Wheeled, legged, and hybrid robots
    2. 45.2 Balance, stability, and gait
    3. 45.3 Learning locomotion with massively parallel RL
    4. 45.4 Terrain adaptation, parkour, and rapid motor adaptation
    5. 45.5 Energy efficiency; sim-to-real and safety in locomotion
    part-9-manipulation-locomotion-and-embodied-skills/module-45-locomotion-and-mobility/
  5. 46
    Humanoid Robots and Whole-Body Control · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 46.1 Why humanoids became the focus (data, morphology, hardware cost)
    2. 46.2 Platforms: Unitree G1/H1, Figure, Optimus, 1X, electric Atlas, Apptronik
    3. 46.3 Whole-body and operational-space control
    4. 46.4 Learning from humans: HumanPlus, OmniH2O/HOVER, motion retargeting
    5. 46.5 Teleoperation for humanoids
    6. 46.6 Dual-system humanoid foundation models (tie-back to Ch. 35)
    7. 46.7 Safety for human-scale robots
    8. 46.8 Advanced humanoid dynamics and contact mechanics
    9. 46.9 Boston Dynamics-style loco-manipulation research track
    part-9-manipulation-locomotion-and-embodied-skills/module-46-humanoid-robots-and-whole-body-control/
  6. 47
    Drones and Aerial Embodied AI · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 47.1 Why aerial agents are special
    2. 47.2 Flight dynamics intuition
    3. 47.3 Perception, navigation, and obstacle avoidance
    4. 47.4 Coverage and inspection; multi-drone coordination
    5. 47.5 Safety, regulation, and simulation for aerial agents
    6. 47.6 Quadrotor dynamics and flight control
    7. 47.7 Trajectory generation and GPS-denied missions
    8. 47.8 PX4 to hardware: SITL, HITL, logs, and flight-test evidence
    part-9-manipulation-locomotion-and-embodied-skills/module-47-drones-and-aerial-embodied-ai/
  7. 48
    Autonomous Driving as Embodied AI · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 48.1 Driving as perception, prediction, planning, control
    2. 48.2 Sensors and sensor fusion in AVs
    3. 48.3 Detection, lane and behavior prediction
    4. 48.4 Route and local planning
    5. 48.5 End-to-end and world-model driving
    6. 48.6 Scenario testing and safety cases
    7. 48.7 Vehicle kinematics, dynamics, and control
    8. 48.8 Route, behavior, and scenario-based planning
    9. 48.9 Closed-loop driving evaluation and safety assurance
    part-9-manipulation-locomotion-and-embodied-skills/module-48-autonomous-driving-as-embodied-ai/

Part X · Multi-Agent and Human-Centered Embodiment

3 chapters · 16 sections

Teams of agents, humans in the loop, open worlds, and lifelong interaction.

  1. 49
    Multi-Agent Embodied AI · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 49.1 One agent vs. many
    2. 49.2 Cooperation, competition, communication
    3. 49.3 Shared perception and task allocation
    4. 49.4 Multi-agent RL (with PettingZoo)
    5. 49.5 Swarms and emergent behavior; evaluating teams
    part-10-multi-agent-and-human-centered-embodiment/module-49-multi-agent-embodied-ai/
  2. 50
    Human-Robot Interaction · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 50.1 Robots among humans
    2. 50.2 Natural-language interaction and social navigation
    3. 50.3 Intent recognition and trust calibration
    4. 50.4 Explainable robot behavior
    5. 50.5 Human feedback and shared autonomy
    6. 50.6 Ethical concerns
    part-10-multi-agent-and-human-centered-embodiment/module-50-human-robot-interaction/
  3. 51
    Open-World and Novelty-Robust Embodiment · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 51.1 Closed- vs. open-world tasks
    2. 51.2 Novel objects and instructions; changing environments
    3. 51.3 Long-horizon tasks
    4. 51.4 Distribution shift triggers and open-world adaptation
    5. 51.5 Novelty detection and retraining triggers; open-world evaluation
    part-10-multi-agent-and-human-centered-embodiment/module-51-open-world-and-lifelong-embodiment/

Part XI · Evaluation, Safety, Robustness, and Deployment

4 chapters · 23 sections

Metrics, uncertainty, safety filters, deployment architecture, and operational discipline.

  1. 52
    Evaluating Embodied Systems · Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 52.1 Why accuracy is not enough
    2. 52.2 Success rate, path efficiency, time and energy cost
    3. 52.3 Safety violations and constraint satisfaction
    4. 52.4 Robustness and generalization metrics
    5. 52.5 Reproducible evaluation: SIMPLER and sim-as-proxy
    6. 52.6 Real-world evaluation hygiene; benchmark design
    part-11-evaluation-safety-robustness-and-deployment/module-52-evaluating-embodied-systems/
  2. 53
    Robustness and Uncertainty Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 53.1 What goes wrong: sensor noise, distribution shift
    2. 53.2 Model uncertainty and calibration
    3. 53.3 Out-of-distribution detection
    4. 53.4 Runtime monitoring and fail-safe behavior
    part-11-evaluation-safety-robustness-and-deployment/module-53-robustness-and-uncertainty/
  3. 54
    Safety in Embodied AI Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 54.1 Why embodied safety is different (physical harm)
    2. 54.2 Constraint violations and safe exploration
    3. 54.3 Control barrier functions and Hamilton-Jacobi reachability
    4. 54.4 Shielded policies and safety filters
    5. 54.5 Human override and safety testing
    6. 54.6 Deployment approval and safety cases
    7. 54.7 Safety cases and assurance arguments for embodied AI
    part-11-evaluation-safety-robustness-and-deployment/module-54-safety-in-embodied-ai/
  4. 55
    Deployment Architecture Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 55.1 From notebook to robot
    2. 55.2 Real-time inference and control rates
    3. 55.3 Edge vs. cloud-robot computation; asynchronous inference
    4. 55.4 Logging, monitoring, model updates
    5. 55.5 Failure recovery, security, maintenance
    6. 55.6 Industrial fleets, Open-RMF, AMR interoperability, and operations
    part-11-evaluation-safety-robustness-and-deployment/module-55-deployment-architecture/

Part XII · Frontiers, Capstones, and Course Design

5 chapters · 32 sections

Memory, continual learning, open problems, capstone projects, and teaching paths.

  1. 56
    Embodied Agents with Memory Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 56.1 Why memory matters; short- vs. long-term
    2. 56.2 Spatial, episodic, and semantic memory
    3. 56.3 Memory retrieval for planning
    4. 56.4 Memory errors
    part-12-frontiers-capstones-and-course-design/module-56-embodied-agents-with-memory/
  2. 57
    Continual and Lifelong Learning Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 57.1 Learning after deployment
    2. 57.2 Catastrophic forgetting and mitigation
    3. 57.3 Online adaptation; human correction as data
    4. 57.4 Safe continual learning; evaluation over time
    part-12-frontiers-capstones-and-course-design/module-57-continual-and-lifelong-learning/
  3. 58
    Frontier and Open Problems Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 58.1 Scaling laws and data engines for robots
    2. 58.2 Generalist vs. specialist policies
    3. 58.3 World models in the robot loop
    4. 58.4 The open-vs-closed model divide
    5. 58.5 What is still unsolved (long-horizon reasoning, reliability, real-world RL)
    6. 58.99 Frontier Watch
    part-12-frontiers-capstones-and-course-design/module-58-frontier-and-open-problems/
  4. 59
    Capstone Projects Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 59.1 Object search in a simulated home
    2. 59.2 Language-guided navigation with replanning
    3. 59.3 Vision-based robotic pick-and-place (IL + RL)
    4. 59.4 Fine-tune an open VLA on a custom task (LeRobot)
    5. 59.5 Learned locomotion with sim-to-real analysis
    6. 59.6 World-model-based planning agent
    7. 59.7 Safety-shielded embodied agent
    8. 59.8 LLM-based household task planner
    9. 59.9 Drone inspection planner
    10. 59.10 Multi-agent search and rescue
    11. 59.11 Open-ended research project
    12. 59.12 Application track capstone templates
    part-12-frontiers-capstones-and-course-design/module-59-capstone-projects/
  5. 60
    Teaching with This Book Theory, practical recipe, lab, and library shortcuts for this chapter.
    1. 60.1 One-semester graduate course (14 weeks)
    2. 60.2 One-semester advanced undergraduate course (lighter theory, more labs)
    3. 60.3 Two-semester sequence
    4. 60.4 Research-seminar track
    5. 60.5 Lab infrastructure and compute budgeting for instructors
    6. 60.6 Assessment, rubrics, and academic-integrity notes for code assignments
    part-12-frontiers-capstones-and-course-design/module-60-teaching-with-this-book/

Appendices · Reference and Pedagogy

9 appendices
  1. A
    Linear Algebra and 3D Geometry Refresher Reference material supporting the self-contained book promise.
    appendices/appendix-a-linear-algebra-3d-geometry/
  2. B
    Probability, Estimation, and Optimization Refresher Reference material supporting the self-contained book promise.
    appendices/appendix-b-probability-estimation-optimization/
  3. C
    The Embodied AI Toolbox Reference material supporting the self-contained book promise.
    appendices/appendix-c-embodied-ai-toolbox/
  4. D
    PyTorch and JAX for Embodied AI Reference material supporting the self-contained book promise.
    appendices/appendix-d-pytorch-jax/
  5. E
    Compute Recipes Reference material supporting the self-contained book promise.
    appendices/appendix-e-compute-recipes/
  6. F
    Datasets and Benchmarks Catalog Reference material supporting the self-contained book promise.
    appendices/appendix-f-datasets-benchmarks/
  7. G
    Reproducibility and Experiment Hygiene Reference material supporting the self-contained book promise.
    appendices/appendix-g-reproducibility/
  8. H
    Notation and Glossary Reference material supporting the self-contained book promise.
    appendices/appendix-h-notation-glossary/
  9. I
    Citing the Frontier Reference material supporting the self-contained book promise.
    appendices/appendix-i-citing-frontier/