Front Matter · Opening Material
9 entries
- F1 Foreword
Front matter for Building Embodied AI: From Perception to Autonomous Action.
front-matter/foreword.html
- F2 About the Authors
Front matter for Building Embodied AI: From Perception to Autonomous Action.
front-matter/about-authors.html
- F3 About the Hands-On AI Science Series
The series promise and why Embodied AI is the fifth volume.
front-matter/about-the-series.html
- F4 Who Should Read This Book
Front matter for Building Embodied AI: From Perception to Autonomous Action.
front-matter/fm-who-should-read.html
- F5 How to Use This Book
Front matter for Building Embodied AI: From Perception to Autonomous Action.
front-matter/fm-how-to-use.html
- F6 What This Book Covers
Front matter for Building Embodied AI: From Perception to Autonomous Action.
front-matter/fm-what-this-book-covers.html
- F7 Look Inside Preview
Front matter for Building Embodied AI: From Perception to Autonomous Action.
front-matter/look-inside-preview.html
- F8 Application Reader Pathways
Application-specific pathways through the book.
front-matter/application-reader-pathways.html
- F9 Copyright and Legal
Front matter for Building Embodied AI: From Perception to Autonomous Action.
front-matter/copyright.html
Part I · Foundations of Embodied AI
3 chapters · 24 sections
The conceptual vocabulary of agents, environments, embodiment, and closed-loop intelligence.
- 1 From Static AI to Embodied AI
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 1.1 Static prediction vs. embodied interaction
- 1.2 Why intelligence needs a world; the perception-action loop
- 1.3 Agents, environments, observations, actions, rewards, constraints
- 1.4 Physical vs. simulated embodiment
- 1.5 The "Physical AI" framing and why 2023-2026 changed the field
- 1.6 Examples: vacuum, drone, autonomous vehicle, manipulator, humanoid, game agent
- 1.7 Why embodied AI is hard (partial observability, long horizons, safety, data cost)
- 1.8 Map of the book
part-1-foundations-of-embodied-ai/module-01-from-static-ai-to-embodied-ai/
- 2 The Agent-Environment Interface
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 2.1 Agents and environments formally
- 2.2 State, observation, hidden variables, partial observability
- 2.3 Action types: discrete, continuous, symbolic, motor-level, chunked
- 2.4 Rewards, goals, costs, constraints
- 2.5 Episodes, horizons, trajectories, discounting
- 2.6 Markov decision processes; Bellman equations
- 2.7 Partially observable MDPs; belief states
- 2.8 Why embodiment is usually partially observable
part-1-foundations-of-embodied-ai/module-02-the-agent-environment-interface/
- 3 Embodied System Architectures
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 3.1 The canonical stack: sense, perceive, estimate, predict, plan, control, act
- 3.2 Classical modular robotics pipeline
- 3.3 End-to-end learned policy pipeline
- 3.4 Hybrid and hierarchical architectures
- 3.5 Reactive vs. deliberative agents
- 3.6 Dual-system (System 1 / System 2) designs and where they come from
- 3.7 Where LLMs, VLMs, and VLAs sit in the stack
- 3.8 Failure modes of each architecture
part-1-foundations-of-embodied-ai/module-03-embodied-system-architectures/
Part II · Mathematical, Robotics, and Control Foundations
5 chapters · 36 sections
The geometry, kinematics, dynamics, control, and sensing that make physical agents intelligible.
- 4 Spatial Representation and Coordinate Frames
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 4.1 Why space is the substrate of embodiment
- 4.2 Points, vectors, poses, frames
- 4.3 Rotations: matrices, Euler angles, axis-angle, quaternions; pitfalls
- 4.4 Rigid transforms, homogeneous coordinates, SE(3)
- 4.5 2D and 3D transformations; transform trees (tf in ROS)
- 4.6 Camera, body, and world frames
- 4.7 Common frame mistakes and how to debug them
part-2-mathematical-robotics-and-control-foundations/module-04-spatial-representation-and-coordinate-frames/
- 5 Kinematics and Robot Motion
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 5.1 Position, velocity, acceleration; twists
- 5.2 Holonomic vs. non-holonomic motion
- 5.3 Differential-drive and car-like robots
- 5.4 Robot arms, joints, the kinematic chain
- 5.5 Forward kinematics
- 5.6 Inverse kinematics: analytic, numerical (Jacobian), and learned
- 5.7 Jacobians, singularities, manipulability
- 5.8 Motion constraints
part-2-mathematical-robotics-and-control-foundations/module-05-kinematics-and-robot-motion/
- 6 Dynamics and Simulation Math
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 6.1 From kinematics to dynamics: forces, torques, inertia
- 6.2 Rigid-body dynamics; the manipulator equation
- 6.3 Contact, friction, and why contact-rich sim is hard
- 6.4 Numerical integration and stability
- 6.5 Differentiable physics: what it buys you
- 6.6 Why GPU-parallel simulation changed robot learning
part-2-mathematical-robotics-and-control-foundations/module-06-dynamics-and-simulation-math/
- 7 Control for AI Practitioners
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 7.1 Open-loop vs. closed-loop control
- 7.2 Feedback, error, stability, overshoot, oscillation
- 7.3 PID control, intuition and tuning
- 7.4 State-space control, LQR
- 7.5 Model predictive control (MPC) as receding-horizon optimization
- 7.6 Operational-space and whole-body control (preview for humanoids)
- 7.7 Controllers vs. policies; when learning helps and when it makes control unsafe
part-2-mathematical-robotics-and-control-foundations/module-07-control-for-ai-practitioners/
- 8 Sensors, Perception Hardware, and State Estimation
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 8.1 What sensors provide and what they cost
- 8.2 Cameras, depth (stereo/structured light/ToF), LiDAR
- 8.3 IMU, wheel odometry, joint encoders, proprioception
- 8.4 Tactile and force/torque sensing (GelSight, DIGIT): preview
- 8.5 Sensor noise and uncertainty models
- 8.6 Bayesian filtering: Kalman, EKF, particle filters
- 8.7 Sensor fusion intuition and practice
- 8.8 Perception as an imperfect window into the world
part-2-mathematical-robotics-and-control-foundations/module-08-sensors-perception-hardware-and-state-estimation/
Part III · Simulation, Tooling, and the Modern Stack
5 chapters · 32 sections
The simulators, environments, benchmarks, and synthetic-data practices used to build embodied systems today.
- 9 Why Simulation Is Central
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 9.1 Why real-world learning is slow, costly, and risky
- 9.2 Simulation as data generator, testbed, and curriculum
- 9.3 Fidelity: physical, visual, behavioral
- 9.4 The reality gap as a measurable quantity
- 9.5 The landscape of benchmark environments
part-3-simulation-tooling-and-the-modern-stack/module-09-why-simulation-is-central/
- 10 Environments with Gymnasium (and PettingZoo)
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 10.1 Gym is dead; Gymnasium is the standard
- 10.2 Observation and action spaces
- 10.3 Reward design and termination
- 10.4 Vectorized environments; wrappers
- 10.5 Rendering, logging, and debugging
- 10.6 Evaluation protocol and seeding
- 10.7 PettingZoo for multi-agent
part-3-simulation-tooling-and-the-modern-stack/module-10-environments-with-gymnasium-and-pettingzoo/
- 11 Physics Simulators: MuJoCo, MJX, Isaac Lab, Genesis
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 11.1 What physics simulators model (bodies, joints, contacts, friction)
- 11.2 MuJoCo and the MJCF/URDF model formats
- 11.3 MuJoCo MJX and MuJoCo Warp: massively parallel and differentiable
- 11.4 NVIDIA Isaac Sim + Isaac Lab; the Isaac Gym -> Isaac Lab migration
- 11.5 The Newton physics engine and OpenUSD scene interchange
- 11.6 Genesis and generative multi-physics
- 11.7 Drake, SAPIEN, ROS 2 + Gazebo; where each fits
- 11.8 Choosing a simulator: a decision guide and recency table
part-3-simulation-tooling-and-the-modern-stack/module-11-physics-simulators-mujoco-mjx-isaac-lab-genesis/
- 12 Benchmarks and Task Suites
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 12.1 Why standardized benchmarks matter
- 12.2 Manipulation: ManiSkill3, robosuite, RoboCasa, robomimic, RLBench
- 12.3 Lifelong and language-conditioned: LIBERO, CALVIN, Meta-World
- 12.4 Household and long-horizon: BEHAVIOR-1K / OmniGibson
- 12.5 Navigation and social: Habitat 3.0, AI2-THOR / ProcTHOR
- 12.6 Reading a leaderboard without fooling yourself
part-3-simulation-tooling-and-the-modern-stack/module-12-benchmarks-and-task-suites/
- 13 Domain Randomization and Synthetic Data
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 13.1 Why synthetic variation matters
- 13.2 Visual, physics, sensor, and task randomization
- 13.3 Curriculum and automatic randomization
- 13.4 Photoreal rendering and tiled cameras
- 13.5 real2sim2real and asset/scene reconstruction
- 13.6 Randomization vs. realism; measuring transfer readiness
part-3-simulation-tooling-and-the-modern-stack/module-13-domain-randomization-and-synthetic-data/
Part IV · Reinforcement Learning for Embodied Agents
7 chapters · 37 sections
Interaction-driven learning, from policy gradients and off-policy methods to safe exploration and sim-to-real transfer.
- 14 Reinforcement Learning Refresher
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 14.1 Learning from interaction; return and discounting
- 14.2 Policies and value functions
- 14.3 Exploration vs. exploitation
- 14.4 Model-free vs. model-based; on- vs. off-policy
- 14.5 Why RL is hard in embodied systems (sample cost, reward, safety)
part-4-reinforcement-learning-for-embodied-agents/module-14-reinforcement-learning-refresher/
- 15 Policy Gradient Methods and PPO
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 15.1 Direct policy optimization; stochastic policies
- 15.2 The policy gradient theorem; REINFORCE
- 15.3 Actor-critic and advantage estimation (GAE)
- 15.4 Trust regions; TRPO to PPO
- 15.5 PPO in practice: the implementation details that matter
- 15.6 Reward shaping and its hazards
part-4-reinforcement-learning-for-embodied-agents/module-15-policy-gradient-methods-and-ppo/
- 16 Value-Based and Off-Policy Methods
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 16.1 Q-learning; deep Q-networks
- 16.2 Replay buffers and target networks
- 16.3 Continuous control: DDPG, TD3, SAC
- 16.4 Maximum-entropy RL
- 16.5 Sample efficiency and off-policy failure modes
part-4-reinforcement-learning-for-embodied-agents/module-16-value-based-and-off-policy-methods/
- 17 Massively Parallel and GPU RL
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 17.1 Why thousands of parallel envs changed the field
- 17.2 Learning to walk in minutes: the parallel-RL recipe
- 17.3 Isaac Lab with SKRL / rl_games / RSL-RL
- 17.4 MJX/Brax-training and JAX RL
- 17.5 Teacher-student and privileged-information distillation
- 17.6 Throughput, wall-clock, and cost engineering
part-4-reinforcement-learning-for-embodied-agents/module-17-massively-parallel-and-gpu-rl/
- 18 Reward Design and Goal Specification
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 18.1 Why rewards are dangerous
- 18.2 Sparse vs. dense; shaping done right
- 18.3 Goal-conditioned policies; hindsight experience replay
- 18.4 Reward hacking, with case studies
- 18.5 Human preferences and learned reward models (RLHF for control)
- 18.6 Safety-aware and constrained rewards
part-4-reinforcement-learning-for-embodied-agents/module-18-reward-design-and-goal-specification/
- 19 Exploration in Embodied Worlds
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 19.1 Why embodied exploration is expensive and risky
- 19.2 Intrinsic motivation, curiosity, count-based and novelty methods
- 19.3 Safe exploration
- 19.4 Exploration under partial observability
part-4-reinforcement-learning-for-embodied-agents/module-19-exploration-in-embodied-worlds/
- 20 Sim-to-Real Transfer (RL focus)
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 20.1 The reality gap revisited
- 20.2 What transfers and what does not
- 20.3 Domain randomization, system identification, adaptation (RMA)
- 20.4 Fine-tuning on hardware; safe real-world RL
- 20.5 Measuring transfer performance
part-4-reinforcement-learning-for-embodied-agents/module-20-sim-to-real-transfer-rl-focus/
Part V · Learning from Demonstration and Robot Data
6 chapters · 33 sections
Imitation learning, action chunking and diffusion policies, teleoperation, robot datasets, offline RL, and skill hierarchies.
- 21 Imitation Learning
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 21.1 Why learning from demonstration matters for robots
- 21.2 Behavior cloning; the distribution-shift problem
- 21.3 DAgger and dataset aggregation
- 21.4 Inverse reinforcement learning
- 21.5 Sources of demonstrations: humans, planners, foundation models
part-5-learning-from-demonstration-and-robot-data/module-21-imitation-learning/
- 22 Action Chunking and Diffusion Policies
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 22.1 Why single-step prediction fails on real manipulation
- 22.2 ACT (Action Chunking Transformer) and the cVAE formulation
- 22.3 ALOHA, ALOHA 2, and Mobile ALOHA
- 22.4 Diffusion Policy: action generation by denoising
- 22.5 Flow matching for actions
- 22.6 VQ-BeT and discretized behavior modeling
- 22.7 Choosing an action representation: a decision guide
part-5-learning-from-demonstration-and-robot-data/module-22-action-chunking-and-diffusion-policies/
- 23 Teleoperation and Data Collection
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 23.1 Why data is the bottleneck
- 23.2 Leader-follower teleoperation (ALOHA, GELLO)
- 23.3 Handheld and in-the-wild collection (UMI)
- 23.4 Immersive/VR teleoperation (Open-TeleVision)
- 23.5 Data quality, diversity, and labeling
- 23.6 The LeRobotDataset format and pipeline
part-5-learning-from-demonstration-and-robot-data/module-23-teleoperation-and-data-collection/
- 24 Robot Datasets and Data Scaling Laws
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 24.1 The major datasets: Open X-Embodiment, DROID, BridgeData V2, RH20T, RoboMIND, AgiBot World
- 24.2 Dataset structure, embodiment metadata, and licensing
- 24.3 Cross-embodiment pooling
- 24.4 Empirical data scaling laws in imitation learning
- 24.5 Curating and mixing data
part-5-learning-from-demonstration-and-robot-data/module-24-robot-datasets-and-data-scaling-laws/
- 25 Offline RL and Dataset-Based Robot Learning
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 25.1 Learning without online interaction
- 25.2 Distribution shift and extrapolation error
- 25.3 Conservative methods (CQL, IQL) and their intuition
- 25.4 Offline-to-online fine-tuning
- 25.5 Evaluating offline policies rigorously
part-5-learning-from-demonstration-and-robot-data/module-25-offline-rl-and-dataset-based-robot-learning/
- 26 Skills, Hierarchy, and Task Decomposition
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 26.1 What a skill is; low- vs. high-level actions
- 26.2 The options framework
- 26.3 Skill discovery and hierarchical RL
- 26.4 Language as a high-level controller
- 26.5 Skill libraries for embodied agents
part-5-learning-from-demonstration-and-robot-data/module-26-skills-hierarchy-and-task-decomposition/
Part VI · Embodied Perception
4 chapters · 29 sections
Vision, 3D understanding, localization, mapping, and navigation as perception for action.
- 27 Visual Perception for Action
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 27.1 Seeing to classify vs. seeing to act
- 27.2 Detection, segmentation, and the Segment Anything family
- 27.3 Depth estimation and metric scale
- 27.4 Optical flow and motion cues
- 27.5 Affordances and graspable regions
- 27.6 Active and embodied perception
- 27.7 When perception failures become action failures
part-6-embodied-perception/module-27-visual-perception-for-action/
- 28 3D Perception and Neural Scene Representations
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 28.1 Why 3D matters for manipulation and navigation
- 28.2 Point clouds and depth maps
- 28.3 3D detection and scene reconstruction
- 28.4 Occupancy grids and voxel maps
- 28.5 NeRF: implicit radiance fields
- 28.6 3D Gaussian Splatting: explicit, editable, real-time
- 28.7 Scene representations for robotics: SLAM, real2sim, manipulation
part-6-embodied-perception/module-28-3d-perception-and-neural-scene-representations/
- 29 Localization and Mapping (SLAM)
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 29.1 Where am I and what does the world look like
- 29.2 Odometry and dead reckoning
- 29.3 Localization (Monte Carlo / particle filters)
- 29.4 Mapping and occupancy grids
- 29.5 SLAM: graph-based and visual SLAM
- 29.6 Neural and Gaussian-splat SLAM
- 29.7 Map uncertainty
- 29.8 Modern SLAM systems and failure modes
part-6-embodied-perception/module-29-localization-and-mapping-slam/
- 30 Navigation and Path Planning
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 30.1 Navigation as embodied intelligence
- 30.2 Graph search: BFS, Dijkstra, A*
- 30.3 Sampling-based planning: RRT, RRT*, PRM
- 30.4 Local planning and obstacle avoidance (DWA, potential fields)
- 30.5 Learned navigation policies
- 30.6 Language- and image-goal navigation
- 30.7 Field navigation under degraded sensing
part-6-embodied-perception/module-30-navigation-and-path-planning/
Part VII · Language, Vision, and Action
5 chapters · 37 sections
Language-guided agents, VLMs, LLM planners, VLAs, and cross-embodiment foundation models.
- 31 Language-Guided Embodied Agents
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 31.1 Why language matters in embodied AI
- 31.2 Instructions, goals, constraints
- 31.3 Grounding language in perception; referring expressions
- 31.4 Object- and region-centric grounding
- 31.5 Task planning from language; ambiguity and clarification
- 31.6 Human-agent interaction
part-7-language-vision-and-action/module-31-language-guided-embodied-agents/
- 32 Vision-Language Models for Embodiment
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 32.1 From image-text models to embodied perception
- 32.2 CLIP, SigLIP, DINOv2 representations
- 32.3 Vision-language encoders and open-vocabulary detection
- 32.4 Visual question answering and scene description in environments
- 32.5 Multimodal memory
- 32.6 Limits of static VLMs in dynamic worlds
part-7-language-vision-and-action/module-32-vision-language-models-for-embodiment/
- 33 LLMs as Planners and Controllers
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 33.1 What LLMs can and cannot do in embodied tasks
- 33.2 SayCan: affordance-grounded planning
- 33.3 Code as Policies: LLMs that write robot code
- 33.4 VoxPoser: composing 3D value maps
- 33.5 ReKep: relational keypoint constraints
- 33.6 Tool use, action APIs, plan verification, replanning
- 33.7 Memory, state tracking, and hallucination in physical tasks
- 33.8 Safe LLM-agent interfaces
part-7-language-vision-and-action/module-33-llms-as-planners-and-controllers/
- 34 Vision-Language-Action Models
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 34.1 From VLMs to VLAs: the core idea
- 34.2 The lineage: RT-1, RT-2, RT-X / Open X-Embodiment
- 34.3 Open generalist policies: Octo, OpenVLA
- 34.4 Diffusion/flow VLAs: RDT-1B, π0, π0-FAST, π0.5
- 34.5 Action tokenization vs. continuous heads; the FAST tokenizer
- 34.6 Co-training with web data for semantic generalization
- 34.7 Prompting and conditioning embodied policies
- 34.8 Evaluating VLA behavior; limitations and open problems
- 34.9 Action representations in VLA systems
part-7-language-vision-and-action/module-34-vision-language-action-models/
- 35 Robot Foundation Models and Cross-Embodiment Learning
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 35.1 Why foundation models matter for robotics
- 35.2 Cross-embodiment training and transfer
- 35.3 Dual-system architectures: GR00T N1.5, Helix, Gemini Robotics (with Frontier-Watch caveats)
- 35.4 Large behavior models and rigorous evaluation
- 35.5 Adapting to new robots; prompting and conditioning
- 35.6 Data scale, compute, and the open-vs-closed divide
- 35.7 Limitations and open questions
- 35.8 Serving, fine-tuning, and evaluating open robot foundation models
part-7-language-vision-and-action/module-35-robot-foundation-models-and-cross-embodiment-learning/
Part VIII · World Models and Model-Based Embodied AI
6 chapters · 32 sections
Prediction, latent dynamics, model-based control, generative worlds, and diffusion planning.
- 36 Predicting the Future
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 36.1 Why agents need to predict
- 36.2 Forward/dynamics models; state vs. observation prediction
- 36.3 Error accumulation and horizon
- 36.4 Uncertainty in prediction
- 36.5 Planning with predicted futures
part-8-world-models-and-model-based-embodied-ai/module-36-predicting-the-future/
- 37 Model-Based RL and MPC
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 37.1 Model-free vs. model-based trade-offs
- 37.2 Learning dynamics models; ensembles and uncertainty
- 37.3 Planning with learned models; MPC and CEM/MPPI
- 37.4 Imagination rollouts
- 37.5 Sample-efficiency advantages and failure modes
part-8-world-models-and-model-based-embodied-ai/module-37-model-based-rl-and-mpc/
- 38 Latent World Models
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 38.1 Why predict in latent space
- 38.2 Autoencoders and recurrent state-space models (RSSM)
- 38.3 Dreamer to DreamerV3
- 38.4 Transformer world models (IRIS)
- 38.5 TD-MPC2: latent MPC at scale
- 38.6 World models for visual control
part-8-world-models-and-model-based-embodied-ai/module-38-latent-world-models/
- 39 Generative and Video World Models
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 39.1 Generative models as learned simulators
- 39.2 Genie 1-3: interactive, playable world models
- 39.3 Video generation as world simulation: Sora and successors
- 39.4 NVIDIA Cosmos: world foundation models for physical AI
- 39.5 GameNGen and Oasis: neural game engines
- 39.6 Using generative world models for data and evaluation (e.g., humanoid pipelines)
- 39.7 Evaluating consistency, controllability, and horizon
part-8-world-models-and-model-based-embodied-ai/module-39-generative-and-video-world-models/
- 40 Predictive Representations and Self-Supervised World Models
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 40.1 Predict in representation space, not pixels: the JEPA idea
- 40.2 I-JEPA and V-JEPA
- 40.3 V-JEPA 2 and action-conditioned latent planning
- 40.4 Self-supervised pretraining for control
part-8-world-models-and-model-based-embodied-ai/module-40-predictive-representations-and-self-supervised-world-models/
- 41 Diffusion and Generative Planning
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 41.1 Diffusion models as planners
- 41.2 Diffuser and Decision Diffuser
- 41.3 Generative trajectory planning and scoring
- 41.4 Generating scenes and synthetic experience
- 41.5 Risks of generated experience
part-8-world-models-and-model-based-embodied-ai/module-41-diffusion-and-generative-planning/
Part IX · Manipulation, Locomotion, and Embodied Skills
7 chapters · 48 sections
Hands, legs, humanoids, drones, vehicles, and the skills that let agents move through the world.
- 42 Robotic Manipulation
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 42.1 What manipulation is; reaching and pushing
- 42.2 Pick-and-place pipelines
- 42.3 Contact-rich interaction
- 42.4 Perception for manipulation
- 42.5 Learning manipulation policies (IL, RL, VLA)
- 42.6 Failure detection and recovery
- 42.7 Mobile manipulation: base, arm, perception, and recovery
part-9-manipulation-locomotion-and-embodied-skills/module-42-robotic-manipulation/
- 43 Grasping and Dexterous Manipulation
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 43.1 Grasp synthesis: analytic and learned (Dex-Net lineage)
- 43.2 Parallel-jaw vs. multi-finger hands
- 43.3 In-hand manipulation and reorientation
- 43.4 Dexterous RL with demonstrations
- 43.5 Sim-to-real for dexterity
part-9-manipulation-locomotion-and-embodied-skills/module-43-grasping-and-dexterous-manipulation/
- 44 Tactile and Visuo-Tactile Learning
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 44.1 Why touch matters for contact-rich tasks
- 44.2 Vision-based tactile sensors (GelSight, DIGIT)
- 44.3 Simulating touch (e.g., tactile sim in Isaac)
- 44.4 Visuo-tactile pretraining and policies
- 44.5 Combining vision and touch
part-9-manipulation-locomotion-and-embodied-skills/module-44-tactile-and-visuo-tactile-learning/
- 45 Locomotion and Mobility
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 45.1 Wheeled, legged, and hybrid robots
- 45.2 Balance, stability, and gait
- 45.3 Learning locomotion with massively parallel RL
- 45.4 Terrain adaptation, parkour, and rapid motor adaptation
- 45.5 Energy efficiency; sim-to-real and safety in locomotion
part-9-manipulation-locomotion-and-embodied-skills/module-45-locomotion-and-mobility/
- 46 Humanoid Robots and Whole-Body Control
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 46.1 Why humanoids became the focus (data, morphology, hardware cost)
- 46.2 Platforms: Unitree G1/H1, Figure, Optimus, 1X, electric Atlas, Apptronik
- 46.3 Whole-body and operational-space control
- 46.4 Learning from humans: HumanPlus, OmniH2O/HOVER, motion retargeting
- 46.5 Teleoperation for humanoids
- 46.6 Dual-system humanoid foundation models (tie-back to Ch. 35)
- 46.7 Safety for human-scale robots
- 46.8 Advanced humanoid dynamics and contact mechanics
- 46.9 Boston Dynamics-style loco-manipulation research track
part-9-manipulation-locomotion-and-embodied-skills/module-46-humanoid-robots-and-whole-body-control/
- 47 Drones and Aerial Embodied AI
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 47.1 Why aerial agents are special
- 47.2 Flight dynamics intuition
- 47.3 Perception, navigation, and obstacle avoidance
- 47.4 Coverage and inspection; multi-drone coordination
- 47.5 Safety, regulation, and simulation for aerial agents
- 47.6 Quadrotor dynamics and flight control
- 47.7 Trajectory generation and GPS-denied missions
- 47.8 PX4 to hardware: SITL, HITL, logs, and flight-test evidence
part-9-manipulation-locomotion-and-embodied-skills/module-47-drones-and-aerial-embodied-ai/
- 48 Autonomous Driving as Embodied AI
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 48.1 Driving as perception, prediction, planning, control
- 48.2 Sensors and sensor fusion in AVs
- 48.3 Detection, lane and behavior prediction
- 48.4 Route and local planning
- 48.5 End-to-end and world-model driving
- 48.6 Scenario testing and safety cases
- 48.7 Vehicle kinematics, dynamics, and control
- 48.8 Route, behavior, and scenario-based planning
- 48.9 Closed-loop driving evaluation and safety assurance
part-9-manipulation-locomotion-and-embodied-skills/module-48-autonomous-driving-as-embodied-ai/
Part X · Multi-Agent and Human-Centered Embodiment
3 chapters · 16 sections
Teams of agents, humans in the loop, open worlds, and lifelong interaction.
- 49 Multi-Agent Embodied AI
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 49.1 One agent vs. many
- 49.2 Cooperation, competition, communication
- 49.3 Shared perception and task allocation
- 49.4 Multi-agent RL (with PettingZoo)
- 49.5 Swarms and emergent behavior; evaluating teams
part-10-multi-agent-and-human-centered-embodiment/module-49-multi-agent-embodied-ai/
- 50 Human-Robot Interaction
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 50.1 Robots among humans
- 50.2 Natural-language interaction and social navigation
- 50.3 Intent recognition and trust calibration
- 50.4 Explainable robot behavior
- 50.5 Human feedback and shared autonomy
- 50.6 Ethical concerns
part-10-multi-agent-and-human-centered-embodiment/module-50-human-robot-interaction/
- 51 Open-World and Novelty-Robust Embodiment
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 51.1 Closed- vs. open-world tasks
- 51.2 Novel objects and instructions; changing environments
- 51.3 Long-horizon tasks
- 51.4 Distribution shift triggers and open-world adaptation
- 51.5 Novelty detection and retraining triggers; open-world evaluation
part-10-multi-agent-and-human-centered-embodiment/module-51-open-world-and-lifelong-embodiment/
Part XI · Evaluation, Safety, Robustness, and Deployment
4 chapters · 23 sections
Metrics, uncertainty, safety filters, deployment architecture, and operational discipline.
- 52 Evaluating Embodied Systems
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 52.1 Why accuracy is not enough
- 52.2 Success rate, path efficiency, time and energy cost
- 52.3 Safety violations and constraint satisfaction
- 52.4 Robustness and generalization metrics
- 52.5 Reproducible evaluation: SIMPLER and sim-as-proxy
- 52.6 Real-world evaluation hygiene; benchmark design
part-11-evaluation-safety-robustness-and-deployment/module-52-evaluating-embodied-systems/
- 53 Robustness and Uncertainty
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 53.1 What goes wrong: sensor noise, distribution shift
- 53.2 Model uncertainty and calibration
- 53.3 Out-of-distribution detection
- 53.4 Runtime monitoring and fail-safe behavior
part-11-evaluation-safety-robustness-and-deployment/module-53-robustness-and-uncertainty/
- 54 Safety in Embodied AI
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 54.1 Why embodied safety is different (physical harm)
- 54.2 Constraint violations and safe exploration
- 54.3 Control barrier functions and Hamilton-Jacobi reachability
- 54.4 Shielded policies and safety filters
- 54.5 Human override and safety testing
- 54.6 Deployment approval and safety cases
- 54.7 Safety cases and assurance arguments for embodied AI
part-11-evaluation-safety-robustness-and-deployment/module-54-safety-in-embodied-ai/
- 55 Deployment Architecture
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 55.1 From notebook to robot
- 55.2 Real-time inference and control rates
- 55.3 Edge vs. cloud-robot computation; asynchronous inference
- 55.4 Logging, monitoring, model updates
- 55.5 Failure recovery, security, maintenance
- 55.6 Industrial fleets, Open-RMF, AMR interoperability, and operations
part-11-evaluation-safety-robustness-and-deployment/module-55-deployment-architecture/
Part XII · Frontiers, Capstones, and Course Design
5 chapters · 32 sections
Memory, continual learning, open problems, capstone projects, and teaching paths.
- 56 Embodied Agents with Memory
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 56.1 Why memory matters; short- vs. long-term
- 56.2 Spatial, episodic, and semantic memory
- 56.3 Memory retrieval for planning
- 56.4 Memory errors
part-12-frontiers-capstones-and-course-design/module-56-embodied-agents-with-memory/
- 57 Continual and Lifelong Learning
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 57.1 Learning after deployment
- 57.2 Catastrophic forgetting and mitigation
- 57.3 Online adaptation; human correction as data
- 57.4 Safe continual learning; evaluation over time
part-12-frontiers-capstones-and-course-design/module-57-continual-and-lifelong-learning/
- 58 Frontier and Open Problems
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 58.1 Scaling laws and data engines for robots
- 58.2 Generalist vs. specialist policies
- 58.3 World models in the robot loop
- 58.4 The open-vs-closed model divide
- 58.5 What is still unsolved (long-horizon reasoning, reliability, real-world RL)
- 58.99 Frontier Watch
part-12-frontiers-capstones-and-course-design/module-58-frontier-and-open-problems/
- 59 Capstone Projects
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 59.1 Object search in a simulated home
- 59.2 Language-guided navigation with replanning
- 59.3 Vision-based robotic pick-and-place (IL + RL)
- 59.4 Fine-tune an open VLA on a custom task (LeRobot)
- 59.5 Learned locomotion with sim-to-real analysis
- 59.6 World-model-based planning agent
- 59.7 Safety-shielded embodied agent
- 59.8 LLM-based household task planner
- 59.9 Drone inspection planner
- 59.10 Multi-agent search and rescue
- 59.11 Open-ended research project
- 59.12 Application track capstone templates
part-12-frontiers-capstones-and-course-design/module-59-capstone-projects/
- 60 Teaching with This Book
Theory, practical recipe, lab, and library shortcuts for this chapter.
- 60.1 One-semester graduate course (14 weeks)
- 60.2 One-semester advanced undergraduate course (lighter theory, more labs)
- 60.3 Two-semester sequence
- 60.4 Research-seminar track
- 60.5 Lab infrastructure and compute budgeting for instructors
- 60.6 Assessment, rubrics, and academic-integrity notes for code assignments
part-12-frontiers-capstones-and-course-design/module-60-teaching-with-this-book/
Appendices · Reference and Pedagogy
9 appendices
- A Linear Algebra and 3D Geometry Refresher
Reference material supporting the self-contained book promise.
appendices/appendix-a-linear-algebra-3d-geometry/
- B Probability, Estimation, and Optimization Refresher
Reference material supporting the self-contained book promise.
appendices/appendix-b-probability-estimation-optimization/
- C The Embodied AI Toolbox
Reference material supporting the self-contained book promise.
appendices/appendix-c-embodied-ai-toolbox/
- D PyTorch and JAX for Embodied AI
Reference material supporting the self-contained book promise.
appendices/appendix-d-pytorch-jax/
- E Compute Recipes
Reference material supporting the self-contained book promise.
appendices/appendix-e-compute-recipes/
- F Datasets and Benchmarks Catalog
Reference material supporting the self-contained book promise.
appendices/appendix-f-datasets-benchmarks/
- G Reproducibility and Experiment Hygiene
Reference material supporting the self-contained book promise.
appendices/appendix-g-reproducibility/
- H Notation and Glossary
Reference material supporting the self-contained book promise.
appendices/appendix-h-notation-glossary/
- I Citing the Frontier
Reference material supporting the self-contained book promise.
appendices/appendix-i-citing-frontier/