Part Overview
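This part covers how robots learn from demonstrations and recorded data: imitation learning, action chunking and diffusion policies, teleoperation and data collection, robot datasets and scaling laws, offline RL, and hierarchical skills. It connects formal ideas with the tools and labs needed to build working systems.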
Chapters: 6. Each chapter includes theory, recipes, practical code, a library shortcut, and exercises.
Learning from Demonstration and Robot Data gives the reader a working layer of the embodied AI stack. Later chapters assume this layer when agents must perceive, plan, act, and recover from mistakes.
Chapter 21 develops imitation learning.
- 21.1 Why learning from demonstration matters for robots
- 21.2 Behavior cloning; the distribution-shift problem
- 21.3 DAgger and dataset aggregation
- 21.4 Inverse reinforcement learning
- 21.5 Sources of demonstrations: humans, planners, foundation models
Chapter 22 develops action chunking and diffusion policies.
- 22.1 Why single-step prediction fails on real manipulation
- 22.2 ACT (Action Chunking Transformer) and the cVAE formulation
- 22.3 ALOHA, ALOHA 2, and Mobile ALOHA
- 22.4 Diffusion Policy: action generation by denoising
- 22.5 Flow matching for actions
Chapter 23 develops teleoperation and data collection.
- 23.1 Why data is the bottleneck
- 23.2 Leader-follower teleoperation (ALOHA, GELLO)
- 23.3 Handheld and in-the-wild collection (UMI)
- 23.4 Immersive/VR teleoperation (Open-TeleVision)
- 23.5 Data quality, diversity, and labeling
Chapter 24 develops robot datasets and data scaling laws.
- 24.1 The major datasets: Open X-Embodiment, DROID, BridgeData V2, RH20T, RoboMIND, AgiBot World
- 24.2 Dataset structure, embodiment metadata, and licensing
- 24.3 Cross-embodiment pooling
- 24.4 Empirical data scaling laws in imitation learning
- 24.5 Curating and mixing data
Chapter 25 develops offline RL and dataset-based robot learning.
- 25.1 Learning without online interaction
- 25.2 Distribution shift and extrapolation error
- 25.3 Conservative methods (CQL, IQL) and their intuition
- 25.4 Offline-to-online fine-tuning
- 25.5 Evaluating offline policies rigorously
Chapter 26 develops skills, hierarchy, and task decomposition.
- 26.1 What a skill is; low- vs. high-level actions
- 26.2 The options framework
- 26.3 Skill discovery and hierarchical RL
- 26.4 Language as a high-level controller
- 26.5 Skill libraries for embodied agents
What's Next?
Part VI: Embodied Perception comes next and extends the stack.