Second Edition · 2026

Building Embodied AI: From Perception to Autonomous Action

A practitioner's guide to embodied agents, robot learning, simulation, world models, and physical intelligence.

Alexander (Sasha) Apartsin, Ph.D. & Yehudit Aperstein, Ph.D.

Embodied intelligence begins when an agent's computation is inseparable from its body. The sensors determine what the world looks like; the actuators determine what can be changed; the physics determine what the consequences will be. This book is one connected journey through that full stack: robotics and control, simulation, reinforcement and imitation learning, vision-language-action models, world models, humanoids, safety, and deployment.

12 parts · 60 chapters · 363 sections · 9 appendices

The Twelve-Part Arc

Each part stands on the one before it; together they build an agent that senses, predicts, decides, acts, and learns.

I. Foundations of Embodied AI

The conceptual vocabulary of agents, environments, embodiment, and closed-loop intelligence.

3 chapters · 24 sections

II. Mathematical, Robotics, and Control Foundations

The geometry, kinematics, dynamics, control, and sensing that make physical agents intelligible.

5 chapters · 36 sections

III. Simulation, Tooling, and the Modern Stack

The simulators, environments, benchmarks, and synthetic-data practices used to build embodied systems today.

5 chapters · 32 sections

IV. Reinforcement Learning for Embodied Agents

Interaction-driven learning, from policy gradients and off-policy methods to safe exploration and sim-to-real transfer.

7 chapters · 37 sections

V. Learning from Demonstration and Robot Data

Imitation learning, teleoperation, action chunking, diffusion policies, robot datasets, and the data-scaling laws that drive modern robot learning.

6 chapters · 33 sections

VI. Embodied Perception

Vision, 3D understanding, localization, mapping, and navigation as perception for action.

4 chapters · 27 sections

VII. Language, Vision, and Action

Language-guided agents, VLMs, LLM planners, VLAs, and cross-embodiment foundation models.

5 chapters · 35 sections

VIII. World Models and Model-Based Embodied AI

Prediction, latent dynamics, model-based control, generative worlds, and diffusion planning.

6 chapters · 32 sections

IX. Manipulation, Locomotion, and Embodied Skills

Hands, legs, humanoids, drones, vehicles, and the skills that let agents move through the world.

7 chapters · 39 sections

X. Multi-Agent and Human-Centered Embodiment

Teams of agents, humans in the loop, open worlds, and lifelong interaction.

3 chapters · 16 sections

XI. Evaluation, Safety, Robustness, and Deployment

Metrics, uncertainty, safety filters, deployment architecture, and operational discipline.

4 chapters · 21 sections

XII. Frontiers, Capstones, and Course Design

Memory, continual learning, open problems, capstone projects, and teaching paths.

5 chapters · 31 sections

How This Book Teaches

Five habits, kept in every chapter from the first simulator to the final deployment.

Worked Systems

Every chapter connects concepts to a runnable artifact, from a tiny environment to a robot-learning pipeline.

Library Shortcuts

After each from-scratch build, a shortcut names the maintained tool that makes the practical version small.

A Callout System

Failure modes, research frontiers, recipes, and cross-references are typeset as distinct boxes for fast scanning.

Exercises and Labs

Chapters close with build tasks that turn theory into testable systems.

Classical Ideas Return Learned

Geometry, control, estimation, and simulation reappear inside modern learned policies and world models.

The Hands-On AI Science Series

Building Embodied AI is one of nine connected books, each a deep, build-it-yourself guide to a major field of AI.

Hands-On AI Science is a series of in-depth guides to the major fields of artificial intelligence. Every book goes deep into the theory, models, and internals, covering the classical foundations and the most recent ideas, then shows you how to build each one in Python with the modern libraries and tools that get the job done. The writing stays plain and light (illustrations, analogies, mental models, worked examples, and a little fun) without trading away rigor or coverage. Each volume is self-contained and complete enough to anchor a full course on its subject.