What This Book Covers

Big Picture

This book covers embodied AI as a field, not a single technique. Embodied intelligence begins when an agent's computation is inseparable from its body: the sensors determine what the world looks like, the actuators determine what can be changed, and the physics determines what the consequences will be. That inseparability is the subject. The world in which it plays out may be a factory cell, a public road, a cluttered kitchen, an underwater pipeline, or a GPU-resident simulator. The scope is wide on purpose, because the field is wide; the spine that holds it together is the closed perception-action loop, treated as the invariant while morphology, sensing, and time constants vary.

The Scope of the Field

"Embodied" is not a synonym for "humanoid robot," and embodied AI is not a narrow contrast between prediction and interaction. The field spans the full embodiment spectrum: fixed manipulators on an assembly line; wheeled and tracked mobile bases; autonomous road vehicles; aerial and underwater vehicles; legged robots, including quadrupeds and bipedal humanoids; soft and continuum robots; wearables, prosthetics, and exoskeletons that share a body with a person; micro-robots and swarms; and purely simulated agents that may never touch hardware. These bodies differ in actuation, sensing, time constant, and the cost of a mistake, yet every one of them closes the same sense, decide, act, observe loop.

What makes the field coherent is a disciplinary confluence. From control theory, embodied AI inherits feedback and stability; from robotics and mechatronics, geometry, kinematics, and actuation; from reinforcement learning, learning from interaction; from computer vision, scene understanding; from language models, instruction following and planning; and from embodied cognition in cognitive science, the idea that a body shapes cognition. The lineage runs through Wiener's cybernetics, Brooks' behavior-based critique of pure sense-plan-act, Moravec's paradox (sensorimotor skill is harder to automate than abstract reasoning), and morphological computation (the body itself carries part of the control). No single parent field owns the closed loop; embodied AI is the seam between them.

The Invariant

Across every chapter, the one thing that does not change is the closed loop: the agent's own actions generate its future observations, the world enforces physics and time, mistakes change the state rather than ending an example, and competence is a property of behavior over a horizon rather than accuracy on a fixed test set. The book organizes the spectrum around that invariant and treats morphology as a parameter.
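As a rough illustration of that invariant, the loop can be sketched in a few lines of plain Python. Everything here is an assumption made for the sketch and not taken from the book's chapters: the one-dimensional point world, the proportional policy, the noise level, and the names `run_closed_loop` and `horizon_cost` are all hypothetical. The point is only that each action changes the state the next observation is drawn from, and that competence is scored over the whole horizon.

```python
import random


def run_closed_loop(gain, horizon=50, dt=0.1, target=1.0, seed=0):
    """Toy sense-decide-act-observe loop on a 1D point (hypothetical example)."""
    rng = random.Random(seed)
    position = 0.0
    trajectory = [position]
    for _ in range(horizon):
        # Sense: the agent only sees a noisy reading of the true state.
        observation = position + rng.gauss(0.0, 0.01)
        # Decide: a proportional policy turns the perceived error into a velocity command.
        action = gain * (target - observation)
        # Act: the world integrates the command; a bad action changes the state
        # that every later observation is drawn from.
        position = position + dt * action
        # Observe: record the new state, which the next iteration will sense.
        trajectory.append(position)
    return trajectory


def horizon_cost(trajectory, target=1.0):
    """Competence over a horizon: mean absolute error along the whole trajectory."""
    return sum(abs(target - p) for p in trajectory) / len(trajectory)


good = run_closed_loop(gain=2.0)    # error shrinks by a factor of 0.8 per step
bad = run_closed_loop(gain=25.0)    # error is multiplied by -1.5 per step and grows

print(f"cost with gain 2.0:  {horizon_cost(good):.3f}")
print(f"cost with gain 25.0: {horizon_cost(bad):.3e}")
```

With the aggressive gain, each overshoot becomes the starting point for the next decision, so the errors compound instead of being scored independently. That is the sense in which mistakes change the state rather than ending an example.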

What the Twelve Parts Cover

The book is organized into twelve parts.

Nine appendices carry the prerequisite refreshers (linear algebra and 3D geometry; probability, estimation, and optimization), an embodied AI toolbox, PyTorch and JAX usage, compute recipes, a datasets and benchmarks catalog, reproducibility hygiene, notation and glossary, and guidance on citing the frontier.

What This Book Does Not Cover

This is not a first course in programming, machine learning, or deep learning, and it does not re-teach those prerequisites inline; the appendices refresh the specific math the chapters lean on, and nothing more. It is not a mechanical engineering, electronics, or hardware-design text: actuator design, PCB layout, and mechanism fabrication are out of scope, and hardware is treated only where it constrains the learning and control problem. It is not a manual for one robot platform or one vendor SDK, and it is not a general AI survey; topics with no path to the perception-action loop (pure language modeling, recommendation, tabular ML) are left to other books in the series.

Current as of 2026

The book is written to the post-2023 state of the field. It covers vision-language-action models and robot foundation models, world models (including generative and video world models) used for planning and prediction, GPU-parallel simulation for training policies across many environments at once, cross-embodiment data and transfer (the Open X-Embodiment line of work), and the maintained open-source stack practitioners actually use, including the LeRobot ecosystem for data, policies, and teleoperation. Version caveats and deprecated tools are marked where they matter, so that the material stays usable past the print date.