Foreword | Building Embodied AI: From Perception to Autonomous Action

Embodied intelligence begins when an agent's computation is inseparable from its body. The sensors determine what the world looks like; the actuators determine what can be changed; the physics determines what the consequences will be. These three constraints are not engineering inconveniences. They are the subject.

A classifier learns which output is most likely given an input. An embodied agent learns which action is most worth taking given a state that its previous actions helped create, inside a world it cannot fully observe and whose changes it cannot undo. The difference is not one of scale. It is a difference in the mathematical object being learned, the feedback that drives learning, and the failure modes that actually matter.

This book is one connected journey through the full stack: the geometry and dynamics of bodies in physical space; the simulators that make learning tractable before hardware; the reinforcement and imitation learning algorithms that close the loop; the vision, language, and action models that let agents interpret instructions and generate behavior; the world models that let agents plan before acting; the skills and whole-body control that let humanoids, drones, and manipulators move with purpose; and the safety, evaluation, and deployment practices that turn working code into running systems.

The arc runs from perception to action, from simple simulated agents to modern robot foundation models, and from classical feedback control to cross-embodiment policies trained on millions of robot trajectories. Every part of that arc is here.