Section 1.6: Examples: vacuum, drone, autonomous vehicle, manipulator, humanoid, game agent | Building Embodied AI: From Perception to Autonomous Action

"The loop is shared. The body sets the clock and the price of being wrong."
Section 1.6

Big Picture

Embodiment is a spectrum, not a category. Every system in this section runs the same closed loop introduced in Section 1.2: sense, update belief, act, inherit the consequence. What varies along the spectrum is not the loop but its parameters. The body fixes the sensing modality, the dimensionality and units of the action space, the control rate (how often the loop closes), the degree of partial observability, the episode horizon, and above all the cost of an irreversible mistake. A robot vacuum and a humanoid run the same partially observable controlled Markov process, set at very different time constants and failure costs. Reading the six bodies along these axes is the fastest way to see which techniques in later parts transfer and which assumptions silently break.

Figure 1.6. The six examples share one closed loop. Embodiment fixes the sensing modality, the loop period $\Delta t$, and whether a consequence can be undone; those three parameters, not the loop structure, separate a vacuum from a humanoid.

An embodiment is a tuple, and the axes are its coordinates

Reuse the closed loop of Section 1.2 and attach to each body the parameters that the loop does not fix on its own. A useful summary is the tuple $E=(\mathcal{O},\mathcal{A},\Delta t,k,H,\kappa)$: the observation space $\mathcal{O}$ (sensing modality), the action space $\mathcal{A}$ and its units, the control period $\Delta t$ with rate $1/\Delta t$, the degree of partial observability $k$ (how far the true state is from what a single observation reveals), the episode horizon $H$, and the failure cost $\kappa$ (how expensive and how reversible the worst routine mistake is). The policy class can be identical across two bodies; what differs is this coordinate vector. Walk the six examples and read off each coordinate concretely.
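The tuple can be written down as a plain record, which makes the "same policy class, different coordinates" point concrete. The sketch below is a minimal illustration under stated assumptions: encoding $k$ as a number in [0, 1] and $\kappa$ as an ordinal rank from 0 (free reset) to 4 (irreversible and externalized) are illustrative choices, not definitions from this section, and the vacuum and board-game values are rough placeholders read off the prose that follows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Embodiment:
    """The embodiment tuple E = (O, A, dt, k, H, kappa) with illustrative encodings."""
    name: str
    observation: str              # O: sensing modality, described in words
    action_dim: int               # A: dimensionality of the action space
    action_units: str             # A: units in which actions are commanded
    dt: Optional[float]           # control period in seconds; None = no physical clock
    k: float                      # partial observability, 0 = fully observed (assumed scale)
    horizon_steps: Optional[int]  # H in decision steps; None = open-ended or game-dependent
    kappa: int                    # failure-cost rank: 0 = free reset ... 4 = irreversible, externalized (assumed)

    @property
    def rate_hz(self) -> Optional[float]:
        """Control rate 1/dt, or None when there is no physical clock."""
        return None if self.dt is None else 1.0 / self.dt

    @property
    def horizon_seconds(self) -> Optional[float]:
        """Wall-clock length of an episode, when both dt and H are defined."""
        if self.dt is None or self.horizon_steps is None:
            return None
        return self.horizon_steps * self.dt


# Placeholder coordinates for two bodies from this section.
vacuum = Embodiment("robot vacuum", "bump, cliff IR, odometry, low-cost lidar",
                    action_dim=2, action_units="wheel velocity (m/s)",
                    dt=0.1, k=0.4, horizon_steps=18_000, kappa=1)
board_game = Embodiment("board-game agent", "full symbolic state",
                        action_dim=1, action_units="index into legal moves",
                        dt=None, k=0.0, horizon_steps=None, kappa=0)

print(vacuum.rate_hz, vacuum.horizon_seconds)  # a 10 Hz planner over a 30-minute run
print(board_game.rate_hz)                      # no physical clock
```

Two bodies can share a policy class while differing in every field of this record; comparing `dt` and `kappa` first is the habit the rest of the section argues for.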

Robot vacuum

Sensing is sparse and short-range: bump sensors and cliff IR for contact and drop-offs, wheel odometry for self-motion, and a camera or low-cost lidar for SLAM. The action space is two-dimensional (wheel velocities, or a discrete turn-and-go set), commanded at a navigation rate of roughly 10-20 Hz over a costmap, while the wheel-velocity servo underneath runs faster. Partial observability is moderate: the map is built online and furniture moves between runs. The horizon is long (a cleaning run lasts tens of minutes, so $H$ runs to thousands or tens of thousands of decision steps; a 30-minute run at 10 Hz is 18,000 steps), but failures are cheap and reversible: a missed patch is re-covered, and a wedge under a couch ends with a stop and a help request. This is the gentle end of the spectrum and the natural setting for the mobile-navigation stack treated in Part IV.

Drone / UAV

Sensing fuses an IMU at hundreds of Hz with GPS, a barometer, optical flow, and one or more cameras; pose must be estimated, never directly observed, so partial observability is high. For a quadrotor, the action space is the four rotor thrusts (or roll-pitch-yaw-thrust setpoints). The control stack is layered: the inner attitude loop runs at roughly 250-1000 Hz on a typical flight controller such as PX4, position control at tens of Hz, and any learned or planning layer at roughly 10-50 Hz. The horizon is bounded by energy: a battery gives minutes of flight, so $H$ is a hard budget, not a soft one. Failure cost is high and partly irreversible: loss of attitude control or a depleted battery means a fall. Drones recur in the state-estimation and aerial-robotics material of Parts II and IV.
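Because the drone's $H$ is set by stored energy rather than by the task, it can be estimated before takeoff. A back-of-envelope sketch, assuming a constant power draw at hover and a fixed usable fraction of the pack; the battery capacity, hover power, and planner rate below are hypothetical placeholders, not measurements of any particular airframe.

```python
def flight_horizon(capacity_wh: float, usable_fraction: float,
                   hover_power_w: float, planner_hz: float) -> tuple[float, int]:
    """Return (hover endurance in seconds, hard horizon H in planner steps).

    Assumes constant power draw at hover; real draw rises with payload,
    wind, and aggressive maneuvering, so this is an optimistic bound.
    """
    usable_wh = capacity_wh * usable_fraction           # leave an assumed reserve unused
    endurance_s = usable_wh * 3600.0 / hover_power_w    # Wh -> W*s (J), divided by W
    horizon_steps = int(endurance_s * planner_hz)       # whole planner steps that fit the budget
    return endurance_s, horizon_steps


# Hypothetical airframe: 60 Wh pack, 80 % usable, 180 W at hover, 20 Hz planner.
endurance_s, H = flight_horizon(60.0, 0.8, 180.0, 20.0)
print(f"{endurance_s / 60:.1f} min of hover -> H = {H} planner steps")
```

The resulting $H$ is a ceiling the planner cannot negotiate: the drone must be on the ground before the budget runs out, so a practical planner also reserves steps for the return and the descent.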

Autonomous road vehicle

Sensing is the richest here: camera, radar, often lidar, plus GNSS and IMU, fused into a tracked scene of other agents. Partial observability is severe and adversarial: occlusion, the intent of other drivers, and a long tail of rare events. The action space is low-dimensional at the point of actuation (steering, throttle, brake), but the decisions that produce those commands are structured: route, lane, and yield choices that depend on the predicted behavior of other agents. Rates are layered again: behavior and motion planning run at roughly 10 Hz, while the lateral and longitudinal control loops run faster underneath. The episode horizon is open-ended (a drive has no natural reset), and the failure cost is the highest on the spectrum: a collision is irreversible and externalized onto people who never opted in. The whole of the safety and verification discussion in Parts VI and VII is calibrated to this $\kappa$.

Fixed manipulator arm

A bolted-down arm has near-perfect proprioception (joint encoders) but only partial observability of the world it touches: object pose, mass, and friction are estimated from vision and force sensing. The action space is the joint vector (6-7 DOF), commanded as torque, velocity, or position. The torque loop sits at the fast end of the section, alongside the drone's attitude loop and the legged robot's PD servo: roughly 1 kHz is canonical (the Franka FCI real-time cycle is 1 ms, i.e. 1 kHz), with task-space planning and grasp selection layered above at 10-100 Hz. Horizons are short and episodic (a pick takes seconds). Failure cost is intermediate and largely reversible by design: a dropped object is re-grasped, though a hard collision can damage the hardware or the workpiece. This is the home turf of the manipulation and grasping chapters in Part V.
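Force sensing turns one hidden coordinate of the world, object mass, into an estimate. A minimal sketch, assuming a wrist force/torque sensor that has been tared (zeroed) with the empty gripper, a static arm holding the object after lift-off, and a sensor z axis pointing up; the readings are synthetic, generated with made-up Gaussian noise, not data from any real arm.

```python
import random
import statistics

G = 9.81  # gravitational acceleration (m/s^2)


def estimate_mass(fz_samples: list[float]) -> tuple[float, float]:
    """Estimate a held object's mass from tared vertical wrist forces (N).

    Assumes the arm is static and the sensor z axis points up, so the
    object's weight appears as a negative z force. Returns
    (mass estimate in kg, standard error of that estimate in kg).
    """
    weights = [-fz for fz in fz_samples]                        # weight magnitudes in N
    mean_w = statistics.fmean(weights)
    sem_w = statistics.stdev(weights) / len(weights) ** 0.5     # standard error of the mean
    return mean_w / G, sem_w / G


# Synthetic readings: a 0.5 kg object, 1 N Gaussian noise, 1 s of samples at 1 kHz.
rng = random.Random(0)
true_mass = 0.5
samples = [-true_mass * G + rng.gauss(0.0, 1.0) for _ in range(1000)]
m_hat, m_se = estimate_mass(samples)
print(f"mass estimate: {m_hat:.3f} kg +/- {m_se:.3f} kg")
```

Averaging 1,000 samples shrinks the noise by about $\sqrt{1000}\approx 32$. Pose and friction need richer models than a mean, but the pattern is the same: a hidden coordinate of the world is recovered from repeated noisy readings.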

Legged / humanoid robot

Sensing combines a high-rate IMU, joint encoders, foot-contact sensing, and, increasingly, exteroceptive vision for terrain. Partial observability is high: ground contact, slip, and the terrain ahead must be inferred. The action space is a large joint vector (a humanoid has 20-40+ actuated joints), and the learned locomotion policy typically emits joint position targets at roughly 50 Hz (as in Lee et al. 2020 for quadrupedal locomotion), which a PD controller turns into torques at about 1 kHz underneath. The horizon spans the walk; the failure cost is high and fast: once balance is lost, an unrecoverable fall follows within a fraction of a second, which is why the loop must close quickly. Legged locomotion and whole-body control are central to Part V.
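The two-rate split between a 50 Hz policy and a 1 kHz PD loop can be sketched on a single joint. This is a minimal illustration under loud assumptions: one rigid joint with inertia only (no gravity, contact, or motor limits), hypothetical gains and inertia, and a stand-in "policy" that simply emits a fixed sequence of position targets rather than a learned network. The point is the decimation, 20 torque ticks per policy step, not the dynamics.

```python
POLICY_HZ = 50                    # rate of the stand-in policy's joint targets
PD_HZ = 1000                      # rate of the PD torque loop underneath
DECIMATION = PD_HZ // POLICY_HZ   # 20 torque ticks per policy step
DT = 1.0 / PD_HZ                  # torque-loop period, 1 ms

KP, KD = 50.0, 2.0                # hypothetical PD gains (N*m/rad, N*m*s/rad)
INERTIA = 0.05                    # hypothetical joint inertia (kg*m^2)


def pd_torque(q_target: float, q: float, qd: float) -> float:
    """Joint-space PD law toward q_target with zero target velocity."""
    return KP * (q_target - q) - KD * qd


def run(targets: list[float]) -> list[float]:
    """Hold each policy target for DECIMATION torque ticks; log q per policy step."""
    q, qd = 0.0, 0.0
    trace = []
    for q_target in targets:              # outer loop: one 50 Hz policy step
        for _ in range(DECIMATION):       # inner loop: 1 kHz torque control
            tau = pd_torque(q_target, q, qd)
            qdd = tau / INERTIA           # single rigid joint: I * qdd = tau
            qd += qdd * DT                # semi-implicit Euler: velocity first,
            q += qd * DT                  # then position with the new velocity
        trace.append(q)
    return trace


# Stand-in policy: command 0.3 rad and hold it for 1 s (50 policy steps).
trace = run([0.3] * 50)
print(f"after 1 s: q = {trace[-1]:.4f} rad (target 0.3 rad)")
```

Holding each target for 20 torque ticks is a zero-order hold: the policy decides 50 times a second, while stiffness and damping act at 1 kHz.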

Simulated game agent

The "body" is a rule world. Sensing is whatever the game exposes: a symbolic state, a pixel frame, or both; observability ranges from full (board games) to severe (partial-information or hidden-map games). The action space is a discrete legal-move set or a low-dimensional continuous control. The control rate is whatever the simulator steps at, often hundreds to thousands of frames per second with no physical clock to respect. The horizon varies by game, but the defining coordinate is failure cost: it is near zero, because the episode resets for free. That single fact, $\kappa \approx 0$ with unlimited cheap rollouts, is why simulated agents are the natural setting for the reinforcement-learning and self-play methods in Part III, and why results there do not transfer for free to any of the five physical bodies above.

Same loop, different time constants and failure costs

Across all six systems the loop structure is invariant: observe, update belief, act, inherit the next observation. Two coordinates do almost all the separating work. Among the physical bodies, the control period $\Delta t$ spans about two orders of magnitude, from 1 ms for a manipulator torque loop at 1 kHz to about 100 ms for a vacuum planner at around 10 Hz, while a game steps with no physical clock at all. The failure cost $\kappa$ spans from "the episode resets for free" in simulation to "a person is harmed and nothing resets" for a road vehicle. A method is portable across two bodies only to the extent that it respects both of these, not just the shared loop diagram.

The six systems on one grid

The comparison table fixes the columns so the differences are about embodiment rather than reporting style. Rates given are for the loop the practitioner usually programs against; faster inner servo loops are noted where they dominate the design.

Table 1.6.1. Six embodiments along the six axes

System	Sensing modality	Action space	Control rate	Partial observability	Horizon	Failure cost
Robot vacuum	bump, cliff IR, odometry, low-cost lidar/SLAM	2-D wheel velocity	~10-20 Hz planner	moderate (map built online)	long (10³+ steps / run)	low, reversible (re-cover, stop)
Drone / UAV	IMU, GPS, baro, optical flow, camera	4 rotor thrusts / attitude setpoint	250-1000 Hz attitude; ~10-50 Hz planning	high (pose estimated)	energy-bounded (minutes)	high, partly irreversible (fall)
Autonomous vehicle	camera, radar, lidar, GNSS/IMU fusion	steer, throttle, brake	~10 Hz planning; faster control loop	severe, adversarial (occlusion, intent)	open-ended (no reset)	highest, irreversible, externalized
Manipulator arm	joint encoders, vision, force/torque	6-7 DOF joint torque/pose	~1 kHz torque; 10-100 Hz planning	moderate (object pose/friction)	short, episodic (seconds)	intermediate, mostly reversible
Legged / humanoid	IMU, encoders, contact, exteroceptive vision	20-40+ DOF joint targets	~50 Hz policy; ~1 kHz PD torque	high (contact, slip, terrain)	walk-length, falls end it	high, fast (fall in a few steps)
Game agent	symbolic state and/or pixels	discrete legal moves / low-D control	simulator step (no physical clock)	full to severe (game-dependent)	game-dependent	near zero (free reset)

A technique does not transfer across the rate gap for free

The most expensive cross-domain mistake is to port a method without re-checking its time budget. A model-predictive grasp planner that re-optimizes in 80 ms fits a manipulator planning layer running near 10 Hz (a 100 ms period), where the worst routine failure is a dropped object. The same 80 ms inference, dropped into an autonomous vehicle whose planner must close at roughly 10 Hz against agents moving at highway speed, may already be a budget violation: it consumes most of a 100 ms cycle that must also cover perception and prediction, the vehicle covers 2.4 m at 30 m/s while it runs, and the failure it guards against is irreversible rather than re-tryable. "It worked on the arm" says nothing about the vehicle until you have re-derived $\Delta t$ and $\kappa$ for the new body. Always re-establish the loop period and the failure cost before reusing a controller across the spectrum.
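The budget check can be made mechanical: for each body, compute the fraction of the loop period a computation consumes and, for a moving body, how far it travels meanwhile. A minimal sketch, assuming a fixed latency (real inference latency has a tail, so a worst-case or high-percentile figure is the safer input) and, for the vehicle, a hypothetical allocation in which the planner may use only half of its cycle, the rest being reserved for perception and prediction. The 50 % share is an illustrative assumption, not a figure from any particular driving stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetCheck:
    period_ms: float      # loop period implied by the rate
    fraction_used: float  # latency as a fraction of the period
    fits: bool            # True if latency fits the share allotted to it
    travel_m: float       # distance the body moves while the computation runs


def check_budget(latency_s: float, rate_hz: float,
                 speed_mps: float = 0.0, share: float = 1.0) -> BudgetCheck:
    """Compare a fixed computation latency against one loop period.

    share is the fraction of the period this computation may use; the rest
    is assumed to be reserved for other stages of the loop.
    """
    used = latency_s * rate_hz  # equivalent to latency / period
    return BudgetCheck(period_ms=1000.0 / rate_hz,
                       fraction_used=used,
                       fits=used <= share,
                       travel_m=speed_mps * latency_s)


arm = check_budget(0.080, rate_hz=10.0)                   # grasp planner near 10 Hz
arm_fast = check_budget(0.080, rate_hz=100.0)             # same planner at 100 Hz
car = check_budget(0.080, rate_hz=10.0, speed_mps=30.0,   # highway planner at 10 Hz,
                   share=0.5)                             # half the cycle (hypothetical)
for label, c in [("arm @10 Hz", arm), ("arm @100 Hz", arm_fast), ("car @10 Hz", car)]:
    print(f"{label}: {c.fraction_used:.0%} of {c.period_ms:.0f} ms, "
          f"fits={c.fits}, travel={c.travel_m:.1f} m")
```

The same 80 ms number fits a 10 Hz planning layer, overruns a 100 Hz one, and fails on the vehicle once part of the cycle is reserved for other stages, which is exactly why $\Delta t$ has to be re-derived per body.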

Library shortcut: match the environment contract to the body

No single simulator or framework spans the spectrum well. Gymnasium gives the clean episode contract for game agents (Part III); PettingZoo extends it to multi-agent games; MuJoCo and Isaac Lab provide the contact and high-rate dynamics that legged and manipulation work need (Part V); a CARLA-style stack supplies the traffic scene and sensor suite for vehicles (Parts IV and VI); Nav2 on ROS 2 grounds the vacuum and mobile-robot navigation case (Part IV). Choose the stack whose failure surface and control rate match the body, rather than forcing every example through one API.
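Gymnasium's episode contract is small: `reset` returns `(observation, info)`, and `step` returns `(observation, reward, terminated, truncated, info)`. The sketch below is a hypothetical toy environment that follows those return conventions without importing Gymnasium, so it runs anywhere; the hidden-coin game and its 10-step time limit are invented for illustration. The rollout loop shows the game agent's $\kappa \approx 0$ directly: a thousand resets cost nothing but compute.

```python
import random
from typing import Any, Optional


class HiddenCoinEnv:
    """Hypothetical toy environment with Gymnasium-style return values.

    Not a Gymnasium environment and does not import gymnasium. Each step the
    agent guesses a hidden coin (0 or 1); a correct guess earns reward 1.0.
    """

    def __init__(self, max_steps: int = 10):
        self.max_steps = max_steps
        self._rng = random.Random()
        self._t = 0
        self._coin = 0

    def reset(self, *, seed: Optional[int] = None,
              options: Optional[dict] = None) -> tuple[int, dict[str, Any]]:
        if seed is not None:
            self._rng.seed(seed)
        self._t = 0
        self._coin = self._rng.randint(0, 1)
        return 0, {}  # the observation is constant: the coin stays hidden

    def step(self, action: int) -> tuple[int, float, bool, bool, dict[str, Any]]:
        reward = 1.0 if action == self._coin else 0.0
        self._t += 1
        self._coin = self._rng.randint(0, 1)
        terminated = False                     # no terminal state in this game
        truncated = self._t >= self.max_steps  # a time limit ends the episode
        return 0, reward, terminated, truncated, {}


# Free resets: a thousand episodes cost nothing but compute.
env = HiddenCoinEnv(max_steps=10)
agent_rng = random.Random(123)
episodes, total_steps, total_reward = 0, 0, 0.0
for ep in range(1000):
    obs, info = env.reset(seed=ep)
    done = False
    while not done:
        action = agent_rng.randint(0, 1)       # random guessing policy
        obs, reward, terminated, truncated, info = env.step(action)
        total_steps += 1
        total_reward += reward
        done = terminated or truncated
    episodes += 1
print(episodes, total_steps, total_reward / total_steps)
```

On a physical body every one of those resets would be a person re-placing the robot, or something worse; that asymmetry, not the API, is what keeps results from transferring for free.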

Research frontier: generalist policies across the whole spectrum

The open question is whether one policy can span bodies that differ by about two orders of magnitude in physical control rate and range from $\kappa\approx 0$ to irreversible-and-externalized in failure cost. Open X-Embodiment pooled data from many robots and showed that an RT-X-style policy trained across embodiments can outperform per-robot training on several arms, evidence that some task representation transfers across action spaces. The frontier is pushing this past the arm: into legged and humanoid bodies with fast balance dynamics, into vehicles where the safety case forbids exploratory failure, and toward vision-language-action models (Part VIII) that must respect each body's time budget rather than only its semantics. Transfer of intent is encouraging; transfer that respects $\Delta t$ and $\kappa$ is the unsolved part.

Key Takeaway

The six examples are one closed loop evaluated at six points of a parameter vector. Sensing modality, action space, control rate, partial observability, horizon, and failure cost are the coordinates; the control period and the irreversibility of failure do most of the separating. Read any new system by placing it on these axes first, and reuse a method across bodies only after re-deriving the two coordinates it is most sensitive to: $\Delta t$ and $\kappa$.

Exercise 1.6.1

Place two systems that are not in this section, a teleoperated surgical robot and a warehouse autonomous mobile robot (AMR), on all six axes (sensing modality, action space, control rate, partial observability, horizon, failure cost). For each, state the control rate of the loop you would program against and the worst routine failure with its reversibility. Then name one technique from a body in Table 1.6.1 that transfers to your system and one that does not, justifying each by the coordinate it depends on.

Exercise 1.6.2

Add to Table 1.6.1 a column that ranks failure cost on an ordinal scale (free reset, reversible, intermediate, irreversible-local, irreversible-externalized), then sort the systems by the product of control period and failure-cost rank. Which body ranks as the hardest to test safely, and does that match where the book spends its safety and verification effort (Parts VI and VII)?

What's Next?

Section 1.7 explains why these examples are hard: partial observability, long horizons, safety, and data cost.

Section References

Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V., and Hutter, M. "Learning Quadrupedal Locomotion over Challenging Terrain." Science Robotics 5(47):eabc5986 (2020). https://www.science.org/doi/10.1126/scirobotics.abc5986

Source for the legged-locomotion coordinates: a learned policy emitting joint targets at about 50 Hz over a PD torque loop near 1 kHz, on ANYmal across rough terrain.

Mahler, J., Liang, J., Niyaz, S., Laskey, M., Doan, R., Liu, X., Ojea, J. A., and Goldberg, K. "Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics." RSS (2017). https://arxiv.org/abs/1703.09312

A canonical manipulation reference for the grasp-selection layer that sits above the manipulator's 1 kHz torque loop.

Chen, L. et al. "End-to-End Autonomous Driving: Challenges and Frontiers." IEEE TPAMI (2024). https://arxiv.org/abs/2306.16927

A recent survey grounding the autonomous-vehicle coordinates: layered planning at roughly 10 Hz, severe partial observability, and an irreversible failure cost.

Open X-Embodiment Collaboration. "Open X-Embodiment: Robotic Learning Datasets and RT-X Models." (2023). https://arxiv.org/abs/2310.08864

The cross-embodiment evidence behind the research frontier: a single policy trained across many robots transferring task representation across action spaces.

Sutton, R. S., and Barto, A. G. "Reinforcement Learning: An Introduction." 2nd ed. MIT Press (2018). http://incompleteideas.net/book/the-book-2nd.html

The durable reference for the controlled Markov process, episode horizon, and trajectory-level objectives that the embodiment tuple specializes per body.