"An agent becomes interesting at the exact moment the world refuses to be a dataset."
A Patient Embodied AI Agent
This chapter matters because embodied intelligence is a closed loop. The agent must turn partial observations into useful state, choose actions under uncertainty, and learn from the consequences in a physical or simulated world.
The core move is to connect the language model's plans to action. A static model can be accurate and still be useless if it cannot support timely, safe, and recoverable behavior.
Chapter Overview
Chapter 33 develops LLMs as Planners and Controllers as a working piece of the embodied AI stack. The chapter starts with the role this topic plays in the sense, represent, predict, decide, act, observe, and learn loop, then turns that role into a concrete implementation pattern.
The practical thread focuses on typed tool interfaces, program synthesis, spatial value maps, verifier loops, and safety shields. The reader should leave with both a mental model of why these architectures work and a concrete build path for planner traces that survive real execution.
Prerequisites
Readers should be comfortable with Python, tensors, and the perception-action loop. When the chapter uses geometry, control, or probability, the relevant appendices provide a compact refresher.
Chapter Roadmap
- 33.1 What LLMs can and cannot do in embodied tasks: Draw a firm boundary between semantic planning strengths and grounded-control responsibilities.
- 33.2 SayCan: affordance-grounded planning: Combine language priors with executability estimates so plausible plans do not outrun the robot.
- 33.3 Code as Policies: LLMs that write robot code: Use constrained program synthesis and verification rather than free-text action scripts.
- 33.4 VoxPoser: composing 3D value maps: Translate language into planner-facing spatial objectives instead of directly into motion.
- 33.5 ReKep: relational keypoint constraints: Express manipulation goals as compact geometric relations that optimizers can solve.
- 33.6 Tool use, action APIs, plan verification, replanning: Design explicit loops for typed tool calls, postcondition checks, and plan revision.
- 33.7 Memory, state tracking, and hallucination in physical tasks: Treat memory as a grounded state-estimation problem with freshness and provenance.
- 33.8 Safe LLM-agent interfaces: Interpose shields, permissions, and escalation logic between symbolic plans and hardware.
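As a preview of the idea behind Section 33.2, the sketch below combines a language prior with an executability estimate by multiplying the two scores per skill and choosing the maximum. It is a minimal illustration, not the SayCan implementation: the function name `saycan_select`, the skill strings, and all numbers are hypothetical, and a real system would obtain the first score from an LLM and the second from a learned value function.

```python
from typing import Dict


def saycan_select(llm_scores: Dict[str, float],
                  affordances: Dict[str, float]) -> str:
    """Return the skill with the highest (language score * affordance) product.

    llm_scores:  how useful the language model thinks each skill is for the task.
    affordances: how likely each skill is to succeed from the current state.
    Skills without an affordance estimate are treated as not executable (0.0).
    """
    combined = {
        skill: llm_scores[skill] * affordances.get(skill, 0.0)
        for skill in llm_scores
    }
    return max(combined, key=combined.get)


# Hypothetical scores for the instruction "clean up the spill".
llm_scores = {"pick up sponge": 0.6, "go to sink": 0.3, "open drawer": 0.1}
# The sponge is not in reach, so its affordance is low despite the high prior.
affordances = {"pick up sponge": 0.1, "go to sink": 0.9, "open drawer": 0.8}

best = saycan_select(llm_scores, affordances)
print(best)
```

The language prior alone would pick the sponge; the product defers to what the robot can actually execute from its current state.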
This chapter uses the right-tool principle. Build one verified planning loop from scratch, then reach for maintained tools such as ROS 2 actions, BehaviorTree.CPP, MoveIt 2, LangGraph, and structured tool-calling runtimes when the task moves from pedagogy to deployment.
Hands-On Lab: Build a Verified LLM Planner Loop
Objective
Build a small embodied planning loop where an LLM proposes typed actions, a verifier checks postconditions, and the system replans or escalates when execution evidence disagrees with the proposal.
Steps
- Define a minimal typed action API for a tabletop or navigation domain.
- Generate candidate actions from a prompt or local mock planner.
- Run a verifier after each action and store the planner trace.
- Add one repair loop for failed tool calls and one safety escalation rule.
- Swap your hand-built planner shell for LangGraph or a behavior tree and compare what complexity disappeared.
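The first four steps can be sketched in a few dozen lines. This is a toy under stated assumptions, not a reference solution: the `Action` type, the dictionary world state, the `FORBIDDEN_TARGETS` rule, and the simulated grasp failure are all hypothetical, and a fixed list stands in for the LLM or mock planner.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Action:
    name: str    # "move_to" or "grasp" in this toy domain
    target: str  # object or location identifier


@dataclass
class TraceEntry:
    action: Action
    ok: bool
    note: str


World = Dict[str, str]            # toy state: fact name -> value
FORBIDDEN_TARGETS = {"stove"}     # hypothetical safety rule


def execute(world: World, action: Action, fail_once: Set[str]) -> None:
    """Toy executor; a grasp on a target in fail_once slips the first time."""
    if action.name == "move_to":
        world["robot_at"] = action.target
    elif action.name == "grasp":
        if action.target in fail_once:
            fail_once.discard(action.target)  # simulated slip, no state change
            return
        world["holding"] = action.target


def verify(world: World, action: Action) -> bool:
    """Check the postcondition against observed state, not against the plan."""
    if action.name == "move_to":
        return world.get("robot_at") == action.target
    if action.name == "grasp":
        return world.get("holding") == action.target
    return False  # unknown action types never pass verification


def run_plan(plan: List[Action], world: World, fail_once: Set[str],
             max_retries: int = 1) -> List[TraceEntry]:
    trace: List[TraceEntry] = []
    for action in plan:
        # Safety escalation: stop before acting on a forbidden target.
        if action.target in FORBIDDEN_TARGETS:
            trace.append(TraceEntry(action, False, "escalated: forbidden target"))
            return trace
        # Repair loop: retry the same action a bounded number of times.
        for attempt in range(1, max_retries + 2):
            execute(world, action, fail_once)
            if verify(world, action):
                trace.append(TraceEntry(action, True, f"verified on attempt {attempt}"))
                break
            trace.append(TraceEntry(action, False, f"postcondition failed on attempt {attempt}"))
        else:
            trace.append(TraceEntry(action, False, "escalated: retries exhausted"))
            return trace
    return trace


# A fixed plan stands in for the planner's proposal.
plan = [Action("move_to", "table"), Action("grasp", "cup")]
world: World = {}
trace = run_plan(plan, world, fail_once={"cup"})
for entry in trace:
    print(entry.action.name, entry.action.target, entry.ok, entry.note)
```

The trace records failed attempts as well as successes, which is what makes it useful as evidence when the hand-built shell is later compared with LangGraph or a behavior tree.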
42-Agent Production Checklist Applied
This chapter has been checked against the production team dimensions: chapter scope, curriculum alignment, deep explanation, teaching flow, student questions, cognitive load, examples, exercises, code pedagogy, visual learning, misconceptions, fact integrity, terminology, cross-references, narrative continuity, style, engagement, senior editorial quality, research frontier, structure, content currency, self-containment, opening hook, project work, aha moments, visual identity, demos, memorability, skeptical-reader challenge, prose clarity, pacing, illustrations, epigraph, application examples, fun notes, bibliography, meta-review, controller checks, publication QA, figure fact checking, code captions, and lab design.
For LLMs as Planners and Controllers, the practical gate is simple: every claim that reaches the chapter body must help a reader build or evaluate an embodied system, and every comparison must be backed by one construct-matched artifact.
Figure 33.1 gives this page a compact map of the interface. Read it left to right, then check whether the surrounding prose names the same observation, action, and evidence contract.
What's Next?
Continue with Section 33.1: What LLMs can and cannot do in embodied tasks, where the chapter moves from motivation to the first concrete idea.
This chapter is written for readers who want theory and a working build path in the same pass. Read each section twice: first for the mechanism, then for the artifact you would save if you had to reproduce the result six months later.
| Tool or Library | Where It Pays Off | When to Use It |
|---|---|---|
| LLM tool calling | Constrain planner outputs to typed actions rather than free text. | Use it when the experiment needs a maintained interface, reproducible artifacts, or a standard dataset contract. |
| ROS 2 action servers | Expose skills with progress, cancelation, feedback, and failure states. | Use it when the experiment needs a maintained interface, reproducible artifacts, or a standard dataset contract. |
| MoveIt and motion planners | Execute geometric subgoals through tested planning interfaces. | Use it when the experiment needs a maintained interface, reproducible artifacts, or a standard dataset contract. |
| BehaviorTree.CPP | Represent fallback, retry, and verification logic explicitly. | Use it when the experiment needs a maintained interface, reproducible artifacts, or a standard dataset contract. |
| LangGraph | Prototype planner state machines while keeping logs and tool boundaries explicit. | Use it when the experiment needs a maintained interface, reproducible artifacts, or a standard dataset contract. |
Extend the lab by adding one baseline, one maintained-library implementation, and one perturbation test. Save the result as a single folder containing configuration, logs, summary metrics, and two representative failure cases.
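One possible layout for that folder is sketched below. The file names (`config.json`, `run.log`, `metrics.json`, `failures/`) and the helper `save_run` are assumptions chosen for illustration; the chapter does not prescribe a format, and the example values are placeholders rather than results.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def save_run(root: Path, config: Dict, log_lines: List[str],
             metrics: Dict, failures: List[Dict]) -> Path:
    """Write configuration, logs, summary metrics, and failure cases under root."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "config.json").write_text(json.dumps(config, indent=2))
    (root / "run.log").write_text("\n".join(log_lines) + "\n")
    (root / "metrics.json").write_text(json.dumps(metrics, indent=2))
    fail_dir = root / "failures"
    fail_dir.mkdir(exist_ok=True)
    for i, case in enumerate(failures):
        (fail_dir / f"case_{i}.json").write_text(json.dumps(case, indent=2))
    return root


# Placeholder values; a temporary directory keeps the example side-effect free.
out = save_run(
    Path(tempfile.mkdtemp()) / "run_001",
    config={"planner": "mock", "max_retries": 1},
    log_lines=["move_to table ok", "grasp cup failed", "grasp cup ok"],
    metrics={"actions": 2, "retries": 1},
    failures=[{"action": "grasp cup", "reason": "slip"},
              {"action": "grasp stove", "reason": "forbidden target"}],
)
saved = sorted(p.name for p in out.iterdir())
print(saved)
```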
The chapter can be used as a self-contained reading unit or as the basis for an undergraduate or graduate teaching week. The recommended pattern is concept, minimal implementation, library shortcut, diagnostic exercise, then reflection on failure modes. This keeps the mathematical idea attached to a concrete system artifact rather than letting it float as notation.
For LLMs as planners and controllers, the practical stack should be introduced as a set of choices rather than a shopping list. The relevant tools include Gymnasium, PettingZoo, ROS 2, MuJoCo, and LeRobot. Each tool earns its place only when it shortens a working path, improves reproducibility, or exposes a standard interface that students will meet in real embodied systems.
Before leaving the chapter, the reader should be able to state one theory claim, one implementation claim, one evaluation claim, and one realistic failure mode. If any of those four are missing, the chapter should be revisited through the lab.
A strong chapter session ends with an artifact: a small script, a plotted trace, a simulator run, a data card, or a reproducible evaluation panel. The artifact is what turns reading into embodied-system-building practice.
Before leaving this chapter, choose one section and name its hook, core mechanism, runnable artifact, figure, misconception warning, exercise, bibliography trail, and evaluation caveat. This quick audit mirrors the 42-agent checklist used for Part VII.
Bibliography & Further Reading
Foundational Papers, Tools, and References
Sutton, R. S., and Barto, A. G. "Reinforcement Learning: An Introduction." (2018). http://incompleteideas.net/book/the-book-2nd.html
A foundation for value functions, policy gradients, exploration, and the RL framing used throughout the book.
Todorov, E., Erez, T., and Tassa, Y. "MuJoCo: A physics engine for model-based control." (2012). https://mujoco.org/
The simulator lineage behind much modern robot learning, now extended through MJX and Warp workflows.
Brohan, A. et al. "RT-1: Robotics Transformer for real-world control at scale." (2022). https://arxiv.org/abs/2212.06817
A landmark in large-scale robot policy learning with transformer policies.
Brohan, A. et al. "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control." (2023). https://arxiv.org/abs/2307.15818
A central reference for connecting web-scale VLM knowledge to robot actions.
Open X-Embodiment Collaboration. "Open X-Embodiment: Robotic Learning Datasets and RT-X Models." (2023). https://arxiv.org/abs/2310.08864
The cross-embodiment data and transfer reference used by the data chapters.
Chi, C. et al. "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion." (2023). https://arxiv.org/abs/2303.04137
The practical diffusion policy reference for imitation learning and continuous action generation.
Hafner, D. et al. "Mastering Diverse Domains through World Models." (2023). https://arxiv.org/abs/2301.04104
DreamerV3, a modern reference for latent world models and imagination-based control.
Hugging Face. "LeRobot." (2024). https://github.com/huggingface/lerobot
The open robot-learning stack used for datasets, policies, demos, and low-cost embodied AI workflows.
Official documentation and source repositories for the tools used in this chapter.
Use official docs to check install commands, current APIs, and version caveats before applying these tools in a lab or project.