Chapter 59: Capstone Projects | Building Embodied AI: From Perception to Autonomous Action

"I am not a final project until my failure cases have filenames."
A Capstone Artifact With Receipts

Big Picture

Capstone Projects closes the book by turning advanced embodied AI ideas into portfolio-grade artifacts: memory traces, continual-learning panels, frontier claim audits, capstone deliverables, and teaching plans. Each project defines a task contract, a baseline, a maintained-tool implementation, an evaluation panel, and a postmortem.

Chapter Through-Line

A capstone succeeds when it produces a reproducible embodied system artifact, a failure analysis, and a defensible metric. Read the chapter by asking the same four questions on every page: what changes in the loop, what evidence is saved, what can fail, and which tool makes the practical path shorter.

Chapter Overview

Chapter 59 turns the book into a portfolio of buildable embodied-AI projects. Each section is a ready-made capstone brief with a task contract, a baseline path, a maintained-tool path, an evidence panel, and a project-specific postmortem structure.

The chapter is deliberately diverse. It covers semantic search, language-guided navigation, manipulation, VLA adaptation, locomotion transfer, world-model planning, safety shielding, language planning, drones, multi-agent rescue, open-ended research, and application-track templates. Together they show how the same evidence discipline survives across very different embodiments.

Prerequisites

Readers should be comfortable with Python, tensors, and the perception-action loop. When the chapter uses geometry, control, or probability, the relevant appendices provide a compact refresher.

Chapter Roadmap

59.1 Object search in a simulated home: Builds a semantic-search project with stopping rules, replay suites, and failure clustering.
59.2 Language-guided navigation with replanning: Turns instruction following into a grounded replanning benchmark with constraint-aware evaluation.
59.3 Vision-based robotic pick-and-place (IL + RL): Stages imitation and reinforcement learning as a robustness-improvement story rather than a buzzword pair.
59.4 Fine-tune an open VLA on a custom task (LeRobot): Shows how to adapt an open foundation policy while preserving dataset and evaluation discipline.
59.5 Learned locomotion with sim-to-real analysis: Centers the transfer gap and forces matched simulation-to-hardware evidence.
59.6 World-model-based planning agent: Converts latent prediction into a short-horizon planning capstone with drift diagnostics.
59.7 Safety-shielded embodied agent: Grades safety by intervention quality, false alarms, and task completion rather than slogans.
59.8 LLM-based household task planner: Separates language planning from grounding, affordances, and execution traces.
59.9 Drone inspection planner: Treats route planning, safety reserve, and coverage as one mission-level evidence problem.
59.10 Multi-agent search and rescue: Examines communication, role allocation, and stale-information failure in a team setting.
59.11 Open-ended research project: Provides a scoping protocol so student ideas become falsifiable embodied experiments.
59.12 Application Track Capstone Templates: Packages household, drone, driving, and humanoid tracks into one reusable evidence schema.

Tooling Note

This chapter uses the right-tool principle. Build the mechanism once, then reach for maintained tools such as MuJoCo, MJX, Isaac Lab, Genesis, Newton, Drake, ROS 2, and modern Gazebo when the task moves from learning exercise to working system.

Hands-On Lab: Draft a Capstone Evidence Card

Duration: about 60 minutes
Difficulty: Intermediate

Objective

Choose one capstone track from the chapter and produce a complete evidence card before writing implementation code.

Steps

Select one project track and state its operating domain, action interface, and safety constraint.
Name the baseline, the maintained-tool route, and one perturbation panel.
Specify the metric script, replay artifact, and failure taxonomy.
Present one likely failure case and the experiment that would clarify it.
Use the card as the first page of the project repository or proposal.
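
The steps above can be sketched as a small data structure that refuses to count as finished until every field is filled in. This is a minimal illustration, not a prescribed format: the class name `EvidenceCard`, its field names, and all example values (paths, failure labels, the chosen track details) are assumptions made for this sketch.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class EvidenceCard:
    """Hypothetical evidence card; one field per item named in the lab steps."""
    track: str
    operating_domain: str
    action_interface: str
    safety_constraint: str
    baseline: str
    maintained_tool_route: str
    perturbation_panel: str
    metric_script: str
    replay_artifact: str
    failure_taxonomy: list = field(default_factory=list)
    likely_failure: str = ""
    clarifying_experiment: str = ""

    def missing_fields(self):
        """Return the names of fields that are still empty."""
        return [name for name, value in asdict(self).items() if not value]


# Example values are illustrative placeholders, not results from the book.
card = EvidenceCard(
    track="59.1 Object search in a simulated home",
    operating_domain="single-floor simulated apartments",
    action_interface="discrete move / turn / stop",
    safety_constraint="no collision above a contact threshold",
    baseline="frontier exploration with a fixed step budget",
    maintained_tool_route="Habitat",
    perturbation_panel="lighting and object-placement shifts",
    metric_script="eval/metrics.py",
    replay_artifact="replays/",
    failure_taxonomy=["premature stop", "missed detection", "timeout"],
    likely_failure="agent stops at a visually similar object",
    clarifying_experiment="replay the same episodes with distractors removed",
)

draft = EvidenceCard(
    track="59.9 Drone inspection planner",
    operating_domain="", action_interface="", safety_constraint="",
    baseline="", maintained_tool_route="", perturbation_panel="",
    metric_script="", replay_artifact="",
)

print(json.dumps(asdict(card), indent=2))
print("draft still missing:", draft.missing_fields())
```

Serializing the card to JSON makes it easy to commit alongside the README, and the `missing_fields` check gives a quick way to see what remains before implementation starts.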

What's Next?

Continue with Section 59.1: Object search in a simulated home, where the chapter moves from motivation to the first concrete idea.

This chapter is written for readers who want projects that can survive review, grading, and future extension. Read each section twice: first for the system idea, then for the repository artifact that would let a teammate rerun and critique the project later.

Chapter Tool Map

Tool or Library	Where It Pays Off
Habitat and AI2-THOR	Use for semantic search, household navigation, and instruction-following capstones.
ManiSkill, robomimic, and LeRobot	Use for manipulation, imitation, and VLA adaptation projects.
Isaac Lab, MuJoCo, and locomotion stacks	Use for control-heavy projects and sim-to-real analysis.
PX4 SITL, CARLA, and CommonRoad	Use for aerial and autonomous-driving mission planning tracks.
ROS 2 plus replay and logging tools	Use across all tracks to keep evidence, safety events, and postmortems inspectable.

Chapter Lab Extension

Extend the evidence card by adding a directory layout, one README grading checklist, and one explicit note explaining why the chosen baseline could plausibly beat the proposed method.

Reader Outcomes And Assessment Pattern

The chapter converts the book into a capstone studio. By the end, the reader should be able to scope a tractable embodied project, choose the right maintained stack, define a same-panel evaluation, and write a postmortem that distinguishes model failure from systems failure.

Chapter Production Checklist

Dimension	What The Reader Produces	Quality Gate
Mechanism	A concise explanation of which loop component the capstone design changes.	The explanation names observation, state, action, and feedback.
Implementation	A baseline plus a maintained-tool route using Gymnasium, Habitat, or ManiSkill.	The two routes save the same artifact schema.
Evaluation	A same-panel metric comparison with perturbation and failure labels.	All numbers are computed in the same run from the same config.
Communication	A short postmortem that distinguishes concept, system, and evidence claims.	The postmortem includes one limitation and one next test.
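
The same-panel gate in the table above can be illustrated with a toy harness that scores every policy on identical seeds and perturbation levels in a single call. Everything here is an assumption for illustration: `toy_baseline`, `toy_method`, `evaluate_panel`, and the integer "shift" standing in for a perturbation are hypothetical, and the policies are deterministic stand-ins rather than real agents.

```python
# Toy stand-ins: each "policy" maps (seed, shift) to success or failure.
# seed % 10 plays the role of episode difficulty; shift is a perturbation level.
def toy_baseline(seed, shift):
    return seed % 10 >= 2 + shift


def toy_method(seed, shift):
    # Slightly better when unperturbed, but degrades faster under shift.
    return seed % 10 >= 1 + 2 * shift


def evaluate_panel(policies, seeds, shifts):
    """Score every policy on the same seeds and shifts; keep failing seeds."""
    rows = []
    for shift in shifts:
        for name, policy in policies.items():
            outcomes = [policy(seed, shift) for seed in seeds]
            rows.append({
                "policy": name,
                "shift": shift,
                "success_rate": sum(outcomes) / len(outcomes),
                # Failed seeds are the handles for replay and failure labeling.
                "failed_seeds": [s for s, ok in zip(seeds, outcomes) if not ok],
            })
    return rows


panel = evaluate_panel(
    {"baseline": toy_baseline, "method": toy_method},
    seeds=list(range(10)),
    shifts=[0, 3],
)
for row in panel:
    print(row)
```

In this toy setup the method leads with no perturbation and trails the baseline at the larger shift, which is the kind of reversal a same-panel comparison is meant to expose. Because both policies see the same seeds, each failed seed can be replayed under either policy.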

Chapter Lab Frame

Run the chapter as a two-pass build. First, implement the smallest baseline that exposes the mechanism. Second, replace the brittle part with the maintained tool that preserves the same contract. The deliverable is a folder with code, config, logs, plots or traces, and labeled failures.
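
One way to keep the deliverable honest is a small script that reports which parts of the folder are still absent. The sketch below is an assumption about layout, not a required structure: the paths in `REQUIRED` (`src`, `config.yaml`, `logs`, `traces`, `failures/labels.csv`) and the function name `missing_deliverables` are hypothetical choices for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical mapping from deliverable name to a path inside the project folder.
REQUIRED = {
    "code": "src",
    "config": "config.yaml",
    "logs": "logs",
    "plots or traces": "traces",
    "labeled failures": "failures/labels.csv",
}


def missing_deliverables(root, required=REQUIRED):
    """Return the deliverable names whose expected path does not exist."""
    root = Path(root)
    return [label for label, rel in required.items() if not (root / rel).exists()]


# Demo on a throwaway directory that has only code and config so far.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "src").mkdir()
    (project / "config.yaml").write_text("seed: 0\n")
    demo_missing = missing_deliverables(project)

print("missing:", demo_missing)
```

Running a check like this after both build passes makes it obvious whether the maintained-tool route produced the same set of artifacts as the baseline.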

Bibliography & Further Reading

Foundational Papers, Tools, and References

Sutton, R. S., and Barto, A. G. "Reinforcement Learning: An Introduction." (2018). http://incompleteideas.net/book/the-book-2nd.html

A foundation for value functions, policy gradients, exploration, and the RL framing used throughout the book.

Todorov, E., Erez, T., and Tassa, Y. "MuJoCo: A physics engine for model-based control." (2012). https://mujoco.org/

The simulator lineage behind much modern robot learning, now extended through MJX and Warp workflows.

Brohan, A. et al. "RT-1: Robotics Transformer for real-world control at scale." (2022). https://arxiv.org/abs/2212.06817

A landmark in large-scale robot policy learning with transformer policies.

Brohan, A. et al. "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control." (2023). https://arxiv.org/abs/2307.15818

A central reference for connecting web-scale VLM knowledge to robot actions.

Open X-Embodiment Collaboration. "Open X-Embodiment: Robotic Learning Datasets and RT-X Models." (2023). https://arxiv.org/abs/2310.08864

The cross-embodiment data and transfer reference used by the data chapters.

Chi, C. et al. "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion." (2023). https://arxiv.org/abs/2303.04137

The practical diffusion policy reference for imitation learning and continuous action generation.

Hafner, D. et al. "Mastering Diverse Domains through World Models." (2023). https://arxiv.org/abs/2301.04104

DreamerV3, a modern reference for latent world models and imagination-based control.

Hugging Face. "LeRobot." (2024). https://github.com/huggingface/lerobot

The open robot-learning stack used for datasets, policies, demos, and low-cost embodied AI workflows.

Official documentation and source repositories for the tools used in this chapter.

Check the official docs for install commands, current APIs, and version caveats before using any of these tools in a lab or project.