"An agent becomes interesting at the exact moment perception changes what it dares to do next."
A Patient Embodied AI Agent
Localization and Mapping (SLAM) turns perception into action-ready state. A robot that does not know where it is will turn every good plan into a guess. SLAM is the discipline of making that guess explicit, updateable, and testable.
The durable test is not whether a model looks impressive. The test is whether it improves a robot's next action while leaving a clear evidence trail for debugging.
Chapter Overview
Chapter 29 develops Localization and Mapping (SLAM) as a working piece of the embodied AI stack. It connects visual or spatial evidence to state estimates, action choices, visual servoing loops, timing budgets, and failure labels.
The chapter follows the right-tool rhythm used across the book: build the mechanism once, then move to maintained tools such as OpenCV, Open3D, ROS 2, and Nav2.
Prerequisites
Readers should be comfortable with Python, tensors, coordinate frames, sensor noise, and the perception-action loop. Useful refreshers appear in Chapter 4, Chapter 8, and Chapter 13.
Chapter Roadmap
- 29.1 Where am I and what does the world look like: Localization and mapping are coupled because every pose estimate affects the map and every map update affects later pose estimates.
- 29.2 Odometry and dead reckoning: Odometry integrates motion increments, so small bias compounds into large pose error.
- 29.3 Localization (Monte Carlo / particle filters): Particle filters represent pose belief as many weighted hypotheses.
- 29.4 Mapping and occupancy grids: Mapping converts sensor rays and poses into a belief over free, occupied, and unknown cells.
- 29.5 SLAM: graph-based and visual SLAM: Graph SLAM turns poses and landmarks into constraints, then solves for the trajectory and map that best satisfy them.
- 29.6 Neural and Gaussian-splat SLAM: Neural and Gaussian representations fold appearance into mapping, offering dense reconstructions for view synthesis and robot inspection.
- 29.7 Map uncertainty: Uncertainty is a planning signal, not an afterthought.
- 29.8 Modern SLAM Systems And Failure Modes: Modern SLAM is no longer one algorithm. It is a contract among inertial sensing, visual or lidar front ends, factor-graph optimization, map maintenance, semantic structure, and failure replay.
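The roadmap's claim for Section 29.2, that small odometry bias compounds into large pose error, can be checked in a few lines of NumPy. The sketch below is an illustration under assumptions that are not in the text: a unicycle motion model, a 0.1 s time step, a 0.5 m/s forward speed, and a hypothetical constant heading-rate bias of 0.01 rad/s. It integrates the same commanded motion with and without the bias and reports how far the two dead-reckoned positions diverge.

```python
import numpy as np

def dead_reckon(v, omega, dt, steps, omega_bias=0.0):
    """Integrate a unicycle model and return an array of (x, y, theta) poses."""
    pose = np.zeros(3)
    poses = [pose.copy()]
    for _ in range(steps):
        x, y, theta = pose
        x += v * np.cos(theta) * dt
        y += v * np.sin(theta) * dt
        theta += (omega + omega_bias) * dt  # the bias enters the heading here
        pose = np.array([x, y, theta])
        poses.append(pose.copy())
    return np.array(poses)

# Assumed values, chosen only for illustration.
true_path = dead_reckon(v=0.5, omega=0.0, dt=0.1, steps=600)
biased_path = dead_reckon(v=0.5, omega=0.0, dt=0.1, steps=600, omega_bias=0.01)

# Position error between the two trajectories at every step.
drift = np.linalg.norm(true_path[:, :2] - biased_path[:, :2], axis=1)
for step in (100, 300, 600):
    print(f"t={step * 0.1:5.1f} s  position error={drift[step]:.2f} m")
```

In this setup the position error grows faster than linearly with time, because the heading error itself keeps growing while the robot keeps moving. That is the behavior localization in Sections 29.3 to 29.5 is meant to bound.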
This chapter uses the right-tool principle. The teaching baseline exposes units, frames, uncertainty, and logging. The shortcut stack uses maintained tools to handle optimized kernels, visualization, data formats, simulation hooks, and deployment interfaces.
Hands-On Lab: Build A Localization and Mapping (SLAM) Evidence Panel
Objective
Build a small evidence panel that compares a hand-built baseline with a maintained tool workflow for this chapter.
What You'll Practice
- Writing an observation, action, metric, and perturbation contract.
- Building one inspectable baseline before using a library shortcut.
- Logging success, failure labels, latency, and recovery behavior.
- Explaining which result would change a robot action.
Setup
Use a Python environment with NumPy. Add chapter-specific tools only after the baseline manifest runs.
# Install NumPy into the local environment for the chapter lab.
python -m pip install numpy
Steps
Step 1: Define The Contract
Write observation, action, metric, and perturbation fields for two sections.
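One way to write the contract is as a small frozen dataclass, so that every section carries the same four fields and an empty field fails early. This is a minimal sketch: the class name `SectionContract` and all field values are hypothetical examples, not contracts prescribed by the chapter.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class SectionContract:
    """One lab contract; every value used below is illustrative."""
    section: str
    observation: str   # what the agent senses
    action: str        # what the agent may do with it
    metric: str        # how success is scored
    perturbation: str  # what Step 4 will disturb

# Two example contracts (assumed content for Sections 29.2 and 29.4).
contracts = [
    SectionContract("29.2", "wheel encoder increments", "velocity command",
                    "final pose error (m)", "pose_drift"),
    SectionContract("29.4", "lidar ranges plus pose", "goal cell selection",
                    "closed_loop_success", "occlusion_or_noise"),
]

for contract in contracts:
    # Reject contracts that leave any field blank.
    missing = [name for name, value in asdict(contract).items() if not value]
    assert not missing, f"{contract.section} has empty fields: {missing}"
    print(asdict(contract))
```

Freezing the dataclass keeps the contract fixed once the experiment starts, which makes later rows in the manifest comparable.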
Step 2: Run The Baseline Manifest
Create one comparable row per section, then fill realistic values from the section text.
# Start a Chapter 29 evidence manifest.
# Add one row per section and keep metrics construct-matched.
sections = ['29.1', '29.2', '29.3', '29.4', '29.5', '29.6', '29.7']
manifest = [
    {"section": s, "metric": "closed_loop_success", "perturbation": "occlusion_or_noise"}
    for s in sections
]
print(manifest[0])
Step 3: Add The Library Shortcut
Replace one baseline field with a maintained tool call, while keeping the output schema unchanged.
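The sketch below shows the shape of that swap. Since the lab setup only guarantees NumPy, NumPy stands in here for a chapter tool such as OpenCV or Open3D. The waypoints, the function names, and the `path_length_m` field are assumptions made for illustration. The point is that both methods go through the same row constructor, so the schema cannot silently diverge.

```python
import math
import numpy as np

# Hypothetical 2D waypoints from an odometry trace, in metres.
waypoints = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (4.0, 6.0)]

def path_length_baseline(points):
    """Hand-built baseline: sum segment lengths in a plain loop."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        total += math.hypot(x1 - x0, y1 - y0)
    return total

def path_length_shortcut(points):
    """Library shortcut: the same quantity from vectorised NumPy calls."""
    pts = np.asarray(points, dtype=float)
    return float(np.linalg.norm(np.diff(pts, axis=0), axis=1).sum())

def make_row(section, method, value):
    """Both methods must emit exactly this schema."""
    return {"section": section, "method": method, "path_length_m": value}

baseline_row = make_row("29.2", "baseline", path_length_baseline(waypoints))
shortcut_row = make_row("29.2", "shortcut", path_length_shortcut(waypoints))

# The shortcut is only acceptable if the schema is unchanged.
assert baseline_row.keys() == shortcut_row.keys()
print(baseline_row)
print(shortcut_row)
```

If the two values disagree beyond floating-point tolerance, treat that as a finding to log, not a detail to hide: it usually means the baseline and the tool make different assumptions about units or frames.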
Step 4: Run One Perturbation
Add occlusion, noise, pose drift, map error, or goal ambiguity. Record whether the action changed.
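A perturbation run can be as small as the sketch below, which applies increasing heading drift to an estimated pose and records whether a toy steering policy changes its decision. The policy, the 0.2 rad tolerance, the goal position, and the drift levels are all hypothetical choices for illustration.

```python
import math

def choose_action(pose, goal, tolerance=0.2):
    """Toy policy (hypothetical): steer toward the goal from the estimated pose."""
    x, y, theta = pose
    bearing = math.atan2(goal[1] - y, goal[0] - x)
    # Wrap the heading error into [-pi, pi).
    error = (bearing - theta + math.pi) % (2 * math.pi) - math.pi
    if error > tolerance:
        return "turn_left"
    if error < -tolerance:
        return "turn_right"
    return "forward"

goal = (5.0, 0.0)
estimated_pose = (0.0, 0.0, 0.0)  # x (m), y (m), heading (rad)
nominal_action = choose_action(estimated_pose, goal)

rows = []
for heading_drift in (0.05, 0.1, 0.3, 0.6):  # assumed drift levels in rad
    perturbed_pose = (estimated_pose[0], estimated_pose[1],
                      estimated_pose[2] + heading_drift)
    action = choose_action(perturbed_pose, goal)
    rows.append({
        "perturbation": "pose_drift",
        "strength_rad": heading_drift,
        "action": action,
        "action_changed": action != nominal_action,
    })

for row in rows:
    print(row)
```

The useful number is the smallest perturbation strength at which `action_changed` becomes true. Below it, the estimate error is harmless to this policy; above it, the error changes what the robot does.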
Step 5: Write The Postmortem
Explain the strongest result, the most informative failure, and the next diagnostic test.
Expected Output
A table with one row per tested section, one baseline result, one shortcut result, one perturbation label, and one failure label.
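As a rough sketch of that output, the snippet below renders manifest rows as a pipe-delimited table. The scores and failure label are the placeholder values from the Complete Solution, not measured results, and the helper name `to_table` and the per-row perturbation labels are assumptions.

```python
columns = ["section", "baseline_score", "shortcut_score",
           "perturbation", "failure_label"]

# Placeholder rows; replace the values with results from your own runs.
results = [
    {"section": "29.2", "baseline_score": 0.72, "shortcut_score": 0.81,
     "perturbation": "pose_drift",
     "failure_label": "perception_or_planning_interface"},
    {"section": "29.4", "baseline_score": 0.72, "shortcut_score": 0.81,
     "perturbation": "occlusion_or_noise",
     "failure_label": "perception_or_planning_interface"},
]

def to_table(rows, columns):
    """Render a list of dicts as a pipe-delimited text table."""
    header = "| " + " | ".join(columns) + " |"
    rule = "|" + "---|" * len(columns)
    body = ["| " + " | ".join(str(row[c]) for c in columns) + " |"
            for row in rows]
    return "\n".join([header, rule, *body])

table = to_table(results, columns)
print(table)
```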
Stretch Goals
- Add a plot of metric versus perturbation strength.
- Run the same manifest in a Habitat-style simulator or ROS 2 bag replay.
- Export two failure cases with enough metadata to reproduce them later.
Complete Solution
# Start a Chapter 29 evidence manifest.
# Add one row per section and keep metrics construct-matched.
sections = ['29.1', '29.2', '29.3', '29.4', '29.5', '29.6', '29.7']
manifest = [
    {"section": s, "metric": "closed_loop_success", "perturbation": "occlusion_or_noise"}
    for s in sections
]
print(manifest[0])
# Fill each row with placeholder scores and a failure label.
# Replace these constants with values measured in your own runs.
for row in manifest:
    row["baseline_score"] = 0.72
    row["shortcut_score"] = 0.81
    row["failure_label"] = "perception_or_planning_interface"
print(manifest)
Use this chapter as a complete teaching unit: concept, minimal implementation, library shortcut, diagnostic perturbation, and postmortem. This pattern keeps a perception model from being evaluated only in isolation, without ever being tested as part of the agent loop.
| Tool or Library | Where It Pays Off |
|---|---|
| OpenCV | Use for camera calibration, feature detection and matching, and image-based pose estimation in a visual front end. |
| Open3D | Use for point cloud processing, registration such as ICP, and quick 3D visualization of maps. |
| ROS 2 | Use for message passing between sensors and estimators, coordinate-frame bookkeeping, and bag recording and replay. |
| Nav2 | Use to connect maps, localization, planners, controllers, and recovery behaviors on a real or simulated robot. |
| GTSAM | Use for factor graphs, pose graphs, and nonlinear smoothing once the constraints are defined. |
| ORB-SLAM style pipelines | Use as a reference for feature-based visual SLAM with monocular, stereo, or RGB-D cameras. |
| Gaussian Splatting SLAM systems | Use when dense reconstructions for view synthesis or robot inspection matter more than a sparse map. |
Before leaving the chapter, the reader should be able to state one theory claim, one implementation claim, one evaluation claim, and one realistic failure mode.
A strong chapter session ends with an artifact: a script, trace, simulator run, data card, map, or reproducible evaluation panel.
What's Next?
Start with Section 29.1: Where am I and what does the world look like. After this chapter, continue to Chapter 30: Navigation and Path Planning.
Bibliography & Further Reading
Foundational Papers, Tools, and References
Durrant-Whyte, H. and Bailey, T. "Simultaneous Localization and Mapping." IEEE Robotics and Automation Magazine, 2006. https://ieeexplore.ieee.org/document/1638022
A classic tutorial framing of SLAM's estimation problem and uncertainty structure.
Mur-Artal, R. and Tardos, J. D. "ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras." IEEE T-RO, 2017. https://arxiv.org/abs/1610.06475
A widely used reference for feature-based visual SLAM pipelines.
Dellaert, F. "Factor Graphs and GTSAM." Project documentation. https://gtsam.org/
A practical reference for factor graphs, pose graphs, and nonlinear smoothing.
ROS 2 Navigation. "Nav2 documentation." Project documentation. https://navigation.ros.org/
The maintained navigation stack that connects maps, localization, planners, controllers, and recovery behaviors.