Part XI: Evaluation, Safety, Robustness, and Deployment

Part Overview

This part covers metrics, uncertainty, safety filters, deployment architecture, and operational discipline. It connects formal ideas with the tools and labs needed to build working systems.

This part contains four chapters (52 through 55). Each chapter includes theory, recipes, practical code, a library shortcut, and exercises.

Why This Part Matters

This part gives the reader a working layer of the embodied AI stack: how to measure a system, bound its behavior, and run it outside the lab. Later chapters assume this layer when agents must perceive, plan, act, and recover from mistakes.

Chapter 52 develops the evaluation of embodied systems as part of the embodied AI stack.

  • 52.1 Why accuracy is not enough
  • 52.2 Success rate, path efficiency, time and energy cost
  • 52.3 Safety violations and constraint satisfaction
  • 52.4 Robustness and generalization metrics
  • 52.5 Reproducible evaluation: SIMPLER and sim-as-proxy
  • 52.6 Real-world evaluation hygiene; benchmark design

Chapter 53 develops robustness and uncertainty as part of the embodied AI stack.

  • 53.1 What goes wrong: sensor noise, distribution shift
  • 53.2 Model uncertainty and calibration
  • 53.3 Out-of-distribution detection
  • 53.4 Runtime monitoring and fail-safe behavior

Chapter 54 develops safety in embodied AI as part of the embodied AI stack.

  • 54.1 Why embodied safety is different (physical harm)
  • 54.2 Constraint violations and safe exploration
  • 54.3 Control barrier functions and Hamilton-Jacobi reachability
  • 54.4 Shielded policies and safety filters
  • 54.5 Human override and safety testing
  • 54.6 Deployment approval and safety cases

Chapter 55 develops deployment architecture as part of the embodied AI stack.

  • 55.1 From notebook to robot
  • 55.2 Real-time inference and control rates
  • 55.3 Edge vs. cloud-robot computation; asynchronous inference
  • 55.4 Logging, monitoring, model updates
  • 55.5 Failure recovery, security, maintenance

What's Next?

After this part, Part XII extends the stack with frontier problems and capstone builds.