Safety is the discipline of deciding which actions your system is never allowed to improvise.
A Runtime Safety Architect
Chapter 54 moves from generic caution to explicit safety engineering. Hazards, constraints, barrier functions, shields, overrides, and assurance arguments are treated as first-class parts of the embodied stack.
Safety claims are credible only when they define the operating domain, the forbidden states and actions, the intervention authority, and the evidence that shows the system respected those limits.
Chapter Overview
This chapter builds a layered view of safety: hazard analysis, constrained optimization, formal safety envelopes, runtime shields, human override, and deployment approval. The goal is to show how learned policies fit inside a larger assurance architecture.
The theory thread spans constrained Markov decision processes, control barrier functions, Hamilton-Jacobi reachability, and assurance cases. The implementation thread focuses on monitors, intervention latency, override protocols, and release gates.
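As a reference point for the theory thread, the standard constrained Markov decision process objective can be written as below. This is the textbook form, not a formulation specific to this chapter; the symbols $r$, $c_i$, $d_i$, and $\gamma$ (reward, cost signals, cost budgets, and discount factor) are notation assumed here.

```latex
\max_{\pi} \; J_r(\pi) = \mathbb{E}_{\tau \sim \pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right]
\quad \text{subject to} \quad
J_{c_i}(\pi) = \mathbb{E}_{\tau \sim \pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, c_i(s_t, a_t) \right] \le d_i,
\qquad i = 1, \dots, m .
```

Note that the constraint bounds expected cumulative cost. It does not by itself forbid any single unsafe state, which is one reason the chapter pairs it with barrier functions, reachability, and runtime shields.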
This chapter keeps a research-grade standard throughout: every promoted claim should be tied to one matched panel, one artifact bundle, and one replay path that lets another team inspect what changed in the closed loop.
Prerequisites
Readers should know basic control, uncertainty-aware evaluation, and deployment logging. Chapter 37 on controllers, Chapter 52 on evaluation, and Chapter 53 on robustness are the main prerequisites.
Chapter Roadmap
- 54.1 Why embodied safety is different (physical harm): Translate model errors into hazards, severity, exposure, and controllability.
- 54.2 Constraint violations and safe exploration: Learn under constraints without treating violations as ordinary exploration noise.
- 54.3 Control barrier functions and Hamilton-Jacobi reachability: Compute formal safe sets and action corrections around a learned policy.
- 54.4 Shielded policies and safety filters: Place runtime supervisors between policy output and actuator command.
- 54.5 Human override and safety testing: Design intervention interfaces and evidence-producing test campaigns.
- 54.6 Deployment approval and safety cases: Build release gates, residual-risk narratives, and operational restrictions.
- 54.7 Safety Cases And Assurance Arguments For Embodied AI: Turn the whole chapter into a structured assurance artifact.
Use small hand-built examples to make the constraints visible, then move to established tooling: CBF solvers, reachability tools, hazard logs, runtime supervisors, and assurance templates. The shortcut these tools provide matters only because safety logic must stay inspectable, testable, and maintainable under change.
The chapter's practical standard is simple: use tools that preserve provenance, timestamps, intervention traces, and replay links. A shorter script is only an advantage when the evidence chain stays intact.
Hands-On Lab: Build the Safety Stack
Objective
Wrap one embodied policy with a simple safety monitor, a safety filter, and an operator override path. Produce one release dossier containing the task envelope, hazard log, test evidence, and residual risk summary.
Steps
- Write an operational design domain or site card before policy tuning begins.
- Specify at least one hard state constraint, one action-rate constraint, and one emergency-stop path.
- Implement a simple safety filter or barrier-style correction around the nominal controller.
- Run nominal, degraded-sensing, near-boundary, and override tests on the same panel.
- Assemble a safety case summary that names evidence, defeaters, and residual risks.
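The constraint and filter steps above can be sketched for a one-dimensional, velocity-commanded system. This is a minimal illustration under stated assumptions, not the lab's reference solution: the dynamics `x_next = x + u * dt`, the names `FilterConfig`, `safety_filter`, and `simulate`, and all numeric limits are hypothetical. The barrier is `h(x) = x_max - x`, and the filter enforces the discrete-time condition `h(x_next) >= (1 - alpha) * h(x)`, which for this scalar system reduces to a closed-form cap on `u`, so no QP solver is needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterConfig:
    # All values are illustrative assumptions, not taken from the chapter.
    x_max: float = 1.0   # hard state constraint: position must stay <= x_max (m)
    du_max: float = 0.2  # action-rate constraint: max command change per step (m/s)
    alpha: float = 0.5   # barrier decay rate, must lie in (0, 1]
    dt: float = 0.1      # control period (s)


def safety_filter(x, u_nom, u_prev, estop, cfg):
    """Return (u_safe, reason) for the assumed model x_next = x + u * dt."""
    # Emergency-stop path: bypasses every other rule.
    if estop:
        return 0.0, "estop"

    reason = "nominal"

    # Action-rate constraint: clamp the change relative to the last command.
    lo, hi = u_prev - cfg.du_max, u_prev + cfg.du_max
    u = min(max(u_nom, lo), hi)
    if u != u_nom:
        reason = "rate_limit"

    # Barrier-style correction with h(x) = x_max - x.
    # h(x_next) >= (1 - alpha) * h(x)  <=>  u <= alpha * h(x) / dt.
    # Applied last, so the hard state constraint overrides the rate limit.
    u_cap = cfg.alpha * (cfg.x_max - x) / cfg.dt
    if u > u_cap:
        u, reason = u_cap, "barrier"

    return u, reason


def simulate(steps=60, u_nom=1.0, cfg=FilterConfig()):
    """Drive toward the boundary with a constant nominal command."""
    x, u_prev, trace = 0.0, 0.0, []
    for _ in range(steps):
        u, reason = safety_filter(x, u_nom, u_prev, False, cfg)
        x += u * cfg.dt
        u_prev = u
        trace.append((x, u, reason))  # intervention trace for the dossier
    return trace


if __name__ == "__main__":
    for x, u, reason in simulate()[:12]:
        print(f"x={x:.3f}  u={u:.3f}  {reason}")
```

One design choice is worth flagging: because the barrier cap is applied after the rate clamp, a hard deceleration near the boundary can exceed `du_max`. That priority ordering is an assumption of this sketch. A real design would need to show that the two constraints are compatible, or state explicitly which one yields.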
What's Next?
Continue with Section 54.1: Why embodied safety is different (physical harm), where the chapter starts with hazards rather than metrics.
Read this chapter from the outside in: start with the operating domain, then inspect the forbidden states and actions, then move inward to the learning or planning algorithm. That order reflects how real safety reviews are run.
A credible safety artifact links hazard analysis, intervention authority, runtime monitoring, and replayable test evidence. Missing any one of these usually means the claimed safety property is only conceptual.
When reading or teaching the chapter, insist on one more question after every result: which files would another researcher need in order to reproduce, challenge, or extend this exact conclusion without guessing hidden protocol details?
| Tool or Library | Where It Pays Off |
|---|---|
| Hazard logs and FMEA tables | Track hazards, causes, mitigations, owners, and verification evidence. |
| CBF or QP-based safety filters | Project unsafe actions back into an admissible control set. |
| Reachability tooling | Approximate safe sets and worst-case envelopes for simplified dynamics. |
| ROS 2 lifecycle and safety nodes | Route intervention logic outside ordinary policy code paths. |
| Assurance case templates | Tie claims, evidence, and defeaters into a reviewable release dossier. |
Extend the lab by asking a second reader to challenge the safety case with a new defeater. If the evidence folder cannot answer the challenge, the assurance argument is still immature.
Students often overfocus on the learning algorithm. Repeatedly bring them back to action authority: who can veto a command, how quickly, and based on which signals.
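One way to make action authority concrete in class is a small command arbiter. The sketch below is an assumption-laden illustration: the source names, the priority ordering in `AUTHORITY`, the `max_age` staleness threshold, and the `watchdog` fallback are all hypothetical choices, not a prescribed protocol.

```python
from dataclasses import dataclass

# Hypothetical ordering: a lower number means higher authority.
AUTHORITY = {"estop": 0, "operator": 1, "shield": 2, "policy": 3}


@dataclass(frozen=True)
class Command:
    source: str   # who issued the command (a key of AUTHORITY, or "watchdog")
    value: float  # commanded velocity; 0.0 means stop
    stamp: float  # time the command was issued, in seconds


def arbitrate(commands, now, max_age=0.2):
    """Select the command that holds authority at time `now`.

    Commands older than `max_age` seconds are ignored, except e-stop
    requests, which stay in force until the caller removes them.
    If nothing valid remains, a stop command is issued.
    """
    valid = [
        c for c in commands
        if c.source == "estop" or now - c.stamp <= max_age
    ]
    if not valid:
        return Command("watchdog", 0.0, now)
    # Highest authority wins; ties go to the most recent command.
    return min(valid, key=lambda c: (AUTHORITY[c.source], -c.stamp))


if __name__ == "__main__":
    now = 10.0
    cmds = [Command("policy", 0.8, 9.95), Command("operator", 0.1, 9.90)]
    print(arbitrate(cmds, now))
```

The three questions map directly onto the code: who can veto is the `AUTHORITY` table, how quickly is bounded by `max_age` plus the arbitration period, and which signals is the set of sources allowed to submit commands.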
The best projects in this chapter are small but rigorous. A simple robot with a real hazard log and a tested override path teaches more than a large policy with no safety evidence.
Each chapter in this part should end with a dossier, not only a plot: configuration, panel definition, metric script, synchronized logs, replay artifact, failure taxonomy, and a short statement of residual uncertainty or residual risk.
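A dossier requirement like this is easy to turn into an automated release gate. The following sketch checks that every listed item exists and is non-empty. The file names in `REQUIRED_ITEMS` are hypothetical: the chapter names the dossier items, not a file layout.

```python
from pathlib import Path

# Hypothetical layout; only the item names come from the dossier list.
REQUIRED_ITEMS = {
    "configuration": "config.yaml",
    "panel definition": "panel.json",
    "metric script": "metrics.py",
    "synchronized logs": "logs/run.jsonl",
    "replay artifact": "replay/episode.bag",
    "failure taxonomy": "failures.md",
    "residual risk statement": "residual_risk.md",
}


def missing_dossier_items(dossier_dir, required=REQUIRED_ITEMS):
    """Return the sorted names of items that are absent or empty."""
    root = Path(dossier_dir)
    missing = []
    for name, rel_path in required.items():
        path = root / rel_path
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return sorted(missing)


def release_gate(dossier_dir):
    """Return (passed, missing_items) for a dossier directory."""
    missing = missing_dossier_items(dossier_dir)
    return len(missing) == 0, missing


if __name__ == "__main__":
    passed, missing = release_gate("dossier")
    print("PASS" if passed else f"BLOCKED, missing: {missing}")
```

A presence check is only the weakest form of gate. It confirms that the evidence chain has no holes, not that the evidence supports the claim.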
A strong seminar or design review should ask four questions at the chapter boundary: what exactly was frozen, what evidence would falsify the claim, which tool preserves the audit trail, and which residual risk or uncertainty still remains after the best current mitigation is applied.
Before leaving the chapter, the reader should be able to define an operating domain, map hazards to mitigations, write a barrier-style safety condition, explain shield behavior, and assemble an assurance argument.
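For reference, a barrier-style safety condition in its common continuous-time form is shown below, following the standard control barrier function setup. The control-affine dynamics, the function $h$, the class-$\mathcal{K}$ function $\alpha$, and the nominal command $u_{\text{nom}}$ are notation assumed here.

```latex
\dot{x} = f(x) + g(x)\,u, \qquad
\mathcal{C} = \{\, x : h(x) \ge 0 \,\}

% h is a control barrier function if, for all x in the region of interest,
\sup_{u \in \mathcal{U}} \big[ L_f h(x) + L_g h(x)\,u \big] \ge -\alpha\big(h(x)\big)

% A safety filter then minimally modifies the nominal command:
u^{*}(x) = \arg\min_{u \in \mathcal{U}} \; \lVert u - u_{\text{nom}}(x) \rVert^{2}
\quad \text{subject to} \quad
L_f h(x) + L_g h(x)\,u \ge -\alpha\big(h(x)\big)
```

Here $L_f h$ and $L_g h$ are Lie derivatives of $h$ along $f$ and $g$. Because the constraint is affine in $u$, the filter is a quadratic program that can be solved at control rate.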
Safety in embodied AI is not a final patch. It is the architecture that constrains action, routes intervention, and turns deployment into a reviewable evidence package.
Bibliography & Further Reading
Foundational Papers, Tools, and References
Ames, A. D. et al. "Control Barrier Function Based Quadratic Programs for Safety Critical Systems." (2017). https://arxiv.org/abs/1609.06408
A core reference for barrier-function-based safety filters.
Fisac, J. F. et al. "General Safety and Control of Autonomous Systems: A Hamilton-Jacobi Reachability-Based Approach." (2019). https://arxiv.org/abs/1810.07406
Useful for safe set reasoning and worst-case analysis.
Wabersich, K. P. et al. "Safe Reinforcement Learning Using Probabilistic Shields." (2023). https://arxiv.org/abs/2210.00746
A modern reference on shielding strategies around learned policies.
UL 4600, "Standard for Safety for the Evaluation of Autonomous Products," and related assurance guidance.
A practical anchor for release gates and structured safety cases.