Section 27.7: When perception failures become action failures | Building Embodied AI: From Perception to Autonomous Action

"A perception failure is understood only when its downstream action failure is named."
A Patient Embodied AI Agent

Big Picture

This section maps visual errors to their downstream consequences. The goal is not merely to say that perception failed, but to identify whether calibration, recognition, uncertainty, latency, tracking, or the action interface caused the robot to choose badly.

Problem First: Why This Representation Exists

The section treats perception bugs as system bugs. A wrong label, stale mask, scale error, or delayed estimate matters because it changes a trajectory, grasp, stop decision, or recovery behavior.

The contract here maps perception evidence to failure attribution: raw input, intermediate representation, action consumer, chosen command, observed failure, and the earliest detectable warning.
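
One way to make this contract concrete is a small record type with one field per item in the list above. The sketch below is only an illustration: the class name `FailureAttribution`, the helper `missing_fields`, and every example value are hypothetical and not taken from any particular logging framework.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass(frozen=True)
class FailureAttribution:
    """One replayable row linking perception evidence to an action failure."""
    raw_input: str                    # pointer to the logged sensor data
    intermediate_representation: str  # what perception produced from it
    action_consumer: str              # module that acted on the representation
    chosen_command: str               # command that was actually issued
    observed_failure: str             # physical outcome that went wrong
    earliest_warning: Optional[str] = None  # None: no warning identified yet

    def missing_fields(self) -> List[str]:
        """Names of fields that are still empty or None."""
        return [name for name, value in asdict(self).items() if not value]


# Hypothetical example values, for illustration only.
record = FailureAttribution(
    raw_input="rgbd_frame_001",
    intermediate_representation="object pose estimate",
    action_consumer="grasp planner",
    chosen_command="close gripper at estimated pose",
    observed_failure="grasp missed the object",
)
print(record.missing_fields())  # ['earliest_warning']
```

A record with a non-empty `missing_fields()` result is a reminder that the attribution is incomplete, here because nobody has yet identified the earliest detectable warning.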

Action Is The Unit Of Meaning

A failure taxonomy earns value when it tells the team whether to fix sensing, calibration, uncertainty propagation, planning assumptions, or controller safeguards.

Figure 27.7.1 shows this section's perception-to-action contract. Read each edge as a concrete interface that must name units, frame, timestamp, uncertainty, and the consumer that is allowed to act on it.

Figure 27.7.1: Failure propagation from perception to action. The dashed feedback path reminds the reader that perception quality is judged by action consequences and replayable diagnostics.
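
The requirement that each edge names units, frame, timestamp, uncertainty, and consumer can be sketched as a typed message plus a check that runs before the consumer acts. Everything in this sketch is an assumption made for illustration: the type `PerceptionEstimate`, the function `interface_violations`, the frame names, and the 80 ms freshness limit.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PerceptionEstimate:
    value: Tuple[float, float, float]   # estimated position
    units: str                          # e.g. "m"
    frame_id: str                       # coordinate frame of `value`
    stamp_s: float                      # capture time in seconds
    sigma: Optional[float]              # 1-sigma uncertainty, None if unpublished
    allowed_consumers: Tuple[str, ...]  # modules permitted to act on this estimate


def interface_violations(est: PerceptionEstimate, consumer: str,
                         expected_frame: str, expected_units: str,
                         now_s: float, max_age_s: float) -> List[str]:
    """List every interface rule the estimate breaks for this consumer."""
    problems = []
    if est.units != expected_units:
        problems.append("wrong_units")
    if est.frame_id != expected_frame:
        problems.append("wrong_frame")
    if now_s - est.stamp_s > max_age_s:
        problems.append("stale_timestamp")
    if est.sigma is None:
        problems.append("missing_uncertainty")
    if consumer not in est.allowed_consumers:
        problems.append("unauthorized_consumer")
    return problems


# Hypothetical estimate: fresh and in meters, but in the camera frame
# and published without uncertainty.
est = PerceptionEstimate(value=(0.4, 0.0, 0.2), units="m",
                         frame_id="camera_optical", stamp_s=12.50,
                         sigma=None, allowed_consumers=("grasp_planner",))
print(interface_violations(est, "grasp_planner", "base_link", "m",
                           now_s=12.55, max_age_s=0.08))
# ['wrong_frame', 'missing_uncertainty']
```

In this example the perception value itself could be perfectly accurate; the violations come entirely from how it would be consumed.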

Mathematical Core

A perception error matters when it crosses an action boundary or consumes the available timing margin.

Formal Object

$\mathrm{fail}= \mathbf 1[d(\hat s,s)>\epsilon_{\mathrm{action}}]\lor \mathbf 1[\Delta t>\Delta t_{\max}]\lor \mathbf 1[\Sigma_{\hat s}\not\subseteq \Sigma_{\mathrm{allowed}}]$

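This expression separates magnitude error, latency error, and uncertainty-interface error. Read $\hat s$ as the estimated state, $s$ as the true state, $d$ as a task-relevant distance, $\epsilon_{\mathrm{action}}$ as the error the action can tolerate, $\Delta t$ as the age of the estimate when it is consumed, and $\Sigma_{\hat s}$ as the reported uncertainty, which must stay inside the set $\Sigma_{\mathrm{allowed}}$ that the consumer accepts. A small estimate error can be harmless far from a decision boundary; the same error can be catastrophic near contact.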
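
The dependence on the decision boundary can be checked with a few lines of Python. This is a minimal sketch under two assumptions: the uncertainty-containment term is reduced to a scalar comparison, and the two margin values (0.50 m in free space, 0.03 m near contact) are hypothetical. The function name `perception_fail` is also introduced here for illustration.

```python
def perception_fail(error_m: float, eps_action_m: float,
                    delay_ms: float, max_delay_ms: float,
                    sigma_m: float, sigma_allowed_m: float) -> bool:
    """Logical OR of the magnitude, latency, and uncertainty terms."""
    return (error_m > eps_action_m
            or delay_ms > max_delay_ms
            or sigma_m > sigma_allowed_m)


# Same 4.5 cm estimate error, same latency and uncertainty, two action margins.
far_from_boundary = perception_fail(0.045, 0.50, 40, 80, 0.02, 0.04)
near_contact = perception_fail(0.045, 0.03, 40, 80, 0.02, 0.04)
print(far_from_boundary, near_contact)  # False True
```

Only the action margin changed between the two calls, which is why the margin belongs to the task and not to the perception model.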

Perception failure triage

Replay the raw sensor stream and verify calibration, timestamps, and transforms.
Compare model output with a task-level counterfactual action.
Check whether uncertainty was published and consumed by the planner or controller.
Assign the failure to sensing, representation, timing, action selection, control, or evaluation.
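
The last triage step can be sketched as a lookup over per-stage check results. The sketch assumes that attribution goes to the most upstream stage whose check failed, which is one reasonable convention and not the only one; `TRIAGE_ORDER`, `assign_failure`, and the example results are hypothetical.

```python
from typing import Dict

# Stages listed upstream to downstream, matching the triage categories.
TRIAGE_ORDER = ["sensing", "representation", "timing",
                "action_selection", "control", "evaluation"]


def assign_failure(checks: Dict[str, bool]) -> str:
    """Return the most upstream stage whose check failed.

    `checks` maps each stage name to True (passed) or False (failed).
    Returns "unattributed" when every stage passed.
    """
    for stage in TRIAGE_ORDER:
        if not checks[stage]:
            return stage
    return "unattributed"


# Hypothetical replay result: timing and action selection both look bad,
# so the failure is assigned to timing, the earlier of the two.
checks = {"sensing": True, "representation": True, "timing": False,
          "action_selection": False, "control": True, "evaluation": True}
print(assign_failure(checks))  # timing
```

Assigning to the earliest failed stage keeps the label aligned with the first place an intervention could have helped; downstream failures may simply be consequences.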

Failure Labels That Preserve Debugging Value

Failure Label	Typical Evidence	Control Risk
Sensing failure	Blur, glare, missing depth, dropped frames	Bad raw evidence enters every downstream module.
Representation failure	Wrong mask, pose, flow, or affordance	Planner receives a plausible but false state.
Interface failure	No uncertainty, wrong frame, stale timestamp	Correct perception is consumed incorrectly.

Worked Miniature

Code Fragment 27.7.1 classifies failures by comparing state error, latency, and uncertainty width against action thresholds. This is the kind of small rule that should appear in replay dashboards.

# Label whether perception crossed an action-relevant failure boundary.
# Separate geometry error, latency error, and uncertainty-interface error.
state_error_m = 0.045
action_margin_m = 0.030
latency_ms = 115
max_latency_ms = 80
uncertainty_m = 0.055
allowed_uncertainty_m = 0.040

labels = []
if state_error_m > action_margin_m:
    labels.append("geometry_error")
if latency_ms > max_latency_ms:
    labels.append("stale_perception")
if uncertainty_m > allowed_uncertainty_m:
    labels.append("uncertainty_too_wide")
print(labels)

['geometry_error', 'stale_perception', 'uncertainty_too_wide']

This output shows that the failure is multi-causal, so retraining one vision model would not close the loop by itself. Each label names a different intervention path: recalibrate or refit geometry, reduce latency, or either tighten the estimate or widen the action margin while uncertainty remains high.

Code Fragment 27.7.1: The three thresholds create distinct failure labels instead of one vague perception failure. `geometry_error`, `stale_perception`, and `uncertainty_too_wide` each point to a different fix path.

Library Shortcut

A production pipeline should emit these labels from ROS 2 diagnostics, tracing tools, and model telemetry. Frameworks can collect timestamps and message metadata automatically, but the team must define the action boundary and failure taxonomy.
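
As a rough illustration of what such an emitted record could look like, the sketch below builds a plain dictionary whose keys mirror the fields of the ROS 2 `diagnostic_msgs/DiagnosticStatus` message (`level`, `name`, `message`, `hardware_id`, `values`). It deliberately does not import ROS 2, so it is a stand-in and not working ROS code. The function `to_diagnostic`, the status name `perception/action_boundary`, and the choice to map any label to the error level are assumptions.

```python
from typing import Dict, List

# Level values as defined by diagnostic_msgs/DiagnosticStatus.
OK, WARN, ERROR = 0, 1, 2


def to_diagnostic(labels: List[str], measurements: Dict[str, float]) -> dict:
    """Build a DiagnosticStatus-shaped dict from failure labels."""
    return {
        "level": ERROR if labels else OK,
        "name": "perception/action_boundary",  # hypothetical status name
        "message": ",".join(labels) if labels else "within action thresholds",
        "hardware_id": "",
        # DiagnosticStatus carries key/value pairs as strings.
        "values": [{"key": k, "value": str(v)} for k, v in measurements.items()],
    }


status = to_diagnostic(["stale_perception"],
                       {"latency_ms": 115, "max_latency_ms": 80})
print(status["level"], status["message"])  # 2 stale_perception
```

The thresholds and label names still come from the team's own taxonomy; the message format only transports them.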

Failure Mode To Test

The worst failure label is `bad vision`. It hides the specific interface that broke and makes the next experiment less informative.

Practical Example

When an autonomous vehicle brakes late, the audit should separate missed detection, wrong object velocity, delayed perception, planner threshold, and actuator response. Only one of those is solved by retraining a detector.

Memory Hook

For any perception failure that becomes an action failure, the perception result must answer three questions: what action changed, what uncertainty changed, and what log would reproduce the decision. Otherwise the output is still visualization, not embodied evidence.

Debugging And Evaluation

Evaluate failure cases with replayable causal records: record sensor stream, perception output, uncertainty, planner input, command, physical outcome, and the smallest counterfactual check.

Perturb one suspected cause at a time, such as calibration, latency, recognition, or tracking, then verify whether the same downstream action failure appears.
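
The one-cause-at-a-time procedure can be sketched with a toy stand-in for the replay. Here `action_fails` replaces the real pipeline with two threshold checks, and the baseline and perturbation values are hypothetical; in practice each replay would rerun the logged sensor stream through the actual stack.

```python
def action_fails(state_error_m: float, latency_ms: float,
                 action_margin_m: float = 0.030,
                 max_latency_ms: float = 80) -> bool:
    """Toy stand-in for a replay: does the downstream action fail?"""
    return state_error_m > action_margin_m or latency_ms > max_latency_ms


# A nominal run that succeeds, and one injected perturbation per suspected cause.
baseline = {"state_error_m": 0.010, "latency_ms": 40}
perturbations = {
    "calibration": {"state_error_m": 0.045},
    "latency": {"latency_ms": 115},
    "tracking": {"state_error_m": 0.020},
}

reproduces = {}
for cause, delta in perturbations.items():
    replay = {**baseline, **delta}  # change exactly one factor
    reproduces[cause] = action_fails(**replay)

print(reproduces)
# {'calibration': True, 'latency': True, 'tracking': False}
```

In this toy run, the calibration and latency perturbations each reproduce the failure on their own, while the tracking perturbation does not, so tracking can be deprioritized for this rollout.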

Research Frontier

As perception stacks absorb foundation models, multimodal prompts, and learned world models, failure attribution becomes more important. The frontier is building evaluation artifacts that reveal when a model was wrong, when it was late, and when downstream code ignored its uncertainty.

What's Next

Chapter 28 lifts everything learned here into three dimensions: the same failure-attribution discipline now applies to point clouds, voxel maps, and neural scene representations, where geometry errors propagate to collision and contact decisions rather than to class labels.

Section References

NVIDIA. Isaac ROS Visual SLAM documentation. https://nvidia-isaac-ros.github.io/repositories_and_packages/isaac_ros_visual_slam/index.html

Illustrates real-time perception components whose odometry output must be monitored for latency and reliability.

OpenCV. Camera calibration and 3D reconstruction documentation. https://docs.opencv.org/4.x/d9/d0c/group__calib3d.html

Calibration failures are a frequent root cause of action-level perception failures.

Self Check

Can you name the representation, the consuming action, the uncertainty or freshness field, and the failure label for a perception failure that became an action failure? If any one is missing, the failure report is not yet ready for a robot replay log.

Key Takeaway

A perception failure becomes useful engineering evidence only after it is mapped to the action boundary it crossed and the interface that allowed it through.

Exercise 27.7.1

Take a failed robot rollout and assign three labels: first bad signal, first bad state estimate, and first bad action. Explain how the fix differs for each label.
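
A possible starting point for the exercise is a helper that finds the first timestep at which each kind of badness appears. The per-step flags below are hypothetical; in a real rollout they would come from comparing logged signals, estimates, and commands against reference values.

```python
from typing import List, Optional


def first_true(flags: List[bool]) -> Optional[int]:
    """Index of the first True flag, or None if no flag is set."""
    return next((i for i, flag in enumerate(flags) if flag), None)


# Hypothetical per-timestep flags for one failed rollout.
rollout = {
    "bad_signal":   [False, False, True, True, True],
    "bad_estimate": [False, False, False, True, True],
    "bad_action":   [False, False, False, False, True],
}
firsts = {name: first_true(flags) for name, flags in rollout.items()}
print(firsts)  # {'bad_signal': 2, 'bad_estimate': 3, 'bad_action': 4}
```

The gaps between the three indices show how many steps each downstream stage had to detect the problem before it became an action failure.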