Section 53.1: What goes wrong: sensor noise, distribution shift

A robust robot is not the one that never sees surprise, it is the one that notices surprise early enough to act differently.

A Runtime Monitoring Engineer
Big Picture

Robustness starts by refusing to call every failure 'brittleness'. Sensor noise, timestamp drift, actuation delay, novel objects, and environment shift have different signatures and different fixes.

Figure 53.1.1: A disturbance map separates noise, occlusion, latency, and full distribution shift so the repair path becomes explicit.

Why This Matters

A diagnosis of what went wrong is useful only when it distinguishes disturbance sources and ties each one to a specific corrective action. Robustness is not one scalar; it is a map from perturbation class to degraded behavior, detection delay, and residual risk.

A simple disturbance decomposition is $$y_t = h(x_t) + \epsilon_t, \qquad x_t \sim p_{\text{train}}(x) \;\text{or}\; p_{\text{deploy}}(x),$$ where $h$ maps the true state $x_t$ to an ideal observation, $\epsilon_t$ captures observation corruption, and the change from $p_{\text{train}}$ to $p_{\text{deploy}}$ captures distribution shift. Different corrective actions target these two terms.
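To make the two terms concrete, the sketch below simulates the decomposition with a hypothetical linear sensor model $h(x) = 2x$ and Gaussian states. It assumes ground-truth state is available (as in simulation or with an external reference such as motion capture), so the residual $y_t - h(x_t)$ isolates $\epsilon_t$ while the state mean exposes a move from the training to the deployment distribution. The model, distributions, and summary statistics are illustrative choices, not a general-purpose detector.

```python
import random
import statistics


def h(x):
    # Hypothetical sensor model: a linear measurement of the true state.
    return 2.0 * x


def simulate(n, state_mean, noise_std, seed):
    """Draw states x_t ~ N(state_mean, 1) and observations y_t = h(x_t) + eps_t."""
    rng = random.Random(seed)
    xs = [rng.gauss(state_mean, 1.0) for _ in range(n)]
    ys = [h(x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


def diagnose(xs, ys):
    # The residual y_t - h(x_t) recovers eps_t; the state mean tracks the input distribution.
    residuals = [y - h(x) for x, y in zip(xs, ys)]
    return {"state_mean": statistics.fmean(xs),
            "residual_std": statistics.stdev(residuals)}


scenarios = {
    "train": {"state_mean": 0.0, "noise_std": 0.1},
    "corrupted": {"state_mean": 0.0, "noise_std": 1.0},  # only eps_t changes
    "shifted": {"state_mean": 3.0, "noise_std": 0.1},    # only the state distribution changes
}

results = {}
for seed, (name, cfg) in enumerate(scenarios.items()):
    xs, ys = simulate(5000, seed=seed, **cfg)
    results[name] = diagnose(xs, ys)
    print(name, {k: round(v, 2) for k, v in results[name].items()})
```

The corrupted scenario inflates the residual spread while leaving the state mean near zero; the shifted scenario moves the state mean while the residual stays small. On a real robot $x_t$ is rarely observed directly, so this clean separation depends on having a trustworthy reference for state.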

Key Insight

If the disturbance source is mislabeled, the mitigation often makes the system worse. You do not fix missing depth frames with more policy regularization, and you do not fix unseen object classes with a timestamp smoother.

Algorithmic View
  1. Label perturbations by channel: observation corruption, state-estimation drift, action delay, or environment shift.
  2. Measure outcome degradation under each channel separately before composing them.
  3. Record whether the first visible symptom appears in perception, state estimation, planning, or control.
  4. Attach each failure to a replay artifact with the disturbance label in metadata.
  5. Choose mitigations after the disturbance label is stable across multiple episodes.
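The five steps can be sketched as a small bookkeeping layer. Everything here is hypothetical: the `Channel` and `Stage` enums mirror the categories named in steps 1 and 3, `replay_uri` stands in for whatever replay store is in use, and the thresholds in `stable_symptom` (at least three failures, 80% agreement) are illustrative defaults rather than recommended values.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Channel(Enum):
    # Step 1: perturbation channels, plus an unperturbed baseline for comparison.
    NOMINAL = "nominal"
    OBSERVATION = "observation_corruption"
    STATE = "state_estimation_drift"
    ACTION_DELAY = "action_delay"
    ENVIRONMENT = "environment_shift"


class Stage(Enum):
    # Step 3: where the first visible symptom appeared.
    PERCEPTION = "perception"
    STATE_ESTIMATION = "state_estimation"
    PLANNING = "planning"
    CONTROL = "control"


@dataclass
class Episode:
    channel: Channel                 # one active channel per episode (step 2)
    success: bool
    first_symptom: Optional[Stage]
    replay_uri: str                  # hypothetical pointer to the recorded replay
    metadata: dict = field(default_factory=dict)

    def __post_init__(self):
        # Step 4: the disturbance label travels with the replay artifact.
        self.metadata["disturbance"] = self.channel.value


def degradation_by_channel(episodes):
    """Step 2: success-rate drop of each channel relative to the nominal baseline."""
    outcomes = defaultdict(list)
    for ep in episodes:
        outcomes[ep.channel].append(ep.success)
    rate = {ch: sum(v) / len(v) for ch, v in outcomes.items()}
    baseline = rate[Channel.NOMINAL]
    return {ch: baseline - r for ch, r in rate.items() if ch is not Channel.NOMINAL}


def stable_symptom(episodes, channel, min_failures=3, min_agreement=0.8):
    """Step 5: dominant first-symptom stage for a channel, or None if there are
    too few failures or they disagree too much to justify a mitigation."""
    stages = [ep.first_symptom for ep in episodes
              if ep.channel is channel and not ep.success
              and ep.first_symptom is not None]
    if len(stages) < min_failures:
        return None
    stage, count = Counter(stages).most_common(1)[0]
    return stage if count / len(stages) >= min_agreement else None


episodes = (
    [Episode(Channel.NOMINAL, True, None, f"replays/nominal_{i}") for i in range(4)]
    + [Episode(Channel.OBSERVATION, True, None, "replays/obs_0")]
    + [Episode(Channel.OBSERVATION, False, Stage.PERCEPTION, f"replays/obs_{i}")
       for i in range(1, 4)]
    + [Episode(Channel.ACTION_DELAY, False, Stage.CONTROL, f"replays/delay_{i}")
       for i in range(2)]
)
degradation = degradation_by_channel(episodes)
print(degradation)
print(stable_symptom(episodes, Channel.OBSERVATION))   # -> Stage.PERCEPTION
print(stable_symptom(episodes, Channel.ACTION_DELAY))  # -> None (only two failures)
```

Returning `None` for the action-delay channel is deliberate: two consistent failures are suggestive, but step 5 asks for a stable label before committing to a fix.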

Worked Example

A mobile robot that misses docking targets under motion blur needs either sensor robustness or slower approach speeds. The same failure under dropped timestamps points toward synchronization, not representation learning.

```python
# One row per perturbed episode: disturbance label, task outcome (1 = success),
# and final state-estimation error in centimetres.
disturbances = [
    {"label": "motion_blur", "success": 0, "state_error_cm": 4.2},
    {"label": "depth_dropout", "success": 0, "state_error_cm": 13.9},
    {"label": "novel_texture", "success": 1, "state_error_cm": 6.4},
]

# Index outcomes by disturbance label so channels can be compared side by side.
summary = {
    row["label"]: {"success": row["success"], "state_error_cm": row["state_error_cm"]}
    for row in disturbances
}

print(summary)
```

```
{'motion_blur': {'success': 0, 'state_error_cm': 4.2}, 'depth_dropout': {'success': 0, 'state_error_cm': 13.9}, 'novel_texture': {'success': 1, 'state_error_cm': 6.4}}
```
Code Fragment 53.1.1 records disturbance labels beside outcome and state error, making channel-specific diagnosis possible.

Expected output: the motion-blur and depth-dropout episodes both fail the task, but their internal signatures differ. The depth-dropout episode's state error (13.9 cm) is more than three times that of the motion-blur episode (4.2 cm), which points toward a different repair path.

Library Shortcut

Observation wrappers, ROS diagnostics, and sensor-injection utilities make it easy to build repeatable perturbation panels once the disturbance taxonomy is defined clearly.

Concrete stack anchors for this chapter include Albumentations or custom disturbance wrappers for controlled perturbations, Torchmetrics and scikit-learn for calibration analysis, MAPIE or related conformal wrappers for thresholding, PyOD-style OOD baselines for score comparison, and Prometheus or OpenTelemetry for deployment-time health traces.
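A custom disturbance wrapper can be very small. The sketch below assumes a hypothetical environment interface where `reset()` returns an observation dict with a `"depth"` key and `step(action)` returns `(obs, reward, done, info)`; it does not use Albumentations or any ROS API. Libraries such as Gymnasium offer an `ObservationWrapper` base class for the same pattern; this version avoids the dependency. The seeded generator is the important design choice: it makes the perturbation panel repeatable across policy versions.

```python
import random


class DepthDropoutWrapper:
    """Hypothetical wrapper: with probability p_drop, replace the depth frame
    with None to mimic a dropped frame, and tag each step with its label."""

    def __init__(self, env, p_drop, seed=0):
        self.env = env
        self.p_drop = p_drop
        self.rng = random.Random(seed)  # seeded so the panel is repeatable
        self.dropped = 0

    def _perturb(self, obs):
        obs = dict(obs)  # copy so the wrapped env's observation is not mutated
        if self.rng.random() < self.p_drop:
            obs["depth"] = None
            self.dropped += 1
        return obs

    def reset(self):
        return self._perturb(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        info = dict(info, disturbance="depth_dropout")  # label travels with the step
        return self._perturb(obs), reward, done, info


class ToyEnv:
    """Stand-in environment that always returns the same observation."""

    def reset(self):
        return {"rgb": "frame", "depth": [1.0, 1.2]}

    def step(self, action):
        return {"rgb": "frame", "depth": [1.0, 1.2]}, 0.0, False, {}


def run(p_drop, seed, steps=100):
    env = DepthDropoutWrapper(ToyEnv(), p_drop, seed)
    env.reset()
    for _ in range(steps):
        env.step(action=None)
    return env.dropped  # counts drops over reset plus all steps


print(run(0.3, seed=7), run(0.3, seed=7))  # identical counts for the same seed
```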

Embodied robustness work improves when failure categories are causal rather than cosmetic. Label the disturbance by what physically changed, not only by how the image looked or whether the policy failed.

The most common mistake is to aggregate all shifted scenes into one bucket. That hides whether the problem is sensor corruption, timing, morphology mismatch, or a new semantic object class.
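A small synthetic calculation shows how pooling hides channels. The labels and counts below are invented for illustration: a pooled success rate of 0.5 reads as moderate degradation, while the per-label split shows one channel failing every time and the others mostly succeeding.

```python
from collections import defaultdict

# Hypothetical rollouts from a single "shifted" evaluation bucket: (label, success).
rollouts = (
    [("timing_skew", False)] * 10
    + [("novel_object", True)] * 8 + [("novel_object", False)] * 2
    + [("sensor_glare", True)] * 7 + [("sensor_glare", False)] * 3
)

# Pooled view: one number for the whole bucket.
pooled = sum(ok for _, ok in rollouts) / len(rollouts)

# Per-label view: the same rollouts split by disturbance label.
per_label = defaultdict(list)
for label, ok in rollouts:
    per_label[label].append(ok)
rates = {label: sum(v) / len(v) for label, v in per_label.items()}

print(f"pooled 'shifted' success: {pooled:.2f}")
for label, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"  {label:>12}: {rate:.2f}")
```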

Cross-References

This section ties back to Section 52.4 on perturbation metrics and leads to Section 53.2 on calibration and Section 53.3 on OOD detection.

Lab Recipe

Create a disturbance panel with at least three channels, such as motion blur, missing depth frames, and unseen textures. For each failed rollout, record which channel was active and which internal variable drifted first.
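Finding which internal variable drifted first can be sketched as a scan over a logged trace. The trace format (per-timestep deviations from a nominal rollout), the variable names, and the tolerances are all hypothetical. Returning every variable that crosses its tolerance at the first offending timestep, rather than picking one, avoids silently hiding ties.

```python
def first_drift(trace, tolerances):
    """Return (timestep, [variables]) for the first timestep at which any
    variable's absolute deviation exceeds its tolerance, or None if none do.

    trace: list of dicts, one per timestep, mapping variable -> deviation
           from the nominal rollout (hypothetical logging format).
    tolerances: variable -> allowed absolute deviation.
    """
    for t, step in enumerate(trace):
        drifted = sorted(name for name, tol in tolerances.items()
                         if abs(step[name]) > tol)
        if drifted:
            return t, drifted
    return None


# Hypothetical failed rollout under the missing-depth channel.
trace = [
    {"depth_valid_ratio": 0.00, "pose_error_m": 0.01, "cmd_latency_s": 0.00},
    {"depth_valid_ratio": -0.45, "pose_error_m": 0.02, "cmd_latency_s": 0.00},
    {"depth_valid_ratio": -0.50, "pose_error_m": 0.12, "cmd_latency_s": 0.01},
]
tolerances = {"depth_valid_ratio": 0.2, "pose_error_m": 0.05, "cmd_latency_s": 0.05}

record = {"channel": "depth_dropout", "first_drift": first_drift(trace, tolerances)}
print(record)  # -> {'channel': 'depth_dropout', 'first_drift': (1, ['depth_valid_ratio'])}
```

Here the depth validity ratio leaves its band one step before the pose error does, which is the ordering the recipe asks you to record.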

Failure Mode

Do not call a disturbance distribution shift if it is really a logging or synchronization bug. Robustness experiments become misleading when infrastructure failures are mislabeled as model limitations.
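Before labeling a failure as distribution shift, a cheap first check is whether the logs themselves are sound. This hypothetical helper scans one sensor's timestamps (integer milliseconds assumed) for non-monotonic stamps and gaps, and measures skew against a second sensor assuming frames are already paired by index. In ROS 2 the stamps would typically come from message headers, but the sketch works on plain lists, and the thresholds are illustrative.

```python
def timestamp_faults(stamps_ms, expected_period_ms, gap_factor=1.5):
    """Flag non-monotonic stamps and gaps longer than gap_factor * expected period."""
    faults = []
    for i in range(1, len(stamps_ms)):
        dt = stamps_ms[i] - stamps_ms[i - 1]
        if dt <= 0:
            faults.append((i, "non_monotonic", dt))
        elif dt > gap_factor * expected_period_ms:
            faults.append((i, "gap", dt))
    return faults


def max_skew_ms(stamps_a, stamps_b):
    """Largest offset between two sensors, assuming frames are paired by index."""
    return max(abs(a - b) for a, b in zip(stamps_a, stamps_b))


depth_ms = [0, 100, 200, 500, 450, 550]  # hypothetical 10 Hz depth stream
rgb_ms = [0, 100, 200, 300, 400, 500]

faults = timestamp_faults(depth_ms, expected_period_ms=100)
skew = max_skew_ms(rgb_ms, depth_ms)
print(faults)  # -> [(3, 'gap', 300), (4, 'non_monotonic', -50)]
print(skew)    # -> 200

# Only consider a model-level label once the logs pass these checks
# (the 50 ms skew budget is a hypothetical threshold).
label = "infrastructure_suspect" if faults or skew > 50 else "eligible_for_shift_label"
print(label)
```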

Practical Example

For autonomous driving, rain may degrade perception while route closure introduces semantic shift. For drones, wind gusts act through dynamics while glare acts through sensing. The diagnostic matrix should reflect that distinction.
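The diagnostic matrix can be kept as plain data and checked mechanically. The sketch below encodes the driving and drone examples as `(platform, perturbation) -> pathway` entries and flags platforms whose perturbations all act through a single pathway, which is the flat-taxonomy symptom the self-check below asks about. The pathway names and the robot-arm example are hypothetical.

```python
# Diagnostic matrix: (platform, perturbation) -> pathway the perturbation acts through.
MATRIX = {
    ("car", "rain"): "sensing",
    ("car", "route_closure"): "semantic",
    ("drone", "wind_gust"): "dynamics",
    ("drone", "glare"): "sensing",
}


def flat_platforms(matrix):
    """Return platforms whose perturbations all act through one pathway."""
    pathways = {}
    for (platform, _), pathway in matrix.items():
        pathways.setdefault(platform, set()).add(pathway)
    return sorted(p for p, ways in pathways.items() if len(ways) < 2)


print(flat_platforms(MATRIX))  # -> []

# A hypothetical arm panel that only perturbs sensing is flagged as flat.
arm_panel = {("arm", "low_light"): "sensing", ("arm", "motion_blur"): "sensing"}
print(flat_platforms(arm_panel))  # -> ['arm']
```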

Research Frontier

The field is starting to use richer perturbation ontologies and automatically generated stress suites, but good causal labeling still requires careful human judgment and replay review.

Self Check

Can you name one perturbation that primarily affects sensing and one that primarily affects dynamics? If not, the disturbance taxonomy is still too flat.

Key Takeaway

Robustness work begins with disturbance taxonomy. Different failure channels deserve different measurements and different fixes.

Exercise 53.1.1

Take a recent embodied failure from your own work and relabel it by disturbance channel. Then propose one experiment that would falsify your diagnosis.

Section References

Kendall, A., and Gal, Y. "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?" (2017). https://arxiv.org/abs/1703.04977

Useful background for how disturbance channels interact with uncertainty types.

Official ROS 2 diagnostics documentation.

Practical support for surfacing sensor-health and timing signals at runtime.

What's Next

Section 53.2 asks how the robot should represent uncertainty once the disturbance source has been identified.