Section 57.1: Learning after deployment | Building Embodied AI: From Perception to Autonomous Action

"Deployment is where training becomes a subscription service with consequences."
A Fielded Policy Reading Its Logs

**Figure 57.1A**: Post-deployment learning must be governed like a release pipeline, not treated as spontaneous self-improvement.

Big Picture

Learning after deployment means separating monitoring, candidate update, validation, and rollout so that adaptation remains controlled and auditable.

Key Insight

The update pipeline is part of the embodied system. If the learning loop cannot be audited, then post-deployment improvement falls outside the scientific standard demanded of perception, planning, and control.

Theory

Post-deployment learning should be modeled as a governed pipeline:

$$D_t \rightarrow U_t \rightarrow \theta_{t+1} \rightarrow V_t \rightarrow R_t,$$

where collected field data $D_t$ feeds an update rule $U_t$, producing candidate parameters $\theta_{t+1}$, which are evaluated by a validation suite $V_t$ before a rollout decision $R_t$ is made. The key idea is that deployment data and deployment decisions are connected but not collapsed into one uncontrolled online loop.
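To make the separation of stages concrete, the following is a minimal sketch of one pass through the pipeline. The names `PipelineResult`, `run_governed_pass`, `toy_update`, and `toy_validation` are hypothetical, and the scalar `theta` stands in for real model parameters. The point it illustrates is only that the deployed parameters do not change as a side effect of computing or validating a candidate.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineResult:
    """Outcome of one governed pass; all names here are illustrative."""
    deployed_theta: float      # parameters still in control of the robot
    candidate_theta: float     # theta_{t+1}, not yet promoted
    decision: str              # R_t: "shadow" or "reject"
    audit_log: list[str] = field(default_factory=list)


def run_governed_pass(
    theta: float,
    field_data: list[float],
    update_rule: Callable[[float, list[float]], float],
    validation_suite: Callable[[float], bool],
) -> PipelineResult:
    log = [f"D_t: {len(field_data)} field samples"]
    candidate = update_rule(theta, field_data)            # U_t
    log.append(f"U_t: candidate theta = {candidate:.3f}")
    passed = validation_suite(candidate)                  # V_t
    log.append(f"V_t: {'passed' if passed else 'failed'}")
    decision = "shadow" if passed else "reject"           # R_t
    log.append(f"R_t: {decision}")
    # The deployed parameters are returned unchanged: passing validation
    # alone never grants control authority in this sketch.
    return PipelineResult(theta, candidate, decision, log)


# Toy stand-ins for a real update rule and validation suite (assumptions).
def toy_update(theta: float, data: list[float]) -> float:
    return theta + 0.5 * (sum(data) / len(data) - theta)


def toy_validation(theta: float) -> bool:
    return 0.0 <= theta <= 1.0


result = run_governed_pass(0.2, [0.4, 0.6], toy_update, toy_validation)
print("\n".join(result.audit_log))
```

Each stage writes one line to the audit log, so a reviewer can reconstruct why a candidate reached (or failed to reach) the rollout decision.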

Post-Deployment Learning Pipeline

| Stage | Artifact | Failure If Missing |
| --- | --- | --- |
| Monitoring | drift and intervention report | no justified reason to update |
| Candidate update | versioned training config and data slice | cannot explain what changed |
| Validation | old-task, new-task, and safety panel | silent regressions |
| Rollout | shadow or canary decision log | unsafe direct promotion |
| Rollback | pointer to previous stable version | slow or impossible recovery |
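The monitoring artifact in the first row can be as simple as a comparison between a baseline window and a recent window. The sketch below is one possible shape for such a report. The function name `drift_report`, the field names, and the default `max_drop` of 0.10 are assumptions chosen for illustration, not values taken from any particular deployment.

```python
def drift_report(
    baseline_success: list[bool],
    recent_success: list[bool],
    recent_interventions: int,
    max_drop: float = 0.10,  # hypothetical threshold for "update justified"
) -> dict[str, object]:
    """Summarize success-rate drift and operator interventions."""
    if not baseline_success or not recent_success:
        raise ValueError("both windows need at least one episode")
    base_rate = sum(baseline_success) / len(baseline_success)
    recent_rate = sum(recent_success) / len(recent_success)
    drop = base_rate - recent_rate
    return {
        "baseline_success_rate": base_rate,
        "recent_success_rate": recent_rate,
        "success_drop": drop,
        "interventions": recent_interventions,
        # Without a measured drop there is no justified reason to update.
        "update_justified": drop > max_drop,
    }


report = drift_report(
    baseline_success=[True] * 9 + [False],
    recent_success=[True] * 6 + [False] * 4,
    recent_interventions=3,
)
print(report["update_justified"])
```

A real monitor would likely use a statistical test and per-category slices. The sketch only shows that the decision to start an update is itself a recorded artifact.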

Worked Example

Suppose a shelf-picking robot sees more reflective packaging after a supplier change. The new data may justify a candidate perception update, but only after the system verifies that prior carton and bottle skills still work.

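REQUIRED_KEYS = ("field_signal", "candidate_update", "validation_panel", "rollout_mode")


def validate_update_pipeline(payload: dict[str, object]) -> dict[str, object]:
    """Return the payload unchanged if it names every required release field."""
    # Reject a pipeline that omits, or leaves empty, any governed stage.
    missing = [key for key in REQUIRED_KEYS if not payload.get(key)]
    if missing:
        raise ValueError(f"update pipeline is missing: {missing}")
    return payload


update_pipeline = {
    "field_signal": "drop in grasp success on reflective cartons",
    "candidate_update": "fine-tune perception head on corrected examples",
    "validation_panel": ["old carton tasks", "new reflective carton tasks", "safety replay set"],
    "rollout_mode": "shadow_then_canary",
}
print(validate_update_pipeline(update_pipeline))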

{'field_signal': 'drop in grasp success on reflective cartons', 'candidate_update': 'fine-tune perception head on corrected examples', 'validation_panel': ['old carton tasks', 'new reflective carton tasks', 'safety replay set'], 'rollout_mode': 'shadow_then_canary'}

Code Fragment 57.1.1 expresses post-deployment learning as a controlled update pipeline.

The expected output should make the release logic explicit. If the update pipeline does not name retained-task checks and rollout mode, then "learning after deployment" is really just ungoverned retraining.

Algorithm: Governed Field Learning

1. Detect a field signal such as drift or recurring intervention.
2. Create a candidate update from labeled data, replay, or adapters.
3. Evaluate old tasks, new tasks, and safety cases in one fixed panel.
4. Deploy only in shadow or canary mode first.
5. Promote or roll back according to explicit thresholds.
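The final step depends on thresholds that are written down before the canary run, not chosen afterward. The following is a minimal sketch of such a decision rule. The `Thresholds` class, the metric keys `old_tasks` and `new_tasks`, and all numeric limits are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Illustrative limits; real values depend on the application."""
    max_old_task_drop: float = 0.02
    min_new_task_gain: float = 0.05
    max_safety_violations: int = 0


def rollout_decision(
    stable: dict[str, float],
    candidate: dict[str, float],
    safety_violations: int,
    limits: Thresholds = Thresholds(),
) -> tuple[str, list[str]]:
    """Return ("promote" | "rollback", reasons) from fixed-panel scores."""
    reasons: list[str] = []
    if safety_violations > limits.max_safety_violations:
        reasons.append("safety violations observed")
    if stable["old_tasks"] - candidate["old_tasks"] > limits.max_old_task_drop:
        reasons.append("old-task regression")
    if candidate["new_tasks"] - stable["new_tasks"] < limits.min_new_task_gain:
        reasons.append("insufficient new-task gain")
    # Any recorded reason blocks promotion; the reasons go in the decision log.
    return ("rollback", reasons) if reasons else ("promote", reasons)


stable_scores = {"old_tasks": 0.95, "new_tasks": 0.60}
candidate_scores = {"old_tasks": 0.94, "new_tasks": 0.80}
print(rollout_decision(stable_scores, candidate_scores, safety_violations=0))
```

Returning the reasons alongside the decision keeps the rollback explainable, which is what the decision log in the pipeline table is meant to capture.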

Library Shortcut

Replay stores, versioned experiment trackers, adapter-tuning libraries, and deployment registries are valuable here because they preserve provenance. The shortcut is helpful only when the tool chain retains old-task panels, candidate lineage, and rollback pointers rather than just storing a new checkpoint.

Common Failure Mode

Teams often let new field data dominate the update pipeline without preserving enough old-task evidence. The result is adaptation that looks good on the latest problem and quietly regresses earlier competence.
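One common guard against this failure is to reserve a fixed share of every training batch for replayed old-task examples. The sketch below shows that idea in its simplest form. The function `build_training_mix`, the `replay_fraction` parameter, and the example identifiers are assumptions for illustration, and sampling with replacement is a simplification.

```python
import random


def build_training_mix(
    new_field: list[str],
    old_replay: list[str],
    replay_fraction: float = 0.5,  # hypothetical share reserved for old tasks
    batch_size: int = 8,
    seed: int = 0,
) -> list[str]:
    """Sample a batch in which old-task replay keeps a guaranteed share."""
    if not new_field or not old_replay:
        raise ValueError("both data pools must be non-empty")
    if not 0.0 < replay_fraction < 1.0:
        raise ValueError("replay_fraction must lie strictly between 0 and 1")
    rng = random.Random(seed)
    n_old = round(batch_size * replay_fraction)
    n_new = batch_size - n_old
    # Sampling with replacement keeps the sketch valid for small pools.
    batch = rng.choices(old_replay, k=n_old) + rng.choices(new_field, k=n_new)
    rng.shuffle(batch)
    return batch


batch = build_training_mix(
    new_field=["new_reflective_0", "new_reflective_1", "new_reflective_2"],
    old_replay=["old_carton_0", "old_bottle_0"],
)
print(batch)
```

Fixing the ratio in configuration, instead of letting it follow the volume of incoming field data, is what keeps the newest problem from crowding out earlier evidence.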

Practical Example

A hospital delivery robot that sees new floor reflections after waxing may need a localization update. A governed pipeline first labels the reflective cases, then checks retained hallway navigation and elevator entry, then runs the update in shadow mode before granting it live control authority.
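In shadow mode the candidate runs alongside the live model and its outputs are logged but never acted on. A simple summary of such a run is the fraction of time steps on which the two models disagree by more than a tolerance. The sketch below assumes scalar outputs (for example, one coordinate of a pose estimate). The function name `shadow_disagreement` and the tolerance value are hypothetical.

```python
def shadow_disagreement(
    live_outputs: list[float],
    shadow_outputs: list[float],
    tolerance: float,
) -> float:
    """Fraction of logged steps where the shadow model departs from live."""
    if not live_outputs or len(live_outputs) != len(shadow_outputs):
        raise ValueError("logs must be non-empty and aligned step by step")
    disagreements = sum(
        abs(live - shadow) > tolerance
        for live, shadow in zip(live_outputs, shadow_outputs)
    )
    return disagreements / len(live_outputs)


# The live model keeps control authority; the shadow model is only compared.
rate = shadow_disagreement(
    live_outputs=[0.0, 1.0, 2.0, 3.0],
    shadow_outputs=[0.0, 1.05, 2.5, 3.0],
    tolerance=0.1,
)
print(rate)
```

Disagreement is not the same as error: on the reflective cases the candidate is expected to differ from the live model. The useful check is whether it also differs on retained cases where the live model was already correct.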

Research Frontier

An open problem is how to merge large-scale self-supervised field data with strict embodied safety gates. The tension is between making use of abundant unlabeled experience and keeping update authority narrow enough to avoid hidden regressions.

Self Check

Can you name the field signal, candidate update, retained-task panel, and rollout mode for one real robot application? If any of those are missing, the learning loop is still underspecified.

In production systems, the update object is usually larger than a checkpoint. It should include the exact slice of field data, the labeling protocol, the adapter or fine-tuning configuration, the validation manifest, and the deployment ticket that authorized the shadow or canary run. Tools such as PyTorch training jobs, Weights & Biases or TensorBoard traces, and ROS 2 replay logs are useful here only when they preserve this bundle as one inspectable release artifact rather than scattering evidence across unrelated dashboards.
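One way to treat the bundle as a single inspectable artifact is to give it a fixed schema and a content fingerprint. The sketch below uses only the Python standard library. The `ReleaseBundle` class, its field names, and the example values are hypothetical and do not correspond to any specific tool's format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReleaseBundle:
    """Illustrative schema for one post-deployment update."""
    checkpoint_id: str
    field_data_slice: str
    labeling_protocol: str
    tuning_config: str
    validation_manifest: str
    deployment_ticket: str

    def missing_fields(self) -> list[str]:
        # An empty or whitespace-only entry counts as missing evidence.
        return [name for name, value in asdict(self).items() if not value.strip()]

    def fingerprint(self) -> str:
        # Sorted keys give a stable serialization, so equal bundles hash equally.
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


bundle = ReleaseBundle(
    checkpoint_id="perception-head-candidate-01",
    field_data_slice="reflective-cartons-week-12",
    labeling_protocol="two-annotator-grasp-labels",
    tuning_config="adapter-config-v3",
    validation_manifest="panel-old-new-safety",
    deployment_ticket="",  # not yet authorized
)
print(bundle.missing_fields())
```

A release gate can then refuse any bundle whose `missing_fields()` list is non-empty and record the fingerprint in the rollout log, so that the rollback pointer refers to exactly one set of evidence.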

A strong post-deployment recipe also distinguishes perception adaptation from control adaptation. If a warehouse robot fails because carton appearance changed, the first candidate may be a narrow vision update with frozen planner and controller interfaces. If the failure instead comes from timing drift or changed vehicle dynamics, the update path may involve different evaluation panels, different rollback rules, and stricter closed-loop replay before any canary release.
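The distinction can be encoded as configuration, so that the failure category selects the update scope and the validation requirements rather than leaving them to ad hoc judgment. The sketch below is one hypothetical encoding. The module names, panel names, and the `UpdatePath` structure are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdatePath:
    """Illustrative update scope for one category of field failure."""
    trainable: tuple[str, ...]
    frozen: tuple[str, ...]
    validation_panels: tuple[str, ...]
    closed_loop_replay_required: bool


UPDATE_PATHS = {
    # Appearance change: narrow vision update, planner and controller frozen.
    "perception": UpdatePath(
        trainable=("vision_head",),
        frozen=("planner", "controller"),
        validation_panels=("old appearance set", "new appearance set", "safety replay set"),
        closed_loop_replay_required=False,
    ),
    # Timing drift or changed dynamics: stricter closed-loop checks.
    "control": UpdatePath(
        trainable=("controller",),
        frozen=("vision_head",),
        validation_panels=("timing stress set", "dynamics replay set", "safety replay set"),
        closed_loop_replay_required=True,
    ),
}


def select_update_path(failure_kind: str) -> UpdatePath:
    """Look up the governed path, refusing unknown failure categories."""
    try:
        return UPDATE_PATHS[failure_kind]
    except KeyError:
        raise ValueError(f"no governed update path for: {failure_kind!r}") from None


print(select_update_path("perception").frozen)
```

Refusing unknown categories is a deliberate choice in this sketch: a failure that has not been diagnosed as perception or control should not silently inherit either update path.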

In serious deployments, learning after deployment also changes organizational interfaces. Operators, data curators, and release owners need a shared artifact vocabulary so that a model update can be challenged and reversed without ambiguity.

Key Takeaway

Learning after deployment is a governed release process, not a permission slip for uncontrolled online adaptation.

Exercise 57.1.1

Design a field-learning pipeline for a mobile robot whose localization degrades in reflective hallways. Name the field signal, candidate update, retained-task panel, and rollout mode.

Section References

Kirkpatrick, J. et al. Overcoming catastrophic forgetting in neural networks. PNAS, 2017.

Use for regularization-based retention and its assumptions.

Lopez-Paz, D. and Ranzato, M. Gradient Episodic Memory for Continual Learning. NeurIPS, 2017.

Use for replay-constrained updates and task-stream evaluation.

What's Next?

Next, continue with Section 57.2, where the main technical risk becomes catastrophic forgetting.