"Deployment is where training becomes a subscription service with consequences."
A Fielded Policy Reading Its Logs
Learning after deployment means separating monitoring, candidate update, validation, and rollout so that adaptation remains controlled and auditable.
The update pipeline is part of the embodied system. If the learning loop cannot be audited, then post-deployment improvement is operating outside the same scientific standard demanded of perception, planning, and control.
Theory
Post-deployment learning should be modeled as a governed pipeline:
$$D_t \rightarrow U_t \rightarrow \theta_{t+1} \rightarrow V_t \rightarrow R_t,$$
where collected field data $D_t$ feeds an update rule $U_t$, producing candidate parameters $\theta_{t+1}$, which are evaluated by validation suite $V_t$ before rollout decision $R_t$. The key idea is that deployment data and deployment decisions are connected, but not collapsed into one uncontrolled online loop.
| Stage | Artifact | Failure If Missing |
|---|---|---|
| Monitoring | drift and intervention report | no justified reason to update |
| Candidate update | versioned training config and data slice | cannot explain what changed |
| Validation | old-task, new-task, and safety panel | silent regressions |
| Rollout | shadow or canary decision log | unsafe direct promotion |
| Rollback | pointer to previous stable version | slow or impossible recovery |
Worked Example
Suppose a shelf-picking robot sees more reflective packaging after a supplier change. The new data may justify a candidate perception update, but only after the system verifies that prior carton and bottle skills still work.
def validate_update_pipeline(payload: dict[str, object]) -> dict[str, object]:
assert payload, "payload must not be empty"
return payload
update_pipeline = {
"field_signal": "drop in grasp success on reflective cartons",
"candidate_update": "fine-tune perception head on corrected examples",
"validation_panel": ["old carton tasks", "new reflective carton tasks", "safety replay set"],
"rollout_mode": "shadow_then_canary",
}
print(validate_update_pipeline(update_pipeline))
{'field_signal': 'drop in grasp success on reflective cartons', 'candidate_update': 'fine-tune perception head on corrected examples', 'validation_panel': ['old carton tasks', 'new reflective carton tasks', 'safety replay set'], 'rollout_mode': 'shadow_then_canary'}The expected output should make the release logic explicit. If the update pipeline does not name retained-task checks and rollout mode, then "learning after deployment" is really just ungoverned retraining.
- Detect a field signal such as drift or recurring intervention.
- Create a candidate update from labeled data, replay, or adapters.
- Evaluate old tasks, new tasks, and safety cases in one fixed panel.
- Deploy only in shadow or canary mode first.
- Promote or roll back according to explicit thresholds.
Replay stores, versioned experiment trackers, adapter-tuning libraries, and deployment registries are valuable here because they preserve provenance. The shortcut is helpful only when the tool chain retains old-task panels, candidate lineage, and rollback pointers rather than just storing a new checkpoint.
Teams often let new field data dominate the update pipeline without preserving enough old-task evidence. The result is adaptation that looks good on the latest problem and quietly regresses earlier competence.
A hospital delivery robot that sees new floor reflections after waxing may need a localization update. A governed pipeline first labels the reflective cases, then checks retained hallway navigation and elevator entry, then promotes the update in shadow mode before granting live control authority.
An open problem is how to merge large-scale self-supervised field data with strict embodied safety gates. The tension is between making use of abundant unlabeled experience and keeping update authority narrow enough to avoid hidden regressions.
Can you name the field signal, candidate update, retained-task panel, and rollout mode for one real robot application? If any of those are missing, the learning loop is still underspecified.
In production systems, the update object is usually larger than a checkpoint. It should include the exact slice of field data, labeling protocol, adapter or fine-tuning configuration, validation manifest, and the deployment ticket that authorized the shadow or canary run. Tools such as PyTorch training jobs, Weights and Biases or TensorBoard traces, and ROS 2 replay logs are useful here only when they preserve this bundle as one inspectable release artifact rather than scattering evidence across unrelated dashboards.
A strong post-deployment recipe also distinguishes perception adaptation from control adaptation. If a warehouse robot fails because carton appearance changed, the first candidate may be a narrow vision update with frozen planner and controller interfaces. If the failure instead comes from timing drift or changed vehicle dynamics, the update path may involve different evaluation panels, different rollback rules, and stricter closed-loop replay before any canary release.
In serious deployments, learning after deployment also changes organizational interfaces. Operators, data curators, and release owners need a shared artifact vocabulary so that a model update can be challenged and reversed without ambiguity.
Learning after deployment is a governed release process, not a permission slip for uncontrolled online adaptation.
Design a field-learning pipeline for a mobile robot whose localization degrades in reflective hallways. Name the field signal, candidate update, retained-task panel, and rollout mode.
Section References
Kirkpatrick, J. et al. Overcoming catastrophic forgetting in neural networks. PNAS, 2017.
Use for regularization-based retention and its assumptions.
Lopez-Paz, D. and Ranzato, M. Gradient Episodic Memory for Continual Learning. NeurIPS, 2017.
Use for replay-constrained updates and task-stream evaluation.
What's Next?
Next, continue with Section 57.2, where the main technical risk becomes catastrophic forgetting.