In the move from notebook to robot, deployment quality is measured by the command stream, the safety monitor state, and the replayable evidence behind each command.
A Careful Control Loop
The move from notebook to robot matters because deployment architecture turns an experiment into a timed, observable, recoverable system. This section treats evaluation, uncertainty, safety, and deployment as one closed-loop contract rather than as separate checklist items.
Problem First
A notebook hides process boundaries, start-up races, stale topics, missing calibration, device contention, and undeclared operator assumptions. The robot experiences all of them concurrently, so deployment is a systems-identification step as much as a packaging step.
The practical question is therefore sharper than "does the model work?" It is: which sensors and clocks define the current state estimate, which action interface is authoritative, which monitor may override it, and which artifact proves that the resulting behavior is acceptable under perturbation?
Before motion is enabled, name the observation schema, state estimator owner, action rate, timeout, fallback state, and rollback condition. Without those fields, a successful demo is not yet a deployable result.
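One way to enforce this rule is a small gate that refuses to enable motion until every contract field has been named. The sketch below is illustrative only: the field names mirror the list in the preceding paragraph, while `REQUIRED_FIELDS`, `missing_fields`, and the example values are hypothetical and not part of any particular robot stack.

```python
# Hypothetical pre-motion gate. Field names follow the paragraph above;
# the helper and the example values are illustrative assumptions.
REQUIRED_FIELDS = (
    "observation_schema",
    "state_estimator_owner",
    "action_rate_hz",
    "timeout_ms",
    "fallback_state",
    "rollback_condition",
)


def missing_fields(contract: dict[str, object]) -> list[str]:
    """Return required fields that are absent, None, or empty strings."""
    return [
        name
        for name in REQUIRED_FIELDS
        if contract.get(name) in (None, "")
    ]


draft_contract = {
    "observation_schema": "rgb_640x480+joint_state_v1",  # made-up schema id
    "state_estimator_owner": "estimator_node",           # made-up process name
    "action_rate_hz": 10,
    "timeout_ms": 80,
    "fallback_state": "",  # not yet decided
    # "rollback_condition" has not been written down at all
}

gaps = missing_fields(draft_contract)
motion_allowed = not gaps
print("missing:", gaps, "| motion allowed:", motion_allowed)
```

The gate deliberately reports which fields are missing rather than returning a bare boolean, so the refusal itself becomes part of the evidence trail.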
Theory
Deployment begins by splitting the system into a high-rate actuation loop, a perception and estimation path, a policy service, and a supervisory layer. The minimum timing constraint is
$$\tau_{\mathrm{sense}} + \tau_{\mathrm{queue}} + \tau_{\mathrm{infer}} + \tau_{\mathrm{publish}} \le T_{\mathrm{policy}}, \quad T_{\mathrm{policy}} \le k T_{\mathrm{ctrl}},$$
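where the $\tau$ terms are the sensing, queueing, inference, and publishing latencies, $T_{\mathrm{policy}}$ is the policy period, $T_{\mathrm{ctrl}}$ is the low-level controller period, and $k$ is the number of controller ticks for which a policy action may remain valid. Once either inequality is violated, the deployment problem is no longer one of model quality but of stale-command control.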
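The two inequalities can be checked mechanically once per-stage latencies have been measured. The following is a minimal sketch under stated assumptions: the `LatencyBudget` class, the `check_timing` helper, and all latency numbers are hypothetical, chosen only to show one passing case and one failure of each inequality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyBudget:
    """Per-stage latencies in milliseconds (hypothetical measurements)."""

    sense_ms: float
    queue_ms: float
    infer_ms: float
    publish_ms: float

    def total_ms(self) -> float:
        return self.sense_ms + self.queue_ms + self.infer_ms + self.publish_ms


def check_timing(
    budget: LatencyBudget, policy_period_ms: float, ctrl_period_ms: float, k: int
) -> tuple[bool, bool]:
    """Return (pipeline_ok, validity_ok) for the two timing inequalities."""
    # tau_sense + tau_queue + tau_infer + tau_publish <= T_policy
    pipeline_ok = budget.total_ms() <= policy_period_ms
    # T_policy <= k * T_ctrl
    validity_ok = policy_period_ms <= k * ctrl_period_ms
    return pipeline_ok, validity_ok


nominal = LatencyBudget(sense_ms=8.0, queue_ms=4.0, infer_ms=55.0, publish_ms=3.0)
slow_model = LatencyBudget(sense_ms=8.0, queue_ms=4.0, infer_ms=95.0, publish_ms=3.0)

# Assumed rates: 10 Hz policy (100 ms period), 100 Hz controller (10 ms period).
nominal_result = check_timing(nominal, 100.0, 10.0, k=12)
slow_result = check_timing(slow_model, 100.0, 10.0, k=12)
short_window = check_timing(nominal, 100.0, 10.0, k=8)

print(nominal_result, slow_result, short_window)
```

Returning the two results separately keeps the diagnosis clear: a slow pipeline and a validity window that is too short are different faults with different fixes.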
A useful evidence record is $e_i=(x_i,\hat s_i,a_i,m_i,\ell_i,z_i)$, where $x_i$ is the scenario context, $\hat s_i$ is the estimator state, $a_i$ is the issued command, $m_i$ is the monitor transition, $\ell_i$ is the latency vector, and $z_i$ is the artifact id. Every deployment claim in this chapter should be recoverable from one set of $e_i$ records.
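As a rough illustration, the tuple $e_i$ can be carried as a typed record and written as one JSON object per line. The class name `EvidenceRecord`, the field names, the `to_jsonl` helper, and every value below are assumptions made for this sketch; only the six components themselves come from the definition above.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class EvidenceRecord:
    scenario: dict[str, str]       # x_i: scenario context
    estimator_state: list[float]   # s_hat_i: estimator state
    command: list[float]           # a_i: issued command
    monitor_transition: str        # m_i: monitor transition
    latency_ms: dict[str, float]   # l_i: latency vector
    artifact_id: str               # z_i: artifact id


def to_jsonl(records: list[EvidenceRecord]) -> str:
    """Serialize records as JSON Lines, one record per line, keys sorted."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


# Hypothetical records from one run; all values are invented for illustration.
records = [
    EvidenceRecord(
        scenario={"condition": "nominal"},
        estimator_state=[0.10, 0.02, 0.00],
        command=[0.05, 0.00],
        monitor_transition="nominal->nominal",
        latency_ms={"sense": 8.0, "queue": 4.0, "infer": 55.0, "publish": 3.0},
        artifact_id="run-0001",
    ),
    EvidenceRecord(
        scenario={"condition": "missing-topic"},
        estimator_state=[0.10, 0.02, 0.00],
        command=[0.00, 0.00],
        monitor_transition="nominal->fallback",
        latency_ms={"sense": 8.0, "queue": 120.0, "infer": 55.0, "publish": 3.0},
        artifact_id="run-0001",
    ),
]

log_text = to_jsonl(records)
print(log_text)
```

A line-oriented format is one reasonable choice here because a replay script can stream it and a partially written log still parses up to the last complete line.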
The mechanism is startup, sense, estimate, decide, constrain, execute, observe, and recover. Each verb must have an owner process, a timing budget, and a failure mode that the artifact schema can represent explicitly.
Worked Example
A notebook policy may look stable simply because the camera stream, estimator, and model all run in one synchronous process. On the robot, the camera can start late, one frame can be dropped, or a model can reload during operation. A deployment artifact should make those conditions observable instead of collapsing them into one final score.
```python
from dataclasses import dataclass, asdict
import json


@dataclass
class DeploymentManifest:
    section: str
    control_hz: int
    policy_hz: int
    max_staleness_ms: int
    fallback_mode: str
    rollback_trigger: str
    readiness_checks: list[str]

    def as_row(self) -> dict[str, object]:
        return asdict(self)


manifest = DeploymentManifest(
    section="55.1",
    control_hz=100,
    policy_hz=10,
    max_staleness_ms=80,
    fallback_mode="hold_last_safe_command_then_stop",
    rollback_trigger="two consecutive watchdog failures or any emergency-stop event",
    readiness_checks=[
        "camera topic alive",
        "extrinsics loaded",
        "policy checksum verified",
        "controller deadline miss rate < 0.1%",
    ],
)

print(json.dumps(manifest.as_row(), indent=2))
```
```json
{
  "section": "55.1",
  "control_hz": 100,
  "policy_hz": 10,
  "max_staleness_ms": 80,
  "fallback_mode": "hold_last_safe_command_then_stop",
  "rollback_trigger": "two consecutive watchdog failures or any emergency-stop event",
  "readiness_checks": [
    "camera topic alive",
    "extrinsics loaded",
    "policy checksum verified",
    "controller deadline miss rate < 0.1%"
  ]
}
```

The expected output is a manifest with fields that an operator, an auditor, and a replay script can all consume directly. If the rollout report later claims success but cannot point back to explicit readiness checks, the system is still operating in notebook mode intellectually even if it is physically on a robot.
- Freeze the observation schema, frame conventions, and command units.
- Choose controller and policy rates, then compute the maximum admissible command staleness.
- Define readiness checks for sensors, calibration, model checksum, and watchdogs.
- Run the nominal case plus cold-start and missing-topic perturbations on one panel.
- Enable autonomy only if the same artifact contains the success metrics, timing histograms, and recovery transitions.
The hand-built manifest stays small on purpose. In production, ROS 2 lifecycle nodes, launch files, MLflow or DVC, and signed artifact registries should preserve the same contract while adding versioning, rollout history, and retrieval of exact binaries and configs.
Practical Recipe
- Write one deployment manifest before touching launch files.
- Measure controller tick rate, policy tick rate, queue age, and topic freshness under nominal load.
- Inject at least one startup fault and one runtime fault.
- Record monitor transitions, operator interventions, and rollback decisions in the same artifact.
- Promote the service only if the artifact is enough for another team to reproduce both success and failure behavior.
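The measurement step in this recipe can be sketched as a small function over recorded tick timestamps. This is an assumption-laden illustration: the function name `deadline_miss_rate`, the tolerance parameter, and the trace are hypothetical, and a real controller would more likely export these counters itself.

```python
def deadline_miss_rate(
    tick_stamps_us: list[int], period_us: int, tolerance_us: int
) -> float:
    """Fraction of tick-to-tick intervals longer than period + tolerance."""
    intervals = [b - a for a, b in zip(tick_stamps_us, tick_stamps_us[1:])]
    if not intervals:
        raise ValueError("need at least two tick timestamps")
    misses = sum(1 for dt in intervals if dt > period_us + tolerance_us)
    return misses / len(intervals)


# Hypothetical 100 Hz trace (10_000 us period) containing one late tick.
stamps_us = [0, 10_000, 20_050, 30_000, 45_000, 55_000]
miss_rate = deadline_miss_rate(stamps_us, period_us=10_000, tolerance_us=500)

# Threshold taken from the example readiness check "miss rate < 0.1%".
controller_ready = miss_rate < 0.001
print(f"miss rate = {miss_rate:.1%}, ready = {controller_ready}")
```

Integer microsecond timestamps are used so that the comparison against the period does not depend on floating-point rounding.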
Copying notebook code into a robot package without declaring timing and rollback semantics creates a silent mode switch: the system looks deployed, but no one can say which assumptions still hold.
A mobile manipulator team moving from MuJoCo to hardware often discovers that the grasp policy is not the first thing that fails. More common early failures are mismatched camera frames, stale extrinsics, actuator enable races, and control packets arriving after the grasp window. A good deployment artifact makes those system-level faults legible instead of mislabeling them as learning failures.
Modern embodied AI stacks increasingly combine robot-native release engineering, shadow deployment, and formal runtime assurance. The open problem is how to preserve real-time guarantees while deploying larger multimodal policies and more adaptive behavior.
Can you state the policy rate, controller rate, maximum command staleness, readiness checks, and rollback trigger without opening another file? If not, the deployment contract is incomplete.
The move from notebook to robot becomes operational when the model is subordinated to a runtime contract. The contract should specify who owns each transition in the boot-to-ready-to-autonomous path, how stale actions are detected, and when the robot leaves autonomy without human debate.
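Stale-action detection and the exit from autonomy can be sketched as a small supervisor. The version below is one possible reading, not a prescribed design: the 80 ms threshold and the two-consecutive-failure rule are borrowed from the example manifest values, while the `Mode` names, the `CommandWatchdog` class, and the choice to latch the stopped state are assumptions.

```python
from enum import Enum


class Mode(Enum):
    AUTONOMOUS = "autonomous"
    HOLD = "hold_last_safe_command"
    STOPPED = "stopped"


class CommandWatchdog:
    """Hypothetical supervisor that leaves autonomy after repeated stale commands."""

    def __init__(self, max_staleness_ms: int, max_consecutive_failures: int = 2):
        self.max_staleness_ms = max_staleness_ms
        self.max_consecutive_failures = max_consecutive_failures
        self.failures = 0
        self.mode = Mode.AUTONOMOUS

    def tick(self, now_ms: int, last_command_ms: int) -> Mode:
        if self.mode is Mode.STOPPED:
            return self.mode  # latched: restarting is an explicit operator step
        stale = (now_ms - last_command_ms) > self.max_staleness_ms
        if not stale:
            self.failures = 0
            self.mode = Mode.AUTONOMOUS
        else:
            self.failures += 1
            if self.failures >= self.max_consecutive_failures:
                self.mode = Mode.STOPPED
            else:
                self.mode = Mode.HOLD
        return self.mode


watchdog = CommandWatchdog(max_staleness_ms=80)
# (now_ms, last_command_ms) pairs from a made-up run.
trace = [(100, 50), (200, 110), (210, 205), (300, 205), (310, 205), (320, 315)]
modes = [watchdog.tick(now, last).name for now, last in trace]
print(modes)
```

Passing timestamps in explicitly, rather than reading a clock inside `tick`, keeps the supervisor replayable from logged evidence.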
The disciplined habit is to separate three claims. The conceptual claim says the policy should improve task performance. The systems claim says the policy can live inside the timing and safety envelope. The evidence claim says the resulting deployment bundle proves both.
| Tool or Library | Role in moving from notebook to robot |
|---|---|
| ROS 2 lifecycle nodes | Expose explicit boot, ready, degraded, and shutdown transitions. |
| Docker or Nix | Freeze runtime dependencies and support reproducible rollback. |
| MLflow or DVC | Bind the deployed manifest to its exact evaluation artifact. |
Cross-References
Connect benchmark design, sim-to-real transfer, uncertainty, and safety barriers to this section through the deployment artifact that will be checked before release.
Create one deployment bundle for five hardware or simulator runs. Include the manifest, a timing trace, readiness-check outcomes, monitor transitions, and a short failure diagnosis. Then change one deployment setting such as queue depth or policy rate and verify that the two bundles can be compared field by field.
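A field-by-field comparison can be as simple as a dictionary diff over the two manifests. The sketch below assumes each bundle's settings are available as a flat dictionary; the helper `diff_fields`, the `MISSING` sentinel, and the `queue_depth` field are hypothetical additions for illustration.

```python
MISSING = "<missing>"  # sentinel for a key present in only one bundle


def diff_fields(
    a: dict[str, object], b: dict[str, object]
) -> dict[str, tuple[object, object]]:
    """Map each differing key to (value_in_a, value_in_b)."""
    changed = {}
    for key in sorted(set(a) | set(b)):
        left = a.get(key, MISSING)
        right = b.get(key, MISSING)
        if left != right:
            changed[key] = (left, right)
    return changed


baseline = {
    "control_hz": 100,
    "policy_hz": 10,
    "max_staleness_ms": 80,
    "queue_depth": 1,  # hypothetical setting, not in the example manifest
}
variant = dict(baseline, queue_depth=4)

changes = diff_fields(baseline, variant)
single_change = len(changes) == 1
print(changes, "| exactly one setting changed:", single_change)
```

Checking that exactly one field differs before comparing outcomes helps attribute any behavioral change to that setting alone.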
When the transition to hardware fails, assign the fault to startup sequencing, timing, frame inconsistency, estimator drift, stale command handling, operator procedure, or evaluation hygiene. Then rerun a perturbation that isolates exactly one of those mechanisms.
Schema strictness is cheaper than discovering a missing field during a moving-robot trial; require the log before comparing outcomes.
The move from notebook to robot succeeds only when the learned component is wrapped in an explicit timing, safety, and rollback contract that another builder can audit end to end.
Design a deployment manifest for a small mobile manipulator. Specify controller and policy rates, freshness threshold, readiness checks, one perturbation, and one rollback rule. Then state which artifact fields would prove the manifest was respected during execution.
Section References
Quigley, M. et al. ROS: an open-source Robot Operating System. ICRA Workshop, 2009.
Use for the robotics middleware lineage behind nodes, topics, services, bags, and deployment boundaries.
OpenTelemetry project documentation. https://opentelemetry.io/docs/
Use for tracing, metrics, and logs when robot deployment evidence must connect software events to runtime behavior.
After this section, the next section should reuse the artifact schema while changing one deployment interface or failure mode, so that comparisons remain auditable.