Section 55.1: From notebook to robot | Building Embodied AI: From Perception to Autonomous Action

Deployment quality in this section is measured by three things: the command stream, the safety monitor state, and the replayable evidence behind each command.
A Careful Control Loop

Figure 55.1A: The notebook-to-robot deployment pipeline. A trained checkpoint is exported to TorchScript or ONNX, a hardware abstraction layer maps the model's action output to joint commands, and an integration test verifies round-trip latency before the first physical run.

Big Picture

The move from notebook to robot matters because deployment architecture turns an experiment into a timed, observable, recoverable system. This section treats evaluation, uncertainty, safety, and deployment as one closed-loop contract rather than as separate checklist items.

Problem First

A notebook hides process boundaries, start-up races, stale topics, missing calibration, device contention, and undeclared operator assumptions. The robot experiences all of them concurrently, so deployment is a systems-identification step as much as a packaging step.

The practical question is therefore sharper than "does the model work?" It is: which sensors and clocks define the current state estimate, which action interface is authoritative, which monitor may override it, and which artifact proves that the resulting behavior is acceptable under perturbation?

Enable Autonomy Only After The Contract Exists

Before motion is enabled, name the observation schema, state estimator owner, action rate, timeout, fallback state, and rollback condition. Without those fields, a successful demo is not yet a deployable result.
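The rule above can be sketched as a completeness gate. This is a minimal illustration, assuming the contract is kept as a plain dictionary; the field names, the example values, and the `missing_contract_fields` helper are hypothetical and not taken from any library.

```python
# Hypothetical field names mirroring the six items named in the text.
REQUIRED_CONTRACT_FIELDS = (
    "observation_schema",
    "state_estimator_owner",
    "action_rate_hz",
    "timeout_ms",
    "fallback_state",
    "rollback_condition",
)


def missing_contract_fields(contract: dict) -> list[str]:
    """Return required fields that are absent or empty, in declaration order."""
    return [
        name
        for name in REQUIRED_CONTRACT_FIELDS
        if contract.get(name) in (None, "", [], {})
    ]


# Illustrative draft contract: two fields have not been declared yet.
draft = {
    "observation_schema": "rgb_640x480+joint_state_v2",
    "state_estimator_owner": "estimator_node",
    "action_rate_hz": 10,
    "timeout_ms": 80,
}

missing = missing_contract_fields(draft)
motion_enabled = not missing  # motion stays disabled until nothing is missing
print(missing, motion_enabled)
```

Returning the list of missing names, rather than a bare boolean, gives the operator something concrete to fix before the next attempt.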

The evidence contract for this section keeps the observation, estimate, action, monitor decision, and result artifact on one traceable path.

Theory

Deployment begins by splitting the system into a high-rate actuation loop, a perception and estimation path, a policy service, and a supervisory layer. The minimum timing constraint is

$$\tau_{\mathrm{sense}} + \tau_{\mathrm{queue}} + \tau_{\mathrm{infer}} + \tau_{\mathrm{publish}} \le T_{\mathrm{policy}}, \quad T_{\mathrm{policy}} \le k T_{\mathrm{ctrl}},$$

where $T_{\mathrm{ctrl}}$ is the low-level controller period and $k$ is the number of controller ticks for which a policy action may remain valid. Once either inequality is violated, the deployment problem is no longer one of model quality but of stale-command control.
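A minimal sketch of the timing constraint as a check, assuming per-stage latencies have already been measured in milliseconds. The function name `check_timing_budget` and the latency values are illustrative placeholders, not measurements; the periods correspond to the 100 Hz controller and 10 Hz policy used in this section's manifest.

```python
def check_timing_budget(
    latencies_ms: dict[str, float],
    policy_period_ms: float,
    control_period_ms: float,
    k: int,
) -> dict[str, object]:
    """Evaluate both inequalities of the timing constraint."""
    # Left-hand side: tau_sense + tau_queue + tau_infer + tau_publish.
    total = sum(latencies_ms[name] for name in ("sense", "queue", "infer", "publish"))
    return {
        "total_latency_ms": total,
        "slack_ms": policy_period_ms - total,
        "latency_ok": total <= policy_period_ms,                     # sum <= T_policy
        "validity_ok": policy_period_ms <= k * control_period_ms,   # T_policy <= k * T_ctrl
    }


# Illustrative latencies only.
latencies = {"sense": 15.0, "queue": 5.0, "infer": 40.0, "publish": 2.0}

report = check_timing_budget(latencies, policy_period_ms=100.0, control_period_ms=10.0, k=10)
short_validity = check_timing_budget(latencies, policy_period_ms=100.0, control_period_ms=10.0, k=8)
print(report)
print(short_validity["validity_ok"])
```

With these numbers the pipeline has 38 ms of slack, but shrinking the validity window to $k=8$ ticks breaks the second inequality even though the model itself is unchanged.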

A useful evidence record is $e_i=(x_i,\hat s_i,a_i,m_i,\ell_i,z_i)$, where $x_i$ is the scenario context, $\hat s_i$ is the estimator state, $a_i$ is the issued command, $m_i$ is the monitor transition, $\ell_i$ is the latency vector, and $z_i$ is the artifact id. Every deployment claim in this chapter should be recoverable from one set of $e_i$ records.
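One way to hold the record $e_i$ in code is a small dataclass serialized as one JSON line per command. This is a sketch under assumptions: the class name, field names, field types, and example values are hypothetical, chosen only to mirror the six components defined above.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class EvidenceRecord:
    scenario: str                 # x_i: scenario context
    estimator_state: list[float]  # s_hat_i: state estimate at decision time
    command: list[float]          # a_i: issued command
    monitor_transition: str       # m_i: monitor transition, e.g. "nominal->nominal"
    latency_ms: dict[str, float]  # l_i: per-stage latency vector
    artifact_id: str              # z_i: id of the artifact holding the raw data


# Illustrative values only.
record = EvidenceRecord(
    scenario="cold_start",
    estimator_state=[0.10, -0.25, 0.0],
    command=[0.05, 0.0],
    monitor_transition="nominal->nominal",
    latency_ms={"sense": 15.0, "queue": 5.0, "infer": 40.0, "publish": 2.0},
    artifact_id="run-0001",
)

# One JSON object per line keeps the log appendable and easy to replay.
line = json.dumps(asdict(record), sort_keys=True)
restored = EvidenceRecord(**json.loads(line))
print(restored == record)
```

The round trip through JSON is the property that matters here: a replay script should be able to rebuild every record without access to the process that wrote it.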

Mechanism

The mechanism is startup, sense, estimate, decide, constrain, execute, observe, and recover. Each verb must have an owner process, a timing budget, and a failure mode that the artifact schema can represent explicitly.

Figure 55.1B: Deployment should expose an explicit runtime state machine. The key operational difference between a notebook and a robot is that the state transitions become safety-relevant.
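An explicit runtime state machine can be sketched in a few lines. The state names and the transition table below are assumptions for illustration (boot, ready, degraded, and shutdown follow the lifecycle transitions named in this section, with an autonomous state added); they are not a reproduction of the figure or of any middleware API.

```python
from enum import Enum


class RuntimeState(Enum):
    BOOT = "boot"
    READY = "ready"
    AUTONOMOUS = "autonomous"
    DEGRADED = "degraded"
    SHUTDOWN = "shutdown"


# Hypothetical transition table: autonomy is reachable only from READY.
ALLOWED = {
    RuntimeState.BOOT: {RuntimeState.READY, RuntimeState.SHUTDOWN},
    RuntimeState.READY: {RuntimeState.AUTONOMOUS, RuntimeState.SHUTDOWN},
    RuntimeState.AUTONOMOUS: {RuntimeState.DEGRADED, RuntimeState.READY, RuntimeState.SHUTDOWN},
    RuntimeState.DEGRADED: {RuntimeState.READY, RuntimeState.SHUTDOWN},
    RuntimeState.SHUTDOWN: set(),
}


class RuntimeStateMachine:
    def __init__(self) -> None:
        self.state = RuntimeState.BOOT
        self.history: list[tuple[str, str, str]] = []

    def transition(self, target: RuntimeState, reason: str) -> bool:
        """Apply the transition if allowed; log the attempt either way."""
        allowed = target in ALLOWED[self.state]
        note = reason if allowed else f"REJECTED: {reason}"
        self.history.append((self.state.value, target.value, note))
        if allowed:
            self.state = target
        return allowed


sm = RuntimeStateMachine()
skipped = sm.transition(RuntimeState.AUTONOMOUS, "operator request")  # not allowed from BOOT
sm.transition(RuntimeState.READY, "readiness checks passed")
sm.transition(RuntimeState.AUTONOMOUS, "operator enable")
sm.transition(RuntimeState.DEGRADED, "stale command detected")
print(sm.state, skipped, len(sm.history))
```

Logging rejected attempts alongside accepted ones keeps the safety-relevant history complete: the artifact then shows not only what the robot did but also what it refused to do.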

Worked Example

A notebook policy may look stable simply because the camera stream, estimator, and model all run in one synchronous process. On the robot, the camera can start late, one frame can be dropped, or a model can reload during operation. A deployment artifact should make those conditions observable instead of collapsing them into one final score.

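from dataclasses import dataclass, asdict
import json

@dataclass
class DeploymentManifest:
    """Minimum contract that must exist before autonomy is enabled on hardware."""
    section: str
    control_hz: int              # low-level controller rate
    policy_hz: int               # policy inference rate
    max_staleness_ms: int        # oldest command age that is still acceptable
    fallback_mode: str           # behavior when the command is stale or missing
    rollback_trigger: str        # condition that ends autonomy
    readiness_checks: list[str]  # checks that must pass before motion (list[str] needs Python 3.9+)

    def as_row(self) -> dict[str, object]:
        # Plain dict so the manifest can be serialized or logged as one row.
        return asdict(self)

manifest = DeploymentManifest(
    section="55.1",
    control_hz=100,
    policy_hz=10,
    max_staleness_ms=80,
    fallback_mode="hold_last_safe_command_then_stop",
    rollback_trigger="two consecutive watchdog failures or any emergency-stop event",
    readiness_checks=[
        "camera topic alive",
        "extrinsics loaded",
        "policy checksum verified",
        "controller deadline miss rate < 0.1%",
    ],
)

# Emit the manifest as JSON so operators, auditors, and replay scripts read the same fields.
print(json.dumps(manifest.as_row(), indent=2))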

{
  "section": "55.1",
  "control_hz": 100,
  "policy_hz": 10,
  "max_staleness_ms": 80,
  "fallback_mode": "hold_last_safe_command_then_stop",
  "rollback_trigger": "two consecutive watchdog failures or any emergency-stop event",
  "readiness_checks": [
    "camera topic alive",
    "extrinsics loaded",
    "policy checksum verified",
    "controller deadline miss rate < 0.1%"
  ]
}

Code Fragment 55.1.1 defines the minimum manifest that should exist before autonomy is enabled on hardware.

The expected output is a manifest with fields that an operator, an auditor, and a replay script can all consume directly. If the rollout report later claims success but cannot point back to explicit readiness checks, the system is still operating in notebook mode intellectually even if it is physically on a robot.
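The manifest only matters if something enforces it at runtime. The following sketch shows one way the `max_staleness_ms` and `fallback_mode` values from the manifest above might gate each command; the `select_command` helper, the command label, and the timestamps are hypothetical, and a real controller would act on the fallback rather than return a string.

```python
MAX_STALENESS_MS = 80  # same value as max_staleness_ms in the manifest
FALLBACK_COMMAND = "hold_last_safe_command_then_stop"  # same value as fallback_mode


def select_command(
    policy_command: str,
    command_stamp_ms: float,
    now_ms: float,
    max_staleness_ms: float = MAX_STALENESS_MS,
) -> tuple[str, float]:
    """Return (command, age_ms), substituting the fallback when the command is stale."""
    age_ms = now_ms - command_stamp_ms
    # A negative age means the clocks disagree, so the stamp cannot be trusted either.
    if age_ms < 0 or age_ms > max_staleness_ms:
        return FALLBACK_COMMAND, age_ms
    return policy_command, age_ms


# Illustrative timestamps: one command 30 ms old, one 95 ms old.
fresh = select_command("joint_velocity_cmd", command_stamp_ms=1000.0, now_ms=1030.0)
stale = select_command("joint_velocity_cmd", command_stamp_ms=1000.0, now_ms=1095.0)
print(fresh)
print(stale)
```

Returning the measured age with every decision means the same call that enforces the threshold also produces the latency evidence for the log.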

Algorithm: Promote an Experiment to a Robot Service

1. Freeze the observation schema, frame conventions, and command units.
2. Choose controller and policy rates, then compute the maximum admissible command staleness.
3. Define readiness checks for sensors, calibration, model checksum, and watchdogs.
4. Run nominal, cold-start, and missing-topic perturbations on one panel.
5. Enable autonomy only if the same artifact contains the success metrics, timing histograms, and recovery transitions.
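The final promotion step can be sketched as a gate over a single artifact. The artifact layout, the key names, and the `promotion_blockers` helper are assumptions for illustration, and the numbers are placeholders rather than results.

```python
REQUIRED_SCENARIOS = ("nominal", "cold_start", "missing_topic")
REQUIRED_SECTIONS = ("success_metrics", "timing_histograms", "recovery_transitions")


def promotion_blockers(artifact: dict) -> list[str]:
    """List reasons the artifact cannot justify enabling autonomy (empty means promote)."""
    blockers = []
    runs = artifact.get("runs", {})
    for scenario in REQUIRED_SCENARIOS:
        run = runs.get(scenario)
        if run is None:
            blockers.append(f"{scenario}: run missing")
            continue
        for section in REQUIRED_SECTIONS:
            if not run.get(section):
                blockers.append(f"{scenario}: {section} missing")
    return blockers


# Placeholder artifact: cold start lacks recovery transitions, missing-topic was never run.
artifact = {
    "artifact_id": "run-0001",
    "runs": {
        "nominal": {
            "success_metrics": {"success_rate": 0.9},
            "timing_histograms": {"infer_ms": [38, 40, 41]},
            "recovery_transitions": ["none"],
        },
        "cold_start": {
            "success_metrics": {"success_rate": 0.8},
            "timing_histograms": {"infer_ms": [45, 52, 60]},
        },
    },
}

blockers = promotion_blockers(artifact)
enable_autonomy = not blockers
print(blockers, enable_autonomy)
```

The gate deliberately checks for the presence of evidence, not for good scores; deciding whether a success rate is acceptable is a separate judgment that should only happen once the evidence is complete.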

Library Shortcut

The hand-built manifest stays small on purpose. In production, ROS 2 lifecycle nodes, Launch files, MLflow or DVC, and signed artifact registries should preserve the same contract while adding versioning, rollout history, and retrieval of exact binaries and configs.

Practical Recipe

1. Write one deployment manifest before touching launch files.
2. Measure controller tick rate, policy tick rate, queue age, and topic freshness under nominal load.
3. Inject at least one startup fault and one runtime fault.
4. Record monitor transitions, operator interventions, and rollback decisions in the same artifact.
5. Promote the service only if the artifact is enough for another team to reproduce both success and failure behavior.
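The measurement step in the recipe can start from nothing more than a list of controller tick timestamps. The sketch below computes a deadline miss rate; the `deadline_miss_rate` helper, the 0.5 ms tolerance, and the synthetic trace are assumptions for illustration, not measured data.

```python
def deadline_miss_rate(
    tick_times_ms: list[float],
    period_ms: float,
    tolerance_ms: float = 0.5,
) -> float:
    """Fraction of tick-to-tick intervals that exceed period + tolerance."""
    intervals = [b - a for a, b in zip(tick_times_ms, tick_times_ms[1:])]
    if not intervals:
        raise ValueError("need at least two tick timestamps")
    misses = sum(1 for dt in intervals if dt > period_ms + tolerance_ms)
    return misses / len(intervals)


# Synthetic 100 Hz trace with one tick arriving 3 ms late.
ticks = [0.0, 10.0, 20.0, 30.0, 43.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]

rate = deadline_miss_rate(ticks, period_ms=10.0)
passes_readiness = rate < 0.001  # the "< 0.1%" readiness check expressed as a fraction
print(rate, passes_readiness)
```

A trace this short cannot resolve a 0.1% threshold, which is itself a useful reminder: the readiness check implies a minimum number of recorded ticks before it can be evaluated honestly.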

Common Failure Mode

Copying notebook code into a robot package without declaring timing and rollback semantics creates a silent mode switch: the system looks deployed, but no one can say which assumptions still hold.

Practical Example

A mobile manipulator team moving from MuJoCo to hardware often discovers that the grasp policy is not the first thing that fails. More common early failures are mismatched camera frames, stale extrinsics, actuator enable races, and control packets arriving after the grasp window. A good deployment artifact makes those systems faults legible instead of mislabeling them as learning failures.

Research Frontier

Modern embodied AI stacks increasingly combine robot-native release engineering, shadow deployment, and formal runtime assurance. The open problem is how to preserve real-time guarantees while deploying larger multimodal policies and more adaptive behavior.

Self Check

Can you state the policy rate, controller rate, maximum command staleness, readiness checks, and rollback trigger without opening another file? If not, the deployment contract is incomplete.

The move from notebook to robot becomes operational when the model is subordinated to a runtime contract. The contract should specify who owns each transition in the boot-to-ready-to-autonomous path, how stale actions are detected, and when the robot leaves autonomy without human debate.

The disciplined habit is to separate three claims. The conceptual claim says the policy should improve task performance. The systems claim says the policy can live inside the timing and safety envelope. The evidence claim says the resulting deployment bundle proves both.

Practical Tool Choices For This Section

Tool or Library	Role in this section
ROS 2 lifecycle nodes	Expose explicit boot, ready, degraded, and shutdown transitions.
Docker or Nix	Freeze runtime dependencies and support reproducible rollback.
MLflow or DVC	Bind the deployed manifest to its exact evaluation artifact.

Cross-References

Connect benchmark design, sim-to-real transfer, uncertainty, and safety barriers through the deployment artifact that will be checked before release.

Lab: Build The Artifact First

Create one deployment bundle for five hardware or simulator runs. Include the manifest, a timing trace, readiness-check outcomes, monitor transitions, and a short failure diagnosis. Then change one deployment setting such as queue depth or policy rate and verify that the two bundles can be compared field by field.
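Field-by-field comparison is easy to automate when bundles are nested dictionaries. The sketch below is one possible approach; the `diff_bundles` helper and the bundle layout are hypothetical, and the example changes only the policy rate.

```python
def diff_bundles(baseline: dict, candidate: dict, prefix: str = "") -> dict[str, tuple]:
    """Return {dotted.field: (baseline_value, candidate_value)} for every differing leaf."""
    changes = {}
    for key in sorted(set(baseline) | set(candidate)):
        path = f"{prefix}{key}"
        # A key missing on one side is reported with None for that side.
        old, new = baseline.get(key), candidate.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            changes.update(diff_bundles(old, new, prefix=f"{path}."))
        elif old != new:
            changes[path] = (old, new)
    return changes


# Illustrative bundles that differ in exactly one deployment setting.
bundle_a = {
    "manifest": {"control_hz": 100, "policy_hz": 10, "max_staleness_ms": 80},
    "readiness": {"camera topic alive": True, "extrinsics loaded": True},
}
bundle_b = {
    "manifest": {"control_hz": 100, "policy_hz": 20, "max_staleness_ms": 80},
    "readiness": {"camera topic alive": True, "extrinsics loaded": True},
}

changes = diff_bundles(bundle_a, bundle_b)
print(changes)
```

If the diff reports more fields than the one setting that was intentionally changed, the comparison is confounded and the run should be repeated before any conclusion is drawn.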

When the transition to hardware fails, assign the fault to startup sequencing, timing, frame inconsistency, estimator drift, stale command handling, operator procedure, or evaluation hygiene. Then rerun a perturbation that isolates exactly one of those mechanisms.

A Useful Annoyance

Schema strictness is cheaper than discovering a missing field during a moving-robot trial; require the log before comparing outcomes.

Key Takeaway

The move from notebook to robot succeeds only when the learned component is wrapped in an explicit timing, safety, and rollback contract that another builder can audit end to end.

Exercise 55.1.1

Design a deployment manifest for a small mobile manipulator. Specify controller and policy rates, freshness threshold, readiness checks, one perturbation, and one rollback rule. Then state which artifact fields would prove the manifest was respected during execution.

Section References

Quigley, M. et al. ROS: an open-source Robot Operating System. ICRA Workshop, 2009.

Use for the robotics middleware lineage behind nodes, topics, services, bags, and deployment boundaries.

OpenTelemetry project documentation. https://opentelemetry.io/docs/

Use for tracing, metrics, and logs when robot deployment evidence must connect software events to runtime behavior.

What's Next

The next section should reuse the artifact schema while changing one deployment interface or failure mode, so comparisons remain auditable.