Section 55.2: Real-time inference and control rates | Building Embodied AI: From Perception to Autonomous Action

For Real-time inference and control rates, deployment quality is measured by the command stream, safety monitor state, and replayable evidence behind each command.
A Careful Control Loop

Figure 55.2A: Control-rate budget breakdown for a manipulation system. Perception (30 ms), inference (15 ms), planning (5 ms), and actuation (1 ms) are shown on a timeline, and the diagram marks where a 50 Hz control loop (20 ms period) can and cannot fit.

Big Picture

Real-time inference and control rates matter because the control rate defines how much latency the policy can spend. This section treats evaluation, uncertainty, safety, and deployment as one closed-loop contract rather than as separate checklist items.

Problem First

A strong model can be unusable if inference arrives after the controller needed the decision.

The practical question is therefore specific: which observation arrives, which state estimate is trusted, which action is allowed, which monitor can interrupt it, and which artifact proves the claim afterward?

Same-Artifact Rule

Every compared number in this section should be co-computed by one script on one task panel, with one seed plan and one saved artifact. That artifact carries success, failure, latency, safety, and robustness fields together.

The evidence contract for Real-time inference and control rates keeps the observation, estimate, action, monitor decision, and result artifact in one traceable path.

Theory

Real-time deployment is a multirate systems problem. A stabilizing controller may run at 200 to 1000 Hz, a state estimator at 30 to 200 Hz, a learned visuomotor policy at 5 to 30 Hz, and a task planner at 0.5 to 2 Hz. These loops must exchange commands without violating freshness constraints.
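To see why these loops cannot simply share one rate, the stage times from Figure 55.2A can be summed against a 50 Hz period. The sketch below is a minimal illustration that assumes the four stages run strictly in series with no pipelining; the variable names are hypothetical.

```python
import math

# Stage latencies in milliseconds, taken from Figure 55.2A.
stage_ms = {"perception": 30, "inference": 15, "planning": 5, "actuation": 1}
control_rate_hz = 50

period_ms = 1000 / control_rate_hz            # time available per control tick
serial_latency_ms = sum(stage_ms.values())    # one full pass through every stage
fits_in_one_tick = serial_latency_ms <= period_ms
# Number of control ticks that elapse before one pass can deliver a decision.
ticks_per_decision = math.ceil(serial_latency_ms / period_ms)
# Highest decision rate a purely serial pipeline could sustain.
max_serial_rate_hz = 1000 / serial_latency_ms

print(f"period: {period_ms:.1f} ms, serial latency: {serial_latency_ms} ms")
print(f"fits in one tick: {fits_in_one_tick}")
print(f"control ticks per fresh decision: {ticks_per_decision}")
print(f"max serial decision rate: {max_serial_rate_hz:.1f} Hz")
```

Under these assumptions the serial pass takes 51 ms against a 20 ms period, so a fresh decision spans three control ticks and the serial pipeline tops out near 19.6 Hz. The faster loop therefore has to reuse or reject earlier outputs rather than wait for new ones.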

A useful contract is

$$\Pr(\tau_{\mathrm{age}} > \tau_{\max}) \le \epsilon,\qquad \tau_{\mathrm{age}}=\tau_{\mathrm{sense}}+\tau_{\mathrm{queue}}+\tau_{\mathrm{infer}}+\tau_{\mathrm{publish}}.$$

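Here $\tau_{\mathrm{age}}$ is the age of a command when the controller consumes it, $\tau_{\max}$ is the freshness budget, and $\epsilon$ is the tolerated probability of exceeding that budget. The four terms are the sensing, queueing, inference, and publishing delays. Average latency is not enough. Physical instability is typically triggered by the tail of the latency distribution, so p95 and p99 command age belong in the same artifact as task success.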
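The freshness contract can be checked empirically by summing the four delay terms per command and counting how often the total exceeds the budget. The sketch below uses invented latency samples, an assumed budget of 40 ms, and an assumed tolerance of 5%; `nearest_rank` is a hypothetical helper implementing the nearest-rank percentile.

```python
import math

def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]

# Invented per-command delays in ms: (sense, queue, infer, publish).
samples = [
    (8, 2, 15, 1), (8, 3, 16, 1), (9, 2, 15, 1), (8, 2, 17, 1), (8, 12, 15, 1),
    (9, 2, 16, 1), (8, 4, 15, 1), (8, 2, 15, 2), (9, 25, 18, 1), (8, 2, 16, 1),
]
tau_max_ms = 40   # assumed freshness budget
epsilon = 0.05    # assumed tolerated violation probability

# Command age is the sum of the four delay terms.
ages_ms = sorted(sum(parts) for parts in samples)
mean_age_ms = sum(ages_ms) / len(ages_ms)
violation_rate = sum(age > tau_max_ms for age in ages_ms) / len(ages_ms)
contract_ok = violation_rate <= epsilon

print(f"mean age: {mean_age_ms:.1f} ms")
print(f"p50 / p95 / p99: {nearest_rank(ages_ms, 50)} / "
      f"{nearest_rank(ages_ms, 95)} / {nearest_rank(ages_ms, 99)} ms")
print(f"violation rate: {violation_rate:.2f}, contract satisfied: {contract_ok}")
```

With these samples the mean age is 30.8 ms, comfortably inside the budget, yet one queueing spike pushes a single command to 53 ms. That gives a violation rate of 0.10 and breaks the assumed 5% contract, which is the gap between mean and tail that the contract is meant to expose.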

Mechanism

The mechanism is observe, estimate, choose, constrain, execute, monitor, log, and review. Each verb has an owner in the deployment architecture and a field in the evaluation artifact.

Worked Example

A legged robot often stabilizes at high rate while a vision module and a policy run much more slowly. The safe pattern is to hold the low-level loop constant and explicitly decide when a slower policy output is still fresh enough to consume.

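from statistics import quantiles

# Period of the high-rate stabilizing loop (5 ms corresponds to 200 Hz).
control_period_ms = 5
# Age of each policy command at the moment the controller consumed it.
command_ages_ms = [28, 32, 35, 31, 40, 29, 36, 34, 52, 33]
# Freshness budget: a command older than this counts as a deadline miss.
max_freshness_ms = 40

# n=20 yields 19 cut points in 5% steps; index 18 is the 95th percentile.
# The default "exclusive" method extrapolates on a sample this small,
# so the estimate can exceed the largest observed age (52 ms here).
p95 = quantiles(command_ages_ms, n=20)[18]
# Fraction of commands whose age exceeded the freshness budget.
deadline_miss_rate = sum(age > max_freshness_ms for age in command_ages_ms) / len(command_ages_ms)
# Enter degraded mode when the tail of the age distribution breaks the budget.
degraded_mode = p95 > max_freshness_ms

report = {
    "section": "55.2",
    "control_period_ms": control_period_ms,
    "policy_age_p95_ms": round(p95, 1),
    "deadline_miss_rate": deadline_miss_rate,
    "degraded_mode": degraded_mode,
}
print(report)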

{'section': '55.2', 'control_period_ms': 5, 'policy_age_p95_ms': 57.4, 'deadline_miss_rate': 0.1, 'degraded_mode': True}

Code Fragment 55.2.1 computes a simple freshness report for a slower policy feeding a faster controller.

The expected output should make the control decision obvious. Here the p95 command age exceeds the freshness threshold, so the correct interpretation is not merely that latency is "a bit high" but that the stack should enter a degraded mode or reduce reliance on the slow policy.

Algorithm: Multirate Control Integration

Freeze the high-rate stabilizer and measure its independent stability margin.
Choose the learned policy rate and a maximum command age budget.
Buffer, timestamp, and reject stale commands explicitly.
Log p50, p95, p99 latency and deadline misses under nominal and stressed compute load.
Switch to degraded mode whenever the freshness contract is violated repeatedly.
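The buffering, rejection, and degraded-mode steps above can be sketched as a small gate that sits between the slow policy and the fast controller. This is an illustrative sketch, not a reference implementation: `Command`, `FreshnessGate`, the 40 ms budget, and the three-miss threshold are all assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass
class Command:
    stamp_ms: float   # time of the observation the command was computed from
    action: float     # placeholder for the real action representation

class FreshnessGate:
    """Rejects stale commands and latches degraded mode after repeated misses."""

    def __init__(self, max_age_ms, max_consecutive_misses):
        self.max_age_ms = max_age_ms
        self.max_consecutive_misses = max_consecutive_misses
        self.latest = None
        self.misses = 0
        self.degraded = False

    def submit(self, command):
        # Keep only the newest command; older ones are never replayed.
        if self.latest is None or command.stamp_ms > self.latest.stamp_ms:
            self.latest = command

    def consume(self, now_ms):
        """Called once per control tick; returns an action, or None to fall back."""
        age_ok = (self.latest is not None
                  and now_ms - self.latest.stamp_ms <= self.max_age_ms)
        self.misses = 0 if age_ok else self.misses + 1
        if self.misses >= self.max_consecutive_misses:
            self.degraded = True   # latched until a supervisor resets it
        return self.latest.action if age_ok and not self.degraded else None

gate = FreshnessGate(max_age_ms=40, max_consecutive_misses=3)
gate.submit(Command(stamp_ms=0, action=0.1))
trace = [gate.consume(now_ms) for now_ms in (20, 40, 45, 50)]
gate.submit(Command(stamp_ms=30, action=0.2))
trace.append(gate.consume(55))
trace += [gate.consume(now_ms) for now_ms in (75, 80, 85)]
gate.submit(Command(stamp_ms=80, action=0.3))
trace.append(gate.consume(90))   # fresh, but degraded mode is already latched
print(trace)
print("degraded:", gate.degraded)
```

A `None` result is the signal for the high-rate stabilizer to apply its own safe default. Latching the degraded flag is a design choice made for this sketch: it forces an explicit recovery decision instead of letting the stack flicker between modes.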

Library Shortcut

The hand-built record takes only a couple dozen lines. In a production run, DVC, MLflow, Weights & Biases Artifacts, or a ROS 2 bag plus metadata file reduces the tracking code to a few calls while handling versioning, file storage, run ids, and reproducible retrieval. The hand-built version remains useful because it shows which fields the tool must preserve.

Practical Recipe

Write the observation, action, monitor, metric, and artifact fields before selecting a model.
Run a deterministic smoke test and one named perturbation from the panel.
Log success, safety events, latency, energy or resource use, and recovery status in the same row group.
Compare only methods evaluated by the same script on the same panel and seed plan.
Attach a short postmortem to each failed rollout so the artifact remains useful after the plot is forgotten.

Common Failure Mode

Teams often optimize mean inference time while ignoring queue buildup. The robot then fails when a burst of sensor callbacks or one GPU stall pushes command age past the control horizon.
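A small queue simulation makes this failure concrete. The sketch below assumes a single inference worker that processes frames in arrival order, a hypothetical 33 ms camera period, and one invented 120 ms stall; none of these numbers come from a real system.

```python
arrival_period_ms = 33      # hypothetical camera period (about 30 Hz)
service_ms = [25] * 12      # nominal inference time per frame
service_ms[3] = 120         # one invented GPU stall

worker_free_at = 0
ages_ms = []
for i, service in enumerate(service_ms):
    arrival = i * arrival_period_ms
    # A frame waits in the queue while the previous frame is still running.
    start = max(arrival, worker_free_at)
    worker_free_at = start + service
    # Command age here is completion time minus observation arrival time.
    ages_ms.append(worker_free_at - arrival)

mean_service_ms = sum(service_ms) / len(service_ms)
print(f"mean inference time: {mean_service_ms:.1f} ms")
print("command ages (ms):", ages_ms)
print("commands older than 40 ms:", sum(age > 40 for age in ages_ms))
```

The mean inference time stays just under the arrival period, yet the single stall leaves a backlog that drains by only 8 ms per frame, so nine of the twelve commands arrive older than 40 ms. A keep-latest queue of depth one is a common way to bound this growth, at the cost of dropping frames.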

Practical Example

An embodied AI team applying Real-time inference and control rates should review a single run folder containing configuration, model version, rollout traces, monitor transitions, video or sensor replay, and the metric table. The review asks whether the evidence supports the deployment decision, not whether one isolated number looks good.

Research Frontier

Deployment work increasingly studies asynchronous policies, action chunking, and speculative planning to fit large models inside real-time control loops.

Self Check

Can you name the metric contract, perturbation panel, monitor state, and artifact id for Real-time inference and control rates? If any field is missing, the claim is not yet audit-ready.

Real-time inference and control rates becomes operational when the metric is tied to a runtime interface. The interface names the sensor stream, state estimate, action representation, timing budget, safety or robustness monitor, and deployment artifact.

The disciplined habit is to separate three claims. The conceptual claim explains why the method should help. The systems claim explains which interface it changes. The evidence claim records which measurement would convince a skeptical builder.

Practical Tool Choices For This Section

Tool or Library	Role in Real-time inference and control rates
ROS 2 executors	Coordinate callback timing and process boundaries.
TensorRT	Optimizes inference latency on NVIDIA GPUs, including Jetson edge modules.
OpenTelemetry	Traces inference, planning, and controller timing across processes.

Cross-References

For Real-time inference and control rates, connect benchmark design, sim-to-real transfer, uncertainty, and safety barriers through the deployment artifact that will be checked before release.

Lab: Build The Artifact First

Create a JSON or Parquet artifact for five rollouts of Real-time inference and control rates. Include fields for configuration, seed, perturbation, metric values, monitor state, and a short failure label. Then rerun the same panel with one changed policy setting and verify that both methods can be compared row by row.
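One possible starting point for the lab is sketched below. `fake_rollout` is a placeholder that stands in for a real rollout and returns invented numbers; the panel, the 40 ms budget, and the field names are assumptions chosen to match the field list above.

```python
import json

# Five-rollout panel: (seed, perturbation). Names are illustrative.
PANEL = [(0, "none"), (1, "none"), (2, "gpu_contention"), (3, "callback_burst"), (4, "none")]
MAX_AGE_MS = 40   # assumed freshness budget

def fake_rollout(policy_rate_hz, seed, perturbation):
    """Placeholder for a real rollout; the numbers are invented for illustration."""
    age_p95_ms = 1000 / policy_rate_hz + (15 if perturbation != "none" else 0)
    ok = age_p95_ms <= MAX_AGE_MS
    return {
        "config": {"policy_rate_hz": policy_rate_hz},
        "seed": seed,
        "perturbation": perturbation,
        "metrics": {"success": ok, "age_p95_ms": round(age_p95_ms, 1)},
        "monitor_state": "nominal" if ok else "degraded",
        "failure_label": "" if ok else "stale_command",
    }

def run_panel(policy_rate_hz):
    return [fake_rollout(policy_rate_hz, seed, pert) for seed, pert in PANEL]

baseline = run_panel(policy_rate_hz=30)
variant = run_panel(policy_rate_hz=50)   # the one changed policy setting

# Rows are comparable only if seed and perturbation match position by position.
for a, b in zip(baseline, variant):
    assert (a["seed"], a["perturbation"]) == (b["seed"], b["perturbation"])

artifact = json.dumps({"baseline": baseline, "variant": variant}, indent=2)
flipped = [a["seed"] for a, b in zip(baseline, variant)
           if a["metrics"]["success"] != b["metrics"]["success"]]
print("seeds whose outcome changed:", flipped)
```

Because both runs share one panel and one row schema, the comparison reduces to a row-wise diff. In this toy setup only the two perturbed seeds change outcome, which is the kind of statement the artifact should make easy to verify.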

When a rate contract fails, classify the fault as sensor backlog, estimator lag, inference stall, middleware queue growth, scheduler preemption, or controller integration error. Then isolate that mechanism with one targeted perturbation such as synthetic GPU contention or callback bursts.

A Useful Annoyance

For Real-time inference and control rates, schema strictness is cheaper than discovering a missing field during a moving-robot trial; require the log before comparing outcomes.
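Schema strictness can be enforced with a few lines that run before any comparison. The validator below is a minimal sketch; the required fields mirror the lab's field list, and the type choices are assumptions rather than a fixed standard.

```python
# Required artifact fields and their expected types (illustrative choices).
REQUIRED_FIELDS = {
    "config": dict,
    "seed": int,
    "perturbation": str,
    "metrics": dict,
    "monitor_state": str,
    "failure_label": str,
}

def validate_row(row):
    """Return a list of schema problems; an empty list means the row is acceptable."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in row]
    problems += [
        f"wrong type for {name}: expected {expected.__name__}"
        for name, expected in REQUIRED_FIELDS.items()
        if name in row and not isinstance(row[name], expected)
    ]
    return problems

good_row = {
    "config": {"policy_rate_hz": 30}, "seed": 3, "perturbation": "callback_burst",
    "metrics": {"success": False}, "monitor_state": "degraded",
    "failure_label": "stale_command",
}
# Same row with the monitor state dropped and the seed stored as a string.
bad_row = {k: v for k, v in good_row.items() if k != "monitor_state"}
bad_row["seed"] = "3"

print(validate_row(good_row))
print(validate_row(bad_row))
```

Rejecting the second row at logging time costs one failed write; discovering the missing monitor state after a hardware trial costs the trial.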

Key Takeaway

Real-time inference and control rates is valuable when it changes the closed-loop decision and leaves behind evidence that another builder can audit.

Exercise 55.2.1

Design a same-artifact evaluation for this section. Specify the environment, rollout panel, seed plan, metric fields, monitor fields, one perturbation, and one rollback or recovery rule.

Section References

Quigley, M. et al. ROS: an open-source Robot Operating System. ICRA Workshop, 2009.

Use for the robotics middleware lineage behind nodes, topics, services, bags, and deployment boundaries.

OpenTelemetry project documentation. https://opentelemetry.io/docs/

Use for tracing, metrics, and logs when robot deployment evidence must connect software events to runtime behavior.

What's Next

After Real-time inference and control rates, the next section should reuse the artifact schema while changing one deployment interface or failure mode, so comparisons remain auditable.