Section 3.6: Dual-system (System 1 / System 2) designs and where they come from | Building Embodied AI: From Perception to Autonomous Action

A Careful Control Loop

Big Picture

Dual-system (System 1 / System 2) designs are one lens on embodied system architectures: a fast path handles routine cases and a slower path handles cases that need deliberation. We study them because an embodied agent needs decisions that survive contact with noisy sensors, delayed effects, and changing environments.

Figure 3.6. A dual-system design is easiest to reason about as a closed loop of evidence, decision, and consequence: routine cases are routed to the fast path and uncertain cases go through deliberation.

This section turns the technical contract for dual-system designs into a usable mental model. First we define the object of study, then we connect it to the agent loop, then we test it with a compact implementation.

The key question for a dual-system design is practical: what must the agent know, what can it observe, what action is available, and what evidence shows that the action worked under the stated conditions?

Action Is The Test

A representation earns its place when it changes the measurable action interface. For dual-system designs, keep asking which decision becomes easier, safer, or more reliable.

Theory

The practical design rule for dual-system designs is to make the interface inspectable before optimization begins: inputs, outputs, units, latency, bounds, and failure labels should all be visible in the saved artifact.
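One way to make that interface inspectable is to write it down as data before any model exists. The sketch below is a minimal illustration, not a standard schema: `SignalSpec`, `InterfaceContract`, the signal names, and the bounds are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SignalSpec:
    """One named signal with its unit and allowed range (all values hypothetical)."""
    name: str
    unit: str
    low: float
    high: float

    def check(self, value: float) -> bool:
        return self.low <= value <= self.high


@dataclass(frozen=True)
class InterfaceContract:
    inputs: tuple
    outputs: tuple
    latency_budget_ms: float
    failure_labels: tuple

    def violations(self, sample: dict) -> list:
        """Names of input signals that are missing or out of bounds."""
        bad = []
        for spec in self.inputs:
            value = sample.get(spec.name)
            if value is None or not spec.check(value):
                bad.append(spec.name)
        return bad


contract = InterfaceContract(
    inputs=(SignalSpec("uncertainty", "unitless", 0.0, 1.0),
            SignalSpec("risk", "unitless", 0.0, 1.0)),
    outputs=(SignalSpec("gripper_velocity", "m/s", -0.5, 0.5),),
    latency_budget_ms=250.0,
    failure_labels=("perception", "state", "planning", "control", "evaluation"),
)

# The contract itself is part of the saved artifact.
print(json.dumps(asdict(contract), indent=2))
# An out-of-range risk score is flagged before it reaches the router.
print(contract.violations({"uncertainty": 0.4, "risk": 1.7}))
```

Keeping the contract serializable means the same file that records results also records what the inputs and outputs were allowed to be.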

Dual-system designs borrow a useful distinction from cognitive science, but in embodied AI the distinction must become an engineering contract. "System 1" means a fast, learned, habitual, or reflexive path that can act under tight latency. "System 2" means a slower path that spends extra computation on planning, checking, search, tool use, or explanation before action.

The routing rule is the core design choice. A simple version is:

$$\text{route}(o_t)= \begin{cases} \text{System 1}, & u(o_t) < \tau \text{ and } r(o_t) < \rho \\ \text{System 2}, & \text{otherwise} \end{cases}$$

Here $u(o_t)$ is uncertainty, $r(o_t)$ is estimated risk, and $\tau,\rho$ are deployment thresholds. The assumption is that uncertainty and risk are measurable enough to decide when fast action is safe. The rule can fail in two directions. False confidence: the fast path acts on a case that should have been escalated. Over-escalation: the slow path consumes time on a routine case until the physical opportunity disappears.
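The cost of a given threshold pair can be written down directly from the routing rule. As a sketch, let $p_1$ be the fraction of observations sent to System 1, $\ell_1, \ell_2$ the latencies of the two paths, and $f_1, f_2$ their failure rates on the cases each path actually receives. These symbols are introduced here for illustration only.

$$p_1 = \Pr\big[\,u(o_t) < \tau \ \text{and}\ r(o_t) < \rho\,\big]$$

$$\mathbb{E}[\text{latency}] = p_1\,\ell_1 + (1 - p_1)\,\ell_2, \qquad \Pr[\text{failure}] = p_1\,f_1 + (1 - p_1)\,f_2$$

Lowering $\tau$ or $\rho$ shrinks $p_1$ and pushes mean latency toward $\ell_2$. The failure term is less simple than it looks, because $f_1$ and $f_2$ are conditional on which cases reach each path and therefore move with the thresholds as well.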

Mechanism

The mechanism in a dual-system design is the contract between representation and action. Name what enters the module, what leaves it, which assumptions make that transformation valid, and which log would reveal a bad handoff.

Worked Example

The dual-system design lives or dies by its router, so the example makes the routing rule from the equation above executable and then measures it. System 1 is a fast reflex with a known competence region; System 2 is a slow path that is reliable but expensive. The router escalates whenever uncertainty $u$ or risk $r$ crosses a threshold, and we score the policy on latency and safety together.

import numpy as np
rng = np.random.default_rng(0)

TAU, RHO = 0.85, 0.85        # uncertainty / risk escalation thresholds
LAT1, LAT2 = 10, 250         # path latencies in ms
N = 2000                     # number of simulated episodes

def system1(u):              # fast path: reliable only when confident
    return rng.random() > (0.03 + 0.15 * u)  # failure rises with uncertainty

def system2(_u):             # slow path: reliable but expensive
    return rng.random() > 0.02               # flat 2% failure rate

def route(u, r):             # escalate if either score reaches its threshold
    return "S1" if (u < TAU and r < RHO) else "S2"

# Note: in this toy model r affects routing only; neither path's failure
# probability depends on it.
lat, fails = [], 0
for _ in range(N):
    u, r = rng.random(), rng.random()
    if route(u, r) == "S1":
        ok, lat_ms = system1(u), LAT1
    else:
        ok, lat_ms = system2(u), LAT2
    lat.append(lat_ms)
    fails += (not ok)

print(f"mean_latency={np.mean(lat):6.1f} ms   "
      f"failure_rate={fails/N:.3f}")
# Compare against always-S1 and always-S2 baselines by setting
# TAU=RHO=1.0 (always fast) or TAU=RHO=0.0 (always slow).

Code Fragment 3.6.1 implements the System 1 / System 2 router as a thresholded rule and scores it on latency and failure rate jointly. A good router beats the always-System-2 baseline on latency and the always-System-1 baseline on failure rate; it cannot beat either baseline on that baseline's own strong axis.

Expected output: with these parameters about 72% of cases (0.85 × 0.85) take the fast path, so in expectation mean latency is roughly 77 ms and the failure rate roughly 0.07, with sampling noise around both. Set TAU=RHO=1.0 to force everything onto System 1 (10 ms, failure rate near 0.105) or 0.0 to force System 2 (250 ms, failure rate near 0.02). The routed policy is much faster than always-System-2 and moderately safer than always-System-1, because it spends the expensive path only on uncertain or risky cases; it should dominate the relevant axis of each baseline, not both axes of either. The near-threshold cases are where calibration matters, which is why the router, not either subsystem, is the first thing to diagnose.
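The baseline comparison described above can be run in one pass instead of by editing constants. The sketch below is a vectorised variant of the fragment, not the fragment itself: `evaluate`, the sample count, and the shared seed are assumptions chosen so that all three settings see the same simulated cases.

```python
import numpy as np

LAT1, LAT2 = 10, 250  # path latencies in ms, matching the fragment above


def evaluate(tau, rho, n=20_000, seed=0):
    """Return (mean latency in ms, failure rate) for one threshold setting."""
    rng = np.random.default_rng(seed)
    u, r = rng.random(n), rng.random(n)
    fast = (u < tau) & (r < rho)                      # cases routed to System 1
    p_fail = np.where(fast, 0.03 + 0.15 * u, 0.02)    # per-case failure probability
    failed = rng.random(n) < p_fail
    latency = np.where(fast, LAT1, LAT2)
    return float(latency.mean()), float(failed.mean())


results = {
    "always_s1": evaluate(1.0, 1.0),
    "routed": evaluate(0.85, 0.85),
    "always_s2": evaluate(0.0, 0.0),
}
for name, (lat_ms, fail) in results.items():
    print(f"{name:10s} mean_latency={lat_ms:6.1f} ms  failure_rate={fail:.3f}")
```

Reusing one seed across settings means differences between rows come from the routing decision alone, not from a different draw of cases.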

Library Shortcut

The hand-built fragment is a visibility tool. Production work should move to maintained stacks such as Hugging Face Transformers, open VLMs, OpenVLA, openpi, LeRobot, and tool-calling planners once the interface, logging contract, and failure recovery path are explicit.

Practical Recipe

Write the observation, action, and success metric before choosing a model.
Build a baseline that is simple enough to debug by inspection.
Add the library implementation only after the baseline behavior is understood.
Record failures as structured cases: perception error, state error, planning error, control error, or evaluation error.
Run at least one perturbation test before trusting the result.
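The last step of the recipe can be sketched for the toy router. The perturbation here is an overconfident uncertainty estimate, which targets the assumption that uncertainty is measurable. Everything in the block is illustrative: `run_panel` and the `overconfidence` parameter are hypothetical, risk is left out to keep the effect isolated, and the failure model is the one from the worked example.

```python
import numpy as np

TAU = 0.85  # same uncertainty threshold as the worked example


def run_panel(overconfidence, n=20_000, seed=0):
    """Route on a reported uncertainty shrunk by `overconfidence`.

    overconfidence=0.0 reports the true uncertainty; 0.3 reports 70% of it.
    Returns (failure rate, share of truly uncertain cases sent to System 1).
    """
    rng = np.random.default_rng(seed)
    u_true = rng.random(n)
    u_reported = (1.0 - overconfidence) * u_true
    fast = u_reported < TAU                              # router sees only the report
    p_fail = np.where(fast, 0.03 + 0.15 * u_true, 0.02)  # outcome depends on the truth
    failed = rng.random(n) < p_fail
    should_escalate = u_true >= TAU
    missed = fast & should_escalate
    return float(failed.mean()), float(missed.sum() / max(should_escalate.sum(), 1))


baseline = run_panel(0.0)
perturbed = run_panel(0.3)
print(f"baseline : failure_rate={baseline[0]:.3f}  missed_escalations={baseline[1]:.2f}")
print(f"perturbed: failure_rate={perturbed[0]:.3f}  missed_escalations={perturbed[1]:.2f}")
```

The missed-escalation share is the more direct signal: it counts false confidence itself, while the failure rate only shows its downstream cost.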

Common Failure Mode

The common mistake with dual-system designs is to celebrate the component score before checking the closed-loop handoff. The failure usually appears at the boundary: stale state, wrong frame, delayed action, saturated actuator, or a metric that ignores the real task cost.

Practical Example

A robotics team using a dual-system design should log not only final success, but also intermediate observations, chosen actions, controller status, and recovery events. The logs reveal whether the method is solving the task or merely passing the easiest episodes.

Fun Note

System 1 grabs the cup. System 2 asks whether it was supposed to be the blue cup after all.

Research Frontier

Treat frontier claims about dual-system designs as hypotheses until they expose enough detail to reproduce the result: data boundary, embodiment, controller interface, evaluation panel, and failure cases.

Self Check

Can you name the observation, state estimate, action, success metric, and most likely failure mode for a dual-system design? If not, the system boundary is still too vague.

A dual-system design becomes useful when it is tied to a closed-loop contract for how perception, estimation, planning, learning, and control are arranged into a system. The contract names the observation stream, the action representation, the timing budget, the safety boundary, and the result artifact. That is the bridge between a readable concept and a system a skeptical builder can test.

For dual-system designs, separate the conceptual claim, the systems claim, and the evidence claim. A good explanation, a clean API, and one successful rollout are different kinds of evidence and should be reported separately.

Tool or Library	Role in This Topic	Builder Advice
ROS 2	separates system modules while preserving message contracts and timing	Use it when the hand-built contract is clear and the experiment needs repeatable runs.
MuJoCo	gives architecture choices a repeatable simulated world for stress tests	Use it when the hand-built contract is clear and the experiment needs repeatable runs.
LeRobot	anchors modern policy architectures in reusable datasets and policy APIs	Use it when the hand-built contract is clear and the experiment needs repeatable runs.

A robust implementation starts with one inspectable baseline whose artifact records observations, actions, units, timestamps, seeds, termination reasons, and the perturbation applied. The maintained-tool version is useful only if it preserves that schema and lets the comparison remain construct-matched.

Write a one-paragraph task contract with observation, action, success, failure, and safety fields.
Start with the smallest simulator, dataset, or wrapper that exposes the task contract faithfully.
Run one deterministic smoke test and one perturbation test before scaling.
Save one artifact containing configuration, seed, metrics, traces, and failure labels.
Compare methods only when the same script evaluates the same panel, split, seed set, and metric.
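The artifact and comparison steps above can be sketched with the standard library alone. The field names, the `comparable` check, and the three hand-written episodes are illustrative assumptions, not a fixed schema or real results.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("config", "seed", "metrics", "traces", "failure_labels")


def save_artifact(path, **fields):
    """Write one JSON artifact; refuse to save if a required field is missing."""
    missing = [k for k in REQUIRED_KEYS if k not in fields]
    if missing:
        raise ValueError(f"artifact is missing fields: {missing}")
    Path(path).write_text(json.dumps(fields, indent=2, sort_keys=True))


def comparable(a, b):
    """True when two artifacts share panel, split, seed, and metric names."""
    same_setup = all(a["config"][k] == b["config"][k] for k in ("panel", "split"))
    return same_setup and a["seed"] == b["seed"] and set(a["metrics"]) == set(b["metrics"])


traces = [  # three hand-written episodes, for illustration only
    {"u": 0.20, "r": 0.10, "path": "S1", "latency_ms": 10, "ok": True, "failure_label": None},
    {"u": 0.90, "r": 0.30, "path": "S2", "latency_ms": 250, "ok": True, "failure_label": None},
    {"u": 0.70, "r": 0.40, "path": "S1", "latency_ms": 10, "ok": False, "failure_label": "control"},
]
artifact = {
    "config": {"panel": "toy_router", "split": "eval", "tau": 0.85, "rho": 0.85},
    "seed": 0,
    "metrics": {
        "mean_latency_ms": sum(t["latency_ms"] for t in traces) / len(traces),
        "failure_rate": sum(not t["ok"] for t in traces) / len(traces),
    },
    "traces": traces,
    "failure_labels": sorted({t["failure_label"] for t in traces if t["failure_label"]}),
}

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "run_000.json"
    save_artifact(out, **artifact)
    reloaded = json.loads(out.read_text())

print(reloaded["metrics"], reloaded["failure_labels"])
```

Deriving the metrics from the saved traces, not storing them separately by hand, keeps the summary numbers and the per-episode evidence from drifting apart.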

When a dual-system design fails, avoid labeling the whole method as weak. First assign the failure to perception, state estimation, planning, control, timing, data coverage, or evaluation. Then rerun one controlled perturbation that isolates the suspected cause. This pattern turns a disappointing rollout into a reusable diagnostic asset.

For dual-system designs, diagnose the router before blaming either subsystem. Log the uncertainty score, risk score, selected path, deliberation time, and action taken for every episode. Then examine near-threshold cases: these are the examples that reveal whether the handoff policy is calibrated. A useful router is safer than the always-fast baseline and faster than the always-slow baseline, not only one of the two.
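The router log described above can be sketched as one record per episode plus a filter for near-threshold cases. The band width `MARGIN`, the record fields, and the placeholder action names are assumptions made for illustration; the scores are random draws, not measurements.

```python
import numpy as np

TAU, RHO = 0.85, 0.85  # thresholds from the worked example
MARGIN = 0.05          # hypothetical width of the "near-threshold" band

rng = np.random.default_rng(0)
log = []
for episode in range(2000):
    u, r = rng.random(), rng.random()
    # Signed margin to the decision boundary: negative means both scores
    # are below their thresholds, so the fast path is selected.
    margin = max(u - TAU, r - RHO)
    path = "S1" if margin < 0 else "S2"
    log.append({
        "episode": episode,
        "u": u,
        "r": r,
        "margin": margin,
        "path": path,
        "deliberation_ms": 10 if path == "S1" else 250,
        "action": "reflex" if path == "S1" else "planned",  # placeholder labels
    })

near = [rec for rec in log if abs(rec["margin"]) < MARGIN]
near_fast = sum(rec["path"] == "S1" for rec in near)
print(f"{len(near)} of {len(log)} episodes are within {MARGIN} of the boundary; "
      f"{near_fast} of those took the fast path")
```

Logging the signed margin alongside the raw scores makes the near-threshold set a one-line filter, and those are the episodes worth replaying when the thresholds are retuned.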

Key Takeaway

A dual-system design is useful when it makes the perception-action loop more reliable, not when it merely adds a more impressive model name.

Exercise 3.6.1

Design a method-matched experiment for a dual-system design. Specify the environment, observation schema, action interface, metric, and one perturbation that targets the section's core assumption.

What's Next?

Section 3.7 places LLMs, VLMs, and VLAs inside this architectural stack.

Bibliography & Further Reading

Foundational References For This Section

Quigley, M. et al. "ROS: an open-source Robot Operating System." (2009). https://www.ros.org/

The systems reference for modular robot software and message-passing architecture.

Todorov, E., Erez, T., and Tassa, Y. "MuJoCo: A physics engine for model-based control." (2012). https://mujoco.org/

A widely used simulator for architecture and control experiments.

Brohan, A. et al. "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control." (2023). https://arxiv.org/abs/2307.15818

A central reference for locating VLM and VLA models in embodied control stacks.