Section 2.8: Why embodiment is usually partially observable

A Careful Control Loop
Big Picture

This section explains why partial observability is the default physical condition rather than a corner case. Bodies occlude sensors, contacts reveal information only after action, actuators lag behind commands, and other agents have intentions the robot cannot read directly.

[Figure: concept map for Section 2.8, a local diagram showing how embodiment hides variables behind bodies, objects, time, and other agents. Three stages: Evidence (what the agent receives), Decision (what the system changes), Consequence (what the next step inherits). Closed-loop feedback makes the next input depend on the last action.]
Figure 2.8. Partial observability under embodiment is easiest to reason about as a closed loop of evidence, decision, and consequence: embodiment hides variables behind bodies, objects, time, and other agents.

This section develops a physical explanation for why the agent-environment interface is rarely fully observable. In a simulator, full state may exist in memory. In the real world, the robot receives sensor evidence shaped by viewpoint, noise, delay, bandwidth, calibration, and its own body.

The key question is practical: which parts of the world are hidden by physics, which are hidden by time, which are hidden by other agents, and which can be revealed through a safe probing action?

Embodiment Hides And Reveals

The body is both a sensor and an obstacle. A gripper can reveal friction through force, but it can also hide the object from the camera at the exact moment contact matters.

Theory

Full observability would mean the current observation contains every variable needed for prediction and reward. Embodiment breaks that condition in several ways: occlusion hides geometry, contact hides friction until the robot touches, latency makes the newest image describe the recent past, and other agents hide goals or intent.
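
One standard way to state this formally, following the POMDP framing in Kaelbling, Littman, and Cassandra (1998), is to replace the unknown state with a belief over states. The symbols below (b for the belief, T for the transition model, O for the observation model, and η for a normalizing constant) are notation assumed here for illustration; they are not used elsewhere in this section.

```latex
b'(s') = \eta \, O(o \mid s', a) \sum_{s} T(s' \mid s, a) \, b(s),
\qquad
\eta = \left[ \sum_{s'} O(o \mid s', a) \sum_{s} T(s' \mid s, a) \, b(s) \right]^{-1}
```

Full observability is the special case in which the observation pins the belief to a single state. Occlusion, untouched contacts, latency, and unread intent all leave the belief spread over several states.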

This does not make action impossible. It means the interface must support memory, uncertainty, and information-gathering actions. Sometimes the correct action is not "move toward the goal" but "move the camera," "touch gently," "wait one step," or "ask for clarification."
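
The idea of choosing an information-gathering action can be sketched with a two-hypothesis belief. This is a minimal illustration, not a prescribed method: the hypothesis names, the 0.9 commit threshold, and the sensor likelihoods are all assumptions made up for the example, and the hidden variable is treated as static so no transition model is needed.

```python
# Minimal sketch: a two-hypothesis belief with a probe-or-act rule.
# All names and numbers are illustrative assumptions.

def bayes_update(belief, likelihood):
    """Return the posterior over hypotheses given per-hypothesis likelihoods."""
    unnormalized = {h: belief[h] * likelihood[h] for h in belief}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}


def choose_action(belief, commit_threshold=0.9):
    """Commit to the task action only when one hypothesis is confident enough."""
    best_hypothesis, best_prob = max(belief.items(), key=lambda kv: kv[1])
    if best_prob >= commit_threshold:
        return f"act_assuming_{best_hypothesis}"
    return "move_camera"  # information-gathering action


# Hidden variable: is the slot behind the gripper free or blocked?
belief = {"free": 0.5, "blocked": 0.5}
first_action = choose_action(belief)

# Assumed sensor model: after moving the camera, a "looks clear" reading is
# 19 times more likely when the slot is free than when it is blocked.
looks_clear = {"free": 0.95, "blocked": 0.05}
belief = bayes_update(belief, looks_clear)
second_action = choose_action(belief)

print(first_action, second_action)
```

With an uninformed belief the rule selects the probe; after one informative reading it commits to the task action. A real system would tune the threshold against the cost of a wrong commitment.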

Mechanism

The mechanism is active perception. The agent chooses actions that both change the world and change what can be known about the world. This is why observations, action history, timestamps, and recovery events should be stored together.
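
Storing these items together could look like the record below. It is a sketch under assumed field names (StepRecord, obs_timestamp_ms, recovery_event, and the rest are hypothetical), intended only to show that observation age falls out directly once both timestamps live in the same row.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StepRecord:
    """One control step: what was sensed, what was commanded, and when."""
    step: int
    obs_timestamp_ms: int      # when the sensor captured the observation
    action_timestamp_ms: int   # when the command was issued
    observation: dict
    action: str
    recovery_event: Optional[str] = None  # None when no recovery was triggered

    @property
    def observation_age_ms(self) -> int:
        """How stale the observation was at the moment the action was issued."""
        return self.action_timestamp_ms - self.obs_timestamp_ms


# Hypothetical two-step trace: the object is lost, so the agent probes.
log = [
    StepRecord(0, 1000, 1040, {"object_visible": True}, "approach"),
    StepRecord(1, 1100, 1180, {"object_visible": False}, "move_camera",
               recovery_event="object_lost"),
]

for record in log:
    row = {**asdict(record), "observation_age_ms": record.observation_age_ms}
    print(json.dumps(row))
```

Keeping the action next to the observation that prompted it makes it possible to ask later whether a probe was chosen because the evidence was missing or because it was stale.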

Worked Example

Code Fragment 2.8.1 turns partial observability into a debugging checklist. Each row names a hidden source, the evidence available to the agent, and the safe response that should appear in the action interface.

# Section 2.8: map hidden physical variables to probes and recovery actions.
# Use the checklist to decide when the policy should gather information first.
hidden_sources = [
    {"source": "occlusion", "evidence": "last_seen_pose", "probe": "move_camera"},
    {"source": "contact_friction", "evidence": "force_trend", "probe": "guarded_touch"},
    {"source": "latency", "evidence": "timestamp_age_ms", "probe": "predict_forward"},
    {"source": "human_intent", "evidence": "motion_cue", "probe": "wait_and_observe"},
]

for item in hidden_sources:
    action_mode = "probe_first" if item["source"] in {"occlusion", "human_intent"} else "act_with_monitor"
    print(f"{item['source']}: {action_mode} using {item['evidence']}")
occlusion: probe_first using last_seen_pose
contact_friction: act_with_monitor using force_trend
latency: act_with_monitor using timestamp_age_ms
human_intent: probe_first using motion_cue
Code Fragment 2.8.1 maps four physical sources of hidden state to evidence fields and action modes. The probe_first rows show where uncertainty should change behavior before the robot commits to the task action.

Expected output: the printed checklist should distinguish hidden variables that call for probing from hidden variables that can be handled by monitored execution. If every row uses the same action mode, the interface is not yet using uncertainty.

Library Shortcut

The from-scratch checklist is for understanding. In a practical system, ROS 2 diagnostics, sensor-fusion filters, world-model rollouts, and robot data tools can publish the same uncertainty fields at runtime. The shortcut removes logging boilerplate, but the system designer still must decide which hidden variables require probing, slowing, or stopping.

Practical Recipe

  1. List hidden variables by source: occlusion, contact, latency, calibration, internal hardware state, and other agents.
  2. For each hidden variable, name the available evidence and the confidence field to log.
  3. Add at least one information-gathering action for high-risk uncertainty.
  4. Test policies under sensor delay, occlusion, friction change, and human-motion ambiguity.
  5. Report recovery behavior separately from first-attempt success.
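
The first three steps of the recipe can be checked mechanically. The sketch below assumes a hypothetical inventory format; the evidence names, confidence fields, and risk labels are invented for illustration and would come from the actual robot in practice.

```python
# Hypothetical hidden-variable inventory; every field value is an assumption.
inventory = [
    {"source": "occlusion", "evidence": "last_seen_pose",
     "confidence_field": "pose_confidence", "risk": "high", "probe": "move_camera"},
    {"source": "contact", "evidence": "force_trend",
     "confidence_field": "friction_estimate_std", "risk": "high", "probe": "guarded_touch"},
    {"source": "latency", "evidence": "timestamp_age_ms",
     "confidence_field": "age_over_budget", "risk": "low", "probe": None},
    {"source": "calibration", "evidence": "reprojection_error",
     "confidence_field": "calibration_residual", "risk": "low", "probe": None},
    {"source": "internal_hardware_state", "evidence": "motor_temperature",
     "confidence_field": "thermal_margin", "risk": "high", "probe": None},
    {"source": "other_agents", "evidence": "motion_cue",
     "confidence_field": "intent_entropy", "risk": "high", "probe": "wait_and_observe"},
]


def missing_probes(items):
    """List high-risk hidden variables that have no information-gathering action."""
    return [item["source"] for item in items
            if item["risk"] == "high" and item["probe"] is None]


gaps = missing_probes(inventory)
print("high-risk sources without a probe:", gaps)
```

In this made-up inventory the check flags internal hardware state, which is the kind of gap step 3 is meant to close before any policy testing begins.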

Common Failure Mode

The common mistake is to treat a clean perception snapshot as if it were the world. A robot can localize an object correctly and still fail because friction, timing, cable drag, or human motion was hidden from the current observation.

Practical Example

A delivery robot in a lobby may see a clear path, but not whether a person behind a pillar is about to step out. A deployment-ready interface logs last-seen positions, timestamp age, predicted motion, and the decision to slow or reobserve.
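
A rough sketch of that logging-and-decision step follows. The constant-velocity prediction, the 500 ms staleness limit, and the 2 m slow radius are assumptions chosen for illustration, and the function names are hypothetical.

```python
def predicted_distance_m(last_seen_distance_m, closing_speed_mps, age_s):
    """Constant-velocity guess of how far away the person is now."""
    return max(0.0, last_seen_distance_m - closing_speed_mps * age_s)


def lobby_decision(last_seen_distance_m, closing_speed_mps, age_ms,
                   max_age_ms=500, slow_radius_m=2.0):
    """Return a log entry with the chosen mode: reobserve, slow, or proceed."""
    entry = {
        "last_seen_distance_m": last_seen_distance_m,
        "timestamp_age_ms": age_ms,
        "predicted_distance_m": None,
    }
    if age_ms > max_age_ms:
        # The evidence is too old to extrapolate safely; look again first.
        entry["decision"] = "reobserve"
        return entry
    distance_now = predicted_distance_m(
        last_seen_distance_m, closing_speed_mps, age_ms / 1000.0)
    entry["predicted_distance_m"] = distance_now
    entry["decision"] = "slow" if distance_now < slow_radius_m else "proceed"
    return entry


fresh_far = lobby_decision(3.0, 1.0, age_ms=200)
fresh_near = lobby_decision(2.5, 1.5, age_ms=400)
stale = lobby_decision(3.0, 1.0, age_ms=900)

for entry in (fresh_far, fresh_near, stale):
    print(entry)
```

The staleness check comes first on purpose: a prediction built on an old sighting looks precise but carries little evidence, so the sketch prefers reobserving over extrapolating.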

Fun Note

The world does not provide a debug console. It provides shadows, delays, and one suspicious noise behind the robot.

Research Frontier

The research frontier is moving from passive perception toward active perception and uncertainty-aware action. The hard problem is not only recognizing more objects, but deciding when the robot should change viewpoint, touch carefully, wait, or ask because the current observation is not enough.

Self Check

Can you name one hidden variable caused by occlusion, one caused by contact, one caused by time delay, and one caused by another agent?

Partial observability becomes useful when it is tied to a closed-loop contract between policy, world, evaluator, and safety constraints. The contract names the hidden-variable inventory, observation stream, belief or confidence field, probing action, timing budget, safety boundary, and result artifact. That is the bridge between a readable concept and a system a skeptical builder can test.
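
One way to make such a contract testable is to write it down as a typed record and refuse to run when a field is empty. The class and field names below mirror the list in the paragraph above but are otherwise hypothetical, as are the example values.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ObservabilityContract:
    """The seven items the closed-loop contract is expected to name."""
    hidden_variables: tuple
    observation_stream: str
    confidence_field: str
    probing_action: str
    timing_budget_ms: int
    safety_boundary: str
    result_artifact: str

    def missing_fields(self):
        """Names of fields left empty. A zero timing budget also counts as missing."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative contract with the artifact deliberately left unspecified.
contract = ObservabilityContract(
    hidden_variables=("occlusion", "contact_friction"),
    observation_stream="rgb_plus_wrist_force",
    confidence_field="pose_confidence",
    probing_action="guarded_touch",
    timing_budget_ms=100,
    safety_boundary="max_contact_force",
    result_artifact="",
)

print("unspecified contract fields:", contract.missing_fields())
```

The check is deliberately shallow. It cannot tell whether the named probing action is safe, only whether someone has committed to naming one.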

When arguing that embodiment is usually partially observable, separate the conceptual claim, the systems claim, and the evidence claim. A good explanation, a clean API, and one successful rollout are different kinds of evidence, and the section should keep them distinct.

Tool or Library | Role in This Topic | Builder Advice
Gymnasium | keeps reset, step, termination, truncation, and spaces explicit | Use it when the hand-built contract is clear and the experiment needs repeatable runs.
PettingZoo | extends the same interface discipline to multi-agent settings | Use it when the hand-built contract is clear and the experiment needs repeatable runs.
ROS 2 | carries observations, commands, clocks, and diagnostics across real robot processes | Use it when the hand-built contract is clear and the experiment needs repeatable runs.

A robust implementation for a partially observable embodied task starts with one inspectable baseline whose artifact records observations, actions, units, timestamps, seeds, termination reasons, and the perturbation applied. The maintained-tool version is useful only if it preserves that schema and keeps the comparison measuring the same construct.

  1. Write a one-paragraph task contract with observation, action, success, failure, and safety fields.
  2. Start with the smallest simulator, dataset, or wrapper that exposes the task contract faithfully.
  3. Run one deterministic smoke test and one perturbation test before scaling.
  4. Save one artifact containing configuration, seed, metrics, traces, and failure labels.
  5. Compare methods only when the same script evaluates the same panel, split, seed set, and metric.
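
Steps 3 and 4 can be illustrated with a toy task that needs only the standard library. Everything in this sketch is an assumption made for the example: the 1-D "stop at the goal" task, the function run_episode, the observation-delay perturbation, and the artifact layout.

```python
import json
import random


def run_episode(seed, obs_delay_steps=0, goal=10, max_steps=50):
    """Toy 1-D task: advance one cell per step and stop when the goal is observed."""
    rng = random.Random(seed)
    position = rng.randint(0, 3)   # seeded start, so reruns are repeatable
    history = [position]           # true positions, used to serve stale observations
    termination = "truncated"
    for _ in range(max_steps):
        # With a delay, the agent sees where it was obs_delay_steps steps ago.
        observed = history[max(0, len(history) - 1 - obs_delay_steps)]
        if observed >= goal:
            termination = "stopped"
            break
        position += 1
        history.append(position)
    return {
        "seed": seed,
        "obs_delay_steps": obs_delay_steps,
        "final_position": position,
        "overshoot": position - goal,
        "termination": termination,
    }


# Deterministic smoke test: the same seed must reproduce the same result.
smoke = run_episode(seed=0)
assert smoke == run_episode(seed=0)

# Perturbation test: delay the observation stream by two steps.
perturbed = run_episode(seed=0, obs_delay_steps=2)

# One artifact holding configuration, seeds, and per-run results.
artifact = json.dumps(
    {"config": {"goal": 10, "max_steps": 50}, "seeds": [0],
     "results": [smoke, perturbed]},
    indent=2,
)
print(artifact)
```

In this toy the delayed run overshoots the goal by exactly the number of delayed steps, because the stop decision is made on a position that is already out of date. That makes latency visible as a measurable failure label rather than a vague loss of performance.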

When a partially observable embodied system fails, avoid labeling the whole method as weak. First assign the failure to occlusion, contact sensing, latency, calibration, internal hardware state, other-agent prediction, or evaluation. Then rerun one controlled perturbation that isolates the suspected cause. This pattern turns a disappointing rollout into a reusable diagnostic asset.

Key Takeaway

Embodiment is usually partially observable because bodies, time, contact, and other agents hide action-relevant variables. Robust agents treat uncertainty as part of the interface, not as a footnote after perception.

Exercise 2.8.1

Design a method-matched experiment that tests this section's claim that embodiment is usually partially observable. Specify the environment, observation schema, action interface, metric, and one perturbation that targets the section's core assumption.

What's Next?

Chapter 3 uses the interface to organize complete embodied system architectures.

Bibliography & Further Reading

Foundational References For This Section

Bellman, R. "A Markovian Decision Process." (1957). https://doi.org/10.1515/9781400835386-007

The mathematical origin of the state, action, transition, and reward framing.

Kaelbling, L. P., Littman, M. L., and Cassandra, A. R. "Planning and acting in partially observable stochastic domains." (1998). https://www.sciencedirect.com/science/article/pii/S000437029800023X

A foundational POMDP reference for belief-state reasoning under partial observability.

Farama Foundation. "Gymnasium Documentation." (2024). https://gymnasium.farama.org/

The maintained reference for reset, step, spaces, termination, truncation, wrappers, and reproducible environments.