Section 2.2: State, observation, hidden variables, partial observability | Building Embodied AI: From Perception to Autonomous Action

"My camera saw the block. My gripper discovered the friction. My log finally admitted both were incomplete."
A Probabilistic Robot Assistant

Figure 2.2A: Partial observability as a spotlight. The true world state is a full room, but the agent sees only the lit region. Hidden variables (object positions behind walls) connect the unseen state to future observations.

Big Picture

This section separates what exists in the world (state) from what the agent senses (observation). Embodied agents rarely receive complete state. They receive noisy, delayed, partial observations and must infer the variables that matter for action.

Figure 2.2. State versus observation is easiest to reason about as a closed loop of evidence, decision, and consequence: state lives in the world, while observation is the agent's limited evidence about it.

This section develops the vocabulary needed to debug almost every embodied AI system. A simulator may know the true pose, mass, friction, and contact state of every object. A robot policy may see only pixels, joint encoders, force readings, and a stale command queue. That difference is not a nuisance. It is the problem.

State is the information sufficient to predict future dynamics and reward when paired with an action. Observation is the sensor-facing evidence available to the agent. Hidden variables are state variables that matter but are not directly observed. Partial observability is the normal condition in which the latest observation alone is not enough to determine the state.

Observation Is Not State

A camera frame can support useful action, but it is not the full physical situation. A deployable embodied system must decide which hidden variables to estimate, which to perturb in simulation, and which to reserve for evaluation diagnostics.

Theory

Let $s_t$ be the world state and $o_t$ be the observation emitted by sensors or an environment wrapper. In a fully observable task, $o_t$ contains enough information to stand in for $s_t$. In embodied AI, that is usually false: occlusion hides objects, contact reveals only local forces, latency makes images stale, and human intent is not directly measurable.

The agent therefore maintains an estimate $\hat{s}_t$ or a belief over possible states. This estimate may be a Kalman filter state, a particle filter, a learned recurrent hidden state, a transformer memory, or a structured world-model latent. The name matters less than the contract: what uncertainty does it represent, how is it updated, and how does action use it?
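For a discrete state space, one common way to write the update part of that contract is the recursive Bayes filter used in the POMDP literature. The symbols here are notation assumed for illustration, since the section does not fix them: $b_t$ is the belief over states, $T$ is a transition model, $O$ is an observation model, $a_{t-1}$ is the previous action, and $\eta$ is a normalizing constant that makes the belief sum to one.

```latex
b_t(s') \;=\; \eta \, O(o_t \mid s') \sum_{s} T(s' \mid s, a_{t-1}) \, b_{t-1}(s)
```

Read right to left, the sum predicts where the state may have moved under the last action, and the factor $O(o_t \mid s')$ reweights that prediction by how well each candidate state explains the new observation.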

Mechanism

Partial observability turns the loop into observe, update belief, choose action, execute, and revise. In simulation, privileged state can be logged for diagnostics while keeping the agent restricted to observations. This separation is essential for honest evaluation.
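The loop and the privileged-logging separation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the hidden variable is a static friction level, the slip probabilities in `P_SLIP` are invented for the example, the execute step is a stub, and all function names are hypothetical.

```python
import random

# Hypothetical observation model: chance of a slip reading given the true friction.
P_SLIP = {"low": 0.7, "high": 0.1}

def update_belief(p_low: float, slip: bool) -> float:
    """Bayes update of P(friction == 'low') from one slip / no-slip reading."""
    like_low = P_SLIP["low"] if slip else 1.0 - P_SLIP["low"]
    like_high = P_SLIP["high"] if slip else 1.0 - P_SLIP["high"]
    numerator = like_low * p_low
    return numerator / (numerator + like_high * (1.0 - p_low))

def choose_action(p_low: float) -> str:
    """Grip firmly once low friction is more likely than not."""
    return "firm_grip" if p_low > 0.5 else "light_grip"

def run_episode(true_friction: str, steps: int = 5, seed: int = 0):
    rng = random.Random(seed)
    p_low = 0.5  # the agent's prior; the agent never reads true_friction
    agent_log, privileged_log = [], []
    for t in range(steps):
        slip = rng.random() < P_SLIP[true_friction]  # observe (simulated sensor)
        p_low = update_belief(p_low, slip)           # update belief
        action = choose_action(p_low)                # choose action
        # execute: omitted in this sketch; the action does not change the world here
        agent_log.append({"t": t, "slip": slip, "p_low": round(p_low, 3), "action": action})
        privileged_log.append({"t": t, "true_friction": true_friction})  # evaluator only
    return agent_log, privileged_log

agent_log, privileged_log = run_episode("low")
print(agent_log[-1])
```

The two logs are kept apart on purpose: the evaluator can join them by time step to score the belief, while nothing in `agent_log` or in the agent's code path depends on the true friction.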

Worked Example

Code Fragment 2.2.1 updates a tiny belief state from a partial observation. The hidden variable is slip risk, which the camera cannot directly see.

# Section 2.2: runnable checkpoint for State vs. observation.
# Keep the output small so the evidence record can be inspected directly.
belief = {"block_x": 0.50, "block_visible": True, "slip_risk": 0.20}
observation = {"detected_x": 0.54, "visible": True, "force_spike": False}

if observation["visible"]:
    belief["block_x"] = 0.7 * belief["block_x"] + 0.3 * observation["detected_x"]
else:
    belief["block_visible"] = False

if observation["force_spike"]:
    belief["slip_risk"] = min(1.0, belief["slip_risk"] + 0.25)  # cap so repeated spikes cannot exceed 1.0
else:
    belief["slip_risk"] *= 0.95

print({"estimated_x": round(belief["block_x"], 3), "slip_risk": round(belief["slip_risk"], 3)})

Code Fragment 2.2.1 updates an estimated block position and hidden slip risk from vision and force observations.

Expected output: {'estimated_x': 0.512, 'slip_risk': 0.19}. Vision updates the visible pose, while contact evidence updates a hidden physical variable. Because this observation contains no force spike, the slip risk decays slightly instead of rising.

Library Shortcut

The short belief update above becomes a few estimator or logging components in a real stack. MuJoCo and Isaac Lab can expose privileged simulator state for diagnostics, ROS 2 can publish state estimates as topics, and LeRobot can store synchronized observations and actions for later analysis. The hand-built version is still useful because it states exactly which hidden variable is being tracked.

Practical Recipe

List variables needed to predict dynamics, not just variables found in the sensor packet.
Mark each variable as observed, estimated, delayed, hidden, or evaluator-only.
Add uncertainty to every estimated variable and log that uncertainty.
Carry history when the current observation is insufficient.
Evaluate under occlusion, sensor delay, calibration drift, and contact changes.
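The recipe step about attaching uncertainty to every estimate can be made concrete with a scalar Kalman update. Code Fragment 2.2.1 blends the old and new position with a fixed 0.3 weight. In the sketch below, the weight (the gain) is instead computed from the variances of the estimate and the measurement. The variance values and function names are assumptions chosen for illustration.

```python
def kalman_predict(mean: float, var: float, process_var: float) -> tuple[float, float]:
    """Time step with no motion model: the mean stays, the uncertainty grows."""
    return mean, var + process_var

def kalman_update(mean: float, var: float, z: float, z_var: float) -> tuple[float, float]:
    """Fuse a scalar estimate (mean, var) with a measurement z of variance z_var."""
    gain = var / (var + z_var)           # trust the measurement more when var is large
    new_mean = mean + gain * (z - mean)
    new_var = (1.0 - gain) * var         # fusing evidence always shrinks the variance
    return new_mean, new_var

# Illustrative numbers: block at 0.50 m with variance 0.01, camera reads 0.54 m.
mean, var = kalman_update(0.50, 0.01, z=0.54, z_var=0.01)
print(round(mean, 4), round(var, 4))

# During occlusion only the predict step runs, so the logged variance widens.
mean, var = kalman_predict(mean, var, process_var=0.002)
print(round(mean, 4), round(var, 4))
```

With equal variances the gain is 0.5, so the estimate lands halfway between the prior and the measurement. Logging `var` alongside `mean` gives later stages something to act on when the estimate goes stale.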

Failure Mode

Treating observation as state creates brittle policies. A camera frame may show the gripper and block, but not friction, object mass, motor temperature, cable drag, or a person about to enter the workspace.

Practical Example

A manipulation lab trained a policy on visible object pose and saw strong simulation success. On the real robot, the gripper briefly occluded the object before contact, and the policy moved as if the last visible pose were still certain. Adding a belief state with last-seen pose, elapsed time, and confidence let the controller slow down and reobserve.
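A belief of the kind described in this example might look like the sketch below. The class name, the decay factor of 0.7 per occluded step, and the reobserve threshold of 0.4 are all hypothetical values picked for illustration. A real system would tune them against logged occlusion episodes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LastSeenBelief:
    pose_x: float              # last visible object position, in meters
    steps_since_seen: int = 0  # elapsed control steps since the last detection
    confidence: float = 1.0    # 1.0 right after a detection, decays while occluded

    def update(self, visible: bool, detected_x: Optional[float] = None,
               decay: float = 0.7) -> None:
        if visible and detected_x is not None:
            self.pose_x = detected_x
            self.steps_since_seen = 0
            self.confidence = 1.0
        else:
            # Keep the last-seen pose but admit that it is getting stale.
            self.steps_since_seen += 1
            self.confidence *= decay

def speed_scale(belief: LastSeenBelief, reobserve_below: float = 0.4) -> float:
    """Return 0.0 (stop and reobserve) at low confidence, else slow down in proportion."""
    return 0.0 if belief.confidence < reobserve_below else belief.confidence

belief = LastSeenBelief(pose_x=0.50)
for visible in (False, False):
    belief.update(visible)
    print(belief.steps_since_seen, round(belief.confidence, 3), round(speed_scale(belief), 3))
```

The design choice is that occlusion never overwrites the pose. It only lowers confidence, so the controller can distinguish "object at 0.50 m, just seen" from "object at 0.50 m, several steps ago".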

Memorable Shortcut

If the agent says it knows the whole state from one RGB image, ask it where the object mass is hiding. The answer is usually "in the failure case."

Research Frontier

Robot foundation models and world models increasingly learn latent state from history rather than a single frame. The open challenge is making learned latent state useful to safety monitors, dashboards, and closed-loop evaluators instead of leaving it as an opaque activation vector.

Mini Lab

Modify Code Fragment 2.2.1 so the object is invisible for three time steps. Log how confidence changes, then decide when the robot should stop and reobserve.

Self Check

For a tabletop robot, can you name three variables that affect future action but are not directly visible in the latest camera image?

State-observation discipline prevents privileged information leakage. In simulation the evaluator may know object mass, contact normal, true pose, and collision margin. The policy should receive only the observation channels that a deployed system can provide. A result that mixes those views may measure access to simulator internals rather than intelligence.

The practical artifact is a variable ledger. Each variable is marked as observed, estimated, delayed, hidden, or evaluator-only. The ledger should also name the sensor or estimator that produces it and the uncertainty attached to it.
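One way to hold such a ledger in code is a small record type with validation. The sketch below is illustrative only: the field names, the example sensors and estimators, and the `missing_uncertainty` helper are assumptions, not part of any library.

```python
from dataclasses import dataclass
from typing import Optional

VISIBILITY = {"observed", "estimated", "delayed", "hidden", "evaluator_only"}

@dataclass(frozen=True)
class LedgerEntry:
    name: str
    visibility: str            # one of VISIBILITY
    source: Optional[str]      # sensor or estimator, None if nothing produces it
    units: str
    std_dev: Optional[float]   # one-sigma uncertainty in `units`, None if not yet known

    def __post_init__(self) -> None:
        if self.visibility not in VISIBILITY:
            raise ValueError(f"{self.name}: unknown visibility {self.visibility!r}")

# Hypothetical tabletop ledger; sources and numbers are placeholders.
ledger = [
    LedgerEntry("rgb_crop", "observed", "wrist_camera", "pixels", None),
    LedgerEntry("contact_force", "observed", "force_torque_sensor", "N", 0.5),
    LedgerEntry("last_seen_pose", "estimated", "pose_tracker", "m", None),
    LedgerEntry("slip_risk", "hidden", None, "probability", None),
    LedgerEntry("true_pose", "evaluator_only", "simulator", "m", 0.0),
]

def missing_uncertainty(entries: list[LedgerEntry]) -> list[str]:
    """Names of estimated variables that still lack an attached uncertainty."""
    return sorted(e.name for e in entries
                  if e.visibility == "estimated" and e.std_dev is None)

print(missing_uncertainty(ledger))
```

A check like `missing_uncertainty` turns the ledger from documentation into something a test suite can enforce before training starts.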

Tool or Library	Role in This Topic	Builder Advice
Kalman and particle filters	maintain explicit belief over hidden or noisy state variables	Use them when uncertainty is low-dimensional enough to model and audit directly.
Factor graph libraries	combine measurements, priors, and constraints into a structured state estimate	Use them for localization, mapping, calibration, and multi-sensor fusion problems.
MuJoCo, Isaac Lab, and ROS 2 logs	separate privileged simulator state, policy observations, and deployed state-estimate topics	Use them to prove that evaluation state did not leak into policy input.

Build a leak test before training. The test should compare evaluator state fields with policy observation fields and fail when evaluator-only variables appear in policy input.

List all variables needed to predict dynamics and reward.
Mark each variable as observed, estimated, delayed, hidden, or evaluator-only.
Attach units, uncertainty, and source sensor or estimator.
Check that evaluator-only variables are absent from policy input.
Stress the belief with occlusion, delay, calibration drift, and contact changes.

# Build a variable ledger and detect privileged-state leakage.
variables = {
    "true_pose": "evaluator_only",
    "rgb_crop": "observed",
    "last_seen_pose": "estimated",
    "slip_risk": "hidden",
    "contact_force": "observed",
}
policy_input = {"rgb_crop", "last_seen_pose", "true_pose"}

def privileged_leaks(variables: dict[str, str], policy_input: set[str]) -> list[str]:
    return sorted(
        name for name in policy_input
        if variables.get(name) == "evaluator_only"
    )

print(privileged_leaks(variables, policy_input))

Code Fragment 2.2.2 detects evaluator-only state that has leaked into the policy observation. Here policy_input deliberately includes true_pose, so the check prints ['true_pose'].

When behavior fails under partial observability, classify whether the agent lacked a sensor, lacked history, carried stale belief, underestimated uncertainty, or received leaked training information. Each cause points to a different repair.

Key Takeaway

State is what would make prediction complete. Observation is what the agent receives. Embodied intelligence lives in the gap between them.

Exercise 2.2.1

For a mobile robot in a hallway, classify map location, battery health, pedestrian intent, wheel slip, and camera image as state, observation, hidden variable, or estimate.

What's Next?

Section 2.3 turns observations into action representations at several levels of abstraction.

Bibliography & Further Reading

Foundational References For This Section

Bellman, R. "A Markovian Decision Process." (1957). https://doi.org/10.1515/9781400835386-007

The mathematical origin of the state, action, transition, and reward framing.

Kaelbling, L. P., Littman, M. L., and Cassandra, A. R. "Planning and acting in partially observable stochastic domains." (1998). https://www.sciencedirect.com/science/article/pii/S000437029800023X

A foundational POMDP reference for belief-state reasoning under partial observability.

Farama Foundation. "Gymnasium Documentation." (2024). https://gymnasium.farama.org/

The maintained reference for reset, step, spaces, termination, truncation, wrappers, and reproducible environments.