Section 29.1: Where am I and what does the world look like | Building Embodied AI: From Perception to Autonomous Action

"A pose estimate is a contract between memory, sensors, and the next motion command."
A Loop Closure That Came Back With Receipts

Educational illustration for Section 29.1, showing where am i and what does the world look like as a robot reasoning problem that connects measurements, state estimates, decisions, and replayable evidence. — **Figure 29.1.1**: Where am I and what does the world look like becomes useful when the visual idea is tied to a state variable, an uncertainty model, and the next robot action.

Big Picture

Where am I and what does the world look like is the state-estimation half of embodied autonomy. The robot has partial, noisy, time-stamped evidence; it must turn that evidence into a pose, a map, and an uncertainty statement that a planner can actually trust.

Problem First

A robot cannot make a reliable plan if pose and map are treated as separate chores. The map is built from poses, while pose estimates depend on the map. This circular dependency is why SLAM is an estimation problem rather than a drawing problem.

The state usually contains a trajectory $x_{0:T}$ and map variables $m$. Controls $u_{1:T}$ predict motion, while observations $z_{1:T}$ correct that prediction. A localization-only system estimates $x_t$ against a known map; a mapping-only system assumes poses are good enough; SLAM estimates both and exposes the remaining uncertainty.

Action Contract

A localization claim in this section is incomplete without frame, timestamp, pose covariance, map version, and the planner or controller that consumed it. Otherwise the result is a drawing rather than an interface.

Formal Model

The estimator in this opening section is a belief interface: pose, map, timestamp, covariance, and downstream consumer must be carried together. The posterior is useful only if the planner can ask which part of the world model is fresh enough to act on.

$$ p(x_{0:T},m\mid z_{1:T},u_{1:T}) \propto p(x_0)\prod_t p(x_t\mid x_{t-1},u_t)\prod_t p(z_t\mid x_t,m) $$

The notation matters because it names evidence sources: odometry, inertial cues, scans, visual landmarks, and loop closures. The engineering move is to keep a residual and uncertainty for each source so a bad route can be traced to the measurement that moved the belief.

Algorithm: Section 29.1 Evidence Loop

Define the state variables: pose, velocity if needed, landmarks or grid cells, and map frame.
Attach every measurement to a frame, timestamp, residual, and covariance.
Update the belief, then publish only the estimate fields the planner is allowed to consume.
Replay the same evidence after perturbing one sensor or transform assumption.

Worked Diagnostic

Code Fragment 1 should be read as a pose-belief smoke test. Before ROS nodes and graph optimizers enter the picture, the reader should see how one observation changes an estimate and whether the uncertainty moves in the expected direction.

# Compare a pose prior with one landmark observation.
# The innovation shows whether the new range evidence agrees with the map.
import numpy as np

prior_xy = np.array([2.0, 1.0])
landmark_xy = np.array([5.0, 1.0])
measured_range_m = 2.7
predicted_range_m = np.linalg.norm(landmark_xy - prior_xy)
innovation_m = measured_range_m - predicted_range_m
print(f"predicted={predicted_range_m:.2f} m")
print(f"innovation={innovation_m:.2f} m")

predicted=3.00 m innovation=-0.30 m

Expected output interpretation. The printed residual shows a 30 cm disagreement in the corrective direction: the sensor says the landmark is nearer than the prior predicts. In a filter or factor graph, this measurement would contribute a correction term that pulls the pose estimate, the landmark estimate, or both toward a shorter range.

Code Fragment 1: The diagnostic computes a range residual from one landmark and one pose prior. The negative innovation says the observation is closer than the current map-pose prediction, which should pull the pose or landmark estimate during correction.

Tool Workflow

Library Shortcut

In production, ROS 2 tf2 handles frame transforms, Nav2 consumes map and localization topics, and GTSAM or Ceres solves larger nonlinear least-squares problems. That is the reduction from hand-checking residuals to configuring maintained graph and middleware components.

Use the hand calculation to expose the belief update, then let ROS 2, GTSAM, Cartographer, or Nav2 handle large logs and maps. The hand check remains the regression test for units, frames, and uncertainty.

Failure Mode To Test

Replay one short route with delayed transforms, missing scan packets, and a wrong initial pose as separate perturbations. The point is to learn whether failure begins as sensing ambiguity, association error, optimization drift, or planner misuse of a stale map.

Practical Example

For a warehouse robot, the minimum replay bundle is odometry, IMU, scan stream, pose estimate, covariance, active map layer, local costmap, and recovery action. That bundle separates being lost from being correctly localized in a changed aisle.

Research Frontier

Current SLAM research increasingly connects geometry with semantics, neural scene representations, and open-vocabulary perception. The open problem is not only dense reconstruction; it is making dense maps reliable enough for planning, manipulation, and safety cases.

Memory Hook

A pose estimate is a contract between memory, sensors, and the next motion command.

Self Check

Can you state the state variables, observation residual, uncertainty representation, replay artifact, and most likely field failure for where am i and what does the world look like? If one field is vague, the estimator is not ready for embodied use.

Key Takeaway

Where am I and what does the world look like is production-ready only when geometry, uncertainty, timing, and action consequences are tested together.

Exercise 29.1.1

Design the two-run test around global localization: run once with a correct prior and once with an intentionally ambiguous starting pose. Report convergence time, covariance collapse, wrong-mode persistence, and the planner action blocked until localization is credible.

What's Next?

Continue to Section 29.2: Odometry and dead reckoning, where this state-estimation contract becomes the input to the next embodied capability.

Section References

Durrant-Whyte, H. and Bailey, T. "Simultaneous Localization and Mapping." IEEE Robotics and Automation Magazine, 2006. https://ieeexplore.ieee.org/document/1638022

Classic SLAM tutorial that frames the estimation problem and the role of uncertainty.

GTSAM Project. "Factor Graphs and GTSAM." Official documentation. https://gtsam.org/

Primary tool reference for factor graphs, smoothing, pose graphs, and robotics estimation examples.

ROS 2 Navigation Project. "Nav2 documentation." Official documentation. https://navigation.ros.org/

Primary documentation for integrating localization, maps, planners, controllers, behavior trees, and recoveries.