Section 30.5: Learned navigation policies | Building Embodied AI: From Perception to Autonomous Action

Learned navigation is embodied when the policy can recover from the world changing under it.
A Local Planner With Commitment Issues

Educational illustration for Section 30.5, showing learned navigation policies as a robot reasoning problem that connects measurements, state estimates, decisions, and replayable evidence. **Figure 30.5.1**: Learned navigation policies become useful when the visual idea is tied to a state variable, an uncertainty model, and the next robot action.

Big Picture

Learned navigation policies turn maps and goals into constrained motion. The planner is not searching for a pretty line; it is choosing a feasible commitment under geometry, dynamics, uncertainty, moving obstacles, and recovery rules.

Problem First

Learned policies can map observations directly to actions, but a policy that beats a weak baseline in one simulator may still fail under sensor shift, new layouts, or a different robot body.

A learned navigation policy should be evaluated against classical baselines, not in isolation. The input contract must name observation modalities, memory state, action space, training distribution, and the recovery layer that catches unsafe outputs.
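One way to make that input contract concrete is to write it down as a small record and refuse to evaluate a policy whose record has gaps. The sketch below is illustrative only: `PolicyContract`, `missing_fields`, and the example values are hypothetical names chosen here, not part of any library or of this book's code.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass(frozen=True)
class PolicyContract:
    """Hypothetical contract for a learned navigation policy."""
    observation_modalities: Tuple[str, ...]  # e.g. ("rgb", "depth")
    memory_state: str                        # e.g. "recurrent hidden state"
    action_space: str                        # e.g. "continuous (v m/s, w rad/s)"
    training_distribution: str               # where the policy was trained
    recovery_layer: str                      # what catches unsafe outputs


def missing_fields(contract: PolicyContract) -> List[str]:
    """Return the names of contract fields that were left empty."""
    return [f.name for f in fields(contract) if not getattr(contract, f.name)]


# Example values are made up; the recovery layer is deliberately left blank.
contract = PolicyContract(
    observation_modalities=("rgb", "depth"),
    memory_state="recurrent hidden state",
    action_space="continuous (v m/s, w rad/s)",
    training_distribution="simulated indoor scenes",
    recovery_layer="",
)
print(missing_fields(contract))
```

A check like this is cheap to run before any benchmark: a policy with no named recovery layer is flagged before its success rate is ever compared against a classical baseline.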

Feasibility Before Beauty

A learned navigator is trustworthy only when its failures are labeled at the interface where they occur.

Formal Model

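Most navigation methods can be read as constrained search or optimization. For a learned policy, that means a goal-conditioned action distribution trained to maximize expected return: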

$$ \pi_\theta(a_t\mid o_{\le t},g),\quad J(\theta)=\mathbb E\left[\sum_t r(s_t,a_t)\right] $$

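Here $\pi_\theta$ is the policy with parameters $\theta$, $a_t$ is the action at step $t$, $o_{\le t}$ is the observation history, $g$ is the goal, $s_t$ is the state, and $r$ is the reward; the expectation is taken over trajectories generated by the policy. The objective is task completion under learned behavior; constraints are action limits, safety margins, observation validity, and out-of-distribution detection.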
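A small numerical sketch can show the split between objective and constraints. It estimates $J(\theta)$ by averaging summed rewards over logged episodes and tracks action-limit violations as a separate quantity instead of folding them into the reward. The rollout data and the function names `estimate_return` and `violation_rate` are assumptions made for illustration.

```python
from statistics import mean

# Each rollout is a list of (reward, action_within_limits) pairs.
# The values are made up for illustration.
rollouts = [
    [(0.0, True), (0.0, True), (1.0, True)],
    [(0.0, True), (-0.5, False), (1.0, True)],
    [(0.0, True), (0.0, True), (0.0, True)],
]


def estimate_return(episodes):
    """Monte Carlo estimate of J: mean over episodes of the summed reward."""
    return mean(sum(reward for reward, _ in episode) for episode in episodes)


def violation_rate(episodes):
    """Fraction of steps whose action broke a limit, kept apart from reward."""
    flags = [ok for episode in episodes for _, ok in episode]
    return sum(not ok for ok in flags) / len(flags)


print(f"estimated J: {estimate_return(rollouts):.2f}")
print(f"constraint violation rate: {violation_rate(rollouts):.3f}")
```

Reporting the two numbers side by side keeps a policy from buying return with limit violations that the reward happens not to penalize.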

Algorithm: Section 30.5 Planning Loop

1. Define observation and action spaces in robot units.
2. Train or fine-tune under scenario panels that include failures, not only success cases.
3. Compare against graph search, local planning, and oracle-map baselines.
4. Deploy with shields, confidence monitors, and fallback planners.
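The deployment step above can be sketched as a thin wrapper around the policy output. This is a minimal sketch, not a production safety layer: the velocity limits, the confidence threshold, the stop distance, the discrete action set, and the name `shielded_command` are all assumptions chosen for illustration.

```python
import math

MAX_LINEAR = 0.5       # m/s, assumed robot limit
MAX_ANGULAR = 1.0      # rad/s, assumed robot limit
MIN_CONFIDENCE = 0.6   # assumed threshold on the top action probability

# Hypothetical discrete action set in robot units: (linear m/s, angular rad/s).
ACTIONS = [(0.4, 0.0), (0.0, 0.8), (0.0, -0.8), (0.0, 0.0)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def clamp(value, limit):
    """Clip value to the symmetric range [-limit, limit]."""
    return max(-limit, min(limit, value))


def shielded_command(logits, fallback_cmd, min_clearance_m, stop_distance_m=0.3):
    """Return (command, source); source is 'stop', 'fallback', or 'policy'."""
    # Shield: stop outright when the nearest obstacle is too close.
    if min_clearance_m < stop_distance_m:
        return (0.0, 0.0), "stop"
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Confidence monitor: defer to the fallback planner when the policy is unsure.
    if probs[best] < MIN_CONFIDENCE:
        v, w = fallback_cmd
        return (clamp(v, MAX_LINEAR), clamp(w, MAX_ANGULAR)), "fallback"
    v, w = ACTIONS[best]
    return (clamp(v, MAX_LINEAR), clamp(w, MAX_ANGULAR)), "policy"


print(shielded_command([5.0, 0.0, 0.0, 0.0], (0.2, 0.0), min_clearance_m=1.0))
print(shielded_command([0.1, 0.0, 0.0, 0.0], (0.2, 0.1), min_clearance_m=1.0))
print(shielded_command([5.0, 0.0, 0.0, 0.0], (0.2, 0.0), min_clearance_m=0.1))
```

Returning the source alongside the command matters for evaluation: counting how often the result is "fallback" or "stop" gives the intervention count that later comparisons depend on.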

Worked Diagnostic

Code Fragment 1 isolates the learned-policy interface: observation tensor, recurrent or memory state, action distribution, command limits, and safety monitor. The point is to make shortcut behavior visible.

Expected output interpretation. The learned policy has the higher raw success rate, but it still loses after intervention cost is accounted for. This is the key reading of the output: a navigation policy that needs more human or safety-layer correction is not outperforming the classical baseline in the deployed sense that matters.

Code Fragment 1: The comparison penalizes interventions, so the higher raw success rate does not automatically win. Learned navigation needs construct-matched metrics with safety and recovery fields.
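As a stand-in illustration of the comparison described above, the following sketch scores two navigation stacks with an intervention penalty. It is not the original code fragment: the episode counts, the penalty weight, and the names `PanelResult` and `adjusted_score` are invented for this example, and the scoring rule is one simple choice among many.

```python
from dataclasses import dataclass


@dataclass
class PanelResult:
    name: str
    episodes: int
    successes: int
    interventions: int  # human or safety-layer takeovers across the panel

    @property
    def success_rate(self) -> float:
        return self.successes / self.episodes

    def adjusted_score(self, penalty: float = 0.2) -> float:
        """Success rate minus an assumed penalty per intervention per episode."""
        return self.success_rate - penalty * self.interventions / self.episodes


# Illustrative numbers only; they are not measurements.
learned = PanelResult("learned policy", episodes=100, successes=88, interventions=60)
classical = PanelResult("classical baseline", episodes=100, successes=82, interventions=10)

for result in (learned, classical):
    print(f"{result.name}: raw={result.success_rate:.2f} "
          f"adjusted={result.adjusted_score():.2f}")

winner = max((learned, classical), key=lambda r: r.adjusted_score())
print("deployed winner:", winner.name)
```

With these made-up numbers the learned policy leads on raw success but trails once interventions are charged, which is the pattern the section warns about. The penalty weight is a modeling decision and should be fixed before results are inspected.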

Tool Workflow

Library Shortcut

Habitat, RoboTHOR, AI2-THOR, Isaac Lab, and PyTorch provide training and evaluation infrastructure, while Nav2 remains the practical baseline for deployed mobile robots. The shortcut is to run learned policies beside a classical stack, not instead of one by default.

Keep the small policy diagnostic as a test for observation-action semantics. Use Habitat, Isaac Lab, Nav2 integration, or robot logs for serious evaluation.

Failure Mode To Test

Replay domain shift, lighting change, localization jump, blocked route, and actuator saturation. Learned navigation should be evaluated by recovery and safety, not only success.

Practical Example

Log observation frames, policy logits or action distribution, chosen command, value estimate if present, costmap or memory state, and failure label. Without those fields, a learned route is hard to debug.
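A per-step log record with those fields might look like the sketch below. The class name `NavStepLog`, the field names, the file path, and the failure label string are hypothetical; the point is only that each field listed above has a named slot and is serialized as one JSON line per step.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class NavStepLog:
    """Hypothetical per-step record for replaying a learned navigation run."""
    step: int
    observation_ref: str             # path or key of the stored observation frame
    action_probs: List[float]        # policy action distribution (or logits)
    chosen_command: List[float]      # command actually sent, in robot units
    value_estimate: Optional[float]  # None when the policy has no value head
    memory_ref: Optional[str]        # costmap or memory-state snapshot, if any
    failure_label: Optional[str] = None


def to_json_line(record: NavStepLog) -> str:
    """Serialize one record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


record = NavStepLog(
    step=42,
    observation_ref="frames/000042.png",
    action_probs=[0.7, 0.2, 0.1],
    chosen_command=[0.4, 0.0],
    value_estimate=None,
    memory_ref="costmaps/000042.npy",
    failure_label="blocked_route",
)
print(to_json_line(record))
```

Storing references to frames and costmaps instead of the arrays themselves keeps the log small while still letting a replay tool line up each command with what the policy saw.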

Integration Checklist

Freeze observation stack, action space, reward, dataset split, simulator settings, robot limits, and evaluation seeds before comparing learned policies.
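One lightweight way to enforce that freeze is to fingerprint the evaluation configuration and compare two policies only when their fingerprints match. The sketch below assumes the configuration is a JSON-serializable dictionary; the keys, values, and the function name `fingerprint` are illustrative.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Short hash of a config, independent of dictionary key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Illustrative settings; every value here is an assumption.
baseline_config = {
    "observation_stack": ["rgb", "depth"],
    "action_space": "continuous_v_w",
    "reward": "success_minus_collision",
    "dataset_split": "val_unseen",
    "robot_limits": {"max_linear": 0.5, "max_angular": 1.0},
    "eval_seeds": [0, 1, 2],
}

# Same settings with a different seed list: results are not comparable.
changed_config = dict(baseline_config, eval_seeds=[3, 4, 5])

print(fingerprint(baseline_config))
print(fingerprint(baseline_config) == fingerprint(changed_config))
```

Writing the fingerprint next to every reported metric makes it obvious when two numbers came from different evaluation setups.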

Research Frontier

Research directions include vision-language navigation, foundation-model affordance maps, offline robot navigation datasets, and policy distillation into safety-monitored controllers.

Memory Hook

Learned navigation is embodied when the policy can recover from the world changing under it.

Self Check

Can you state the search space, cost function, constraints, replanning trigger, controller interface, and failure metric for learned navigation policies? If not, the planner is not specified enough to deploy.

Key Takeaway

Learned navigation policies are ready for embodied use when route quality, dynamic feasibility, local control, and recovery behavior are measured in the same replay.

Exercise 30.5.1

Run the panel with train-like route, blocked route, and visual distractor. Report success, collision margin, intervention count, recovery behavior, and whether the policy uses stale observations.

What's Next?

Continue to Section 30.6: Language- and image-goal navigation, where this planning contract connects to the next embodied capability.

Section References

LaValle, S. M. "Planning Algorithms." Cambridge University Press, 2006. http://lavalle.pl/planning/

Open textbook reference for graph search, sampling-based planning, configuration spaces, and kinodynamic planning.

OMPL Project. "Open Motion Planning Library." Official documentation. https://ompl.kavrakilab.org/

Primary tool reference for sampling-based planners such as RRT, RRTstar, PRM, and kinodynamic variants.

ROS 2 Navigation Project. "Nav2 documentation." Official documentation. https://navigation.ros.org/

Primary documentation for global planners, controllers, costmaps, behavior trees, and recovery behaviors.