Learned navigation is embodied when the policy can recover from the world changing under it.
A Local Planner With Commitment Issues
Learned navigation policies turn maps and goals into constrained motion. The planner is not searching for a pretty line; it is choosing a feasible commitment under geometry, dynamics, uncertainty, moving obstacles, and recovery rules.
Problem First
Learned policies can map observations directly to actions, but a policy that beats a weak baseline in one simulator may still fail under sensor shift, new layouts, or a different robot body.
A learned navigation policy should be evaluated against classical baselines, not in isolation. The input contract must name observation modalities, memory state, action space, training distribution, and the recovery layer that catches unsafe outputs.
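One way to keep that input contract honest is to write it down as a data structure that can be checked for gaps. The sketch below is a minimal illustration; the class name `PolicyContract`, its field names, and the example values are all assumptions made for this example, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyContract:
    """Hypothetical input/output contract for a learned navigation policy."""
    observation_modalities: tuple   # e.g. ("depth", "odometry")
    memory_state: str               # e.g. "GRU hidden state"
    action_space: str               # e.g. "(v [m/s], omega [rad/s])"
    training_distribution: str      # where the training data came from
    recovery_layer: str             # what catches unsafe outputs

    def missing_fields(self):
        """Return names of fields left empty, so gaps are explicit."""
        return [name for name, value in vars(self).items() if not value]


contract = PolicyContract(
    observation_modalities=("depth", "odometry"),
    memory_state="GRU hidden state",
    action_space="(v [m/s], omega [rad/s])",
    training_distribution="",   # deliberately left blank for the example
    recovery_layer="velocity clamp + classical fallback planner",
)
print(contract.missing_fields())  # ['training_distribution']
```

A contract with any missing field is a reasonable signal that the comparison against a classical baseline is not yet well defined.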
A learned navigator is trustworthy only when its failures are labeled at the interface where they occur.
Formal Model
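Most navigation methods can be read as constrained search or optimization. A learned policy replaces explicit search with a parameterized distribution over actions $a_t$, conditioned on the observation history $o_{\le t}$ and the goal $g$, and is trained to maximize the expected sum of rewards $r(s_t,a_t)$: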
$$ \pi_\theta(a_t\mid o_{\le t},g),\quad J(\theta)=\mathbb E\left[\sum_t r(s_t,a_t)\right] $$
The objective is task completion under learned behavior; constraints are action limits, safety margins, observation validity, and out-of-distribution detection.
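As a sketch, the first two constraints can be written next to the objective. The symbols $\mathcal A_{\text{lim}}$ (the set of commands within the robot's limits), $d(s_t)$ (clearance to the nearest obstacle), and $d_{\min}$ (the required safety margin) are assumptions introduced here for illustration:

$$ \max_\theta\; J(\theta)\quad \text{s.t.}\quad a_t\in\mathcal A_{\text{lim}},\quad d(s_t)\ge d_{\min}\quad \text{for all } t $$

Observation validity and out-of-distribution detection do not reduce cleanly to inequalities of this kind, which is one reason they tend to be enforced by run-time monitors rather than inside the training objective.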
- Define observation and action spaces in robot units.
- Train or fine-tune under scenario panels that include failures, not only success cases.
- Compare against graph search, local planning, and oracle-map baselines.
- Deploy with shields, confidence monitors, and fallback planners.
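The last step in the list can be sketched as a thin wrapper around the policy output. This is a minimal illustration under stated assumptions: the function name `shield`, the thresholds, and the use of a stop command as the fallback are all hypothetical; a real deployment would hand control to an actual fallback planner.

```python
def shield(command, confidence, clearance_m, *,
           v_max=0.5, w_max=1.0, min_conf=0.6, min_clearance_m=0.3,
           fallback=(0.0, 0.0)):
    """Return (command, source) after applying monitors and limits.

    All thresholds are illustrative assumptions, not tuned values.
    """
    # Monitors first: low confidence or small clearance hands control
    # to the fallback (here a stop command standing in for a planner).
    if confidence < min_conf or clearance_m < min_clearance_m:
        return fallback, "fallback"
    v, w = command
    # Clamp linear and angular velocity to the robot's command limits.
    v = max(-v_max, min(v_max, v))
    w = max(-w_max, min(w_max, w))
    source = "policy" if (v, w) == tuple(command) else "clamped"
    return (v, w), source


print(shield((0.8, 0.2), confidence=0.9, clearance_m=1.0))  # ((0.5, 0.2), 'clamped')
print(shield((0.3, 0.2), confidence=0.4, clearance_m=1.0))  # ((0.0, 0.0), 'fallback')
```

Returning the source label alongside the command makes it possible to count how often the shield, rather than the policy, was actually driving.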
Worked Diagnostic
Code Fragment 1 isolates the learned-policy interface: observation tensor, recurrent or memory state, action distribution, command limits, and safety monitor. The point is to make shortcut behavior visible.
Expected output interpretation. The learned policy has the higher raw success rate, but it still loses after intervention cost is accounted for. This is the key reading of the output: a navigation policy that needs more human or safety-layer correction is not outperforming the classical baseline in the deployed sense that matters.
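This reading can be illustrated with a small scoring sketch. The episode counts and the intervention penalty below are made-up values chosen only to show the effect; they are not measured results, and the function `adjusted_score` is a hypothetical helper.

```python
def adjusted_score(successes, interventions, episodes, penalty=0.5):
    """Return (raw success rate, success rate minus intervention cost).

    `penalty` is the assumed cost of one intervention, in units of one
    success; choose it from the deployment's real costs.
    """
    raw = successes / episodes
    return raw, raw - penalty * interventions / episodes


# Illustrative numbers only, not measured results.
panel = {
    "learned":   dict(successes=88, interventions=30, episodes=100),
    "classical": dict(successes=82, interventions=6,  episodes=100),
}
for name, counts in panel.items():
    raw, adj = adjusted_score(**counts)
    print(f"{name:9s} raw={raw:.2f} adjusted={adj:.2f}")
```

With these assumed numbers the learned policy leads on raw success (0.88 against 0.82) but trails once interventions are charged (0.73 against 0.79). The ranking depends on the penalty, so the penalty should be reported with the result.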
Tool Workflow
Habitat, RoboTHOR, AI2-THOR, Isaac Lab, and PyTorch provide training and evaluation infrastructure, while Nav2 remains the practical baseline for deployed mobile robots. The shortcut is to run learned policies beside a classical stack, not instead of one by default.
Keep the small policy diagnostic as a test for observation-action semantics. Use Habitat, Isaac Lab, Nav2 integration, or robot logs for serious evaluation.
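A small diagnostic of this kind can be as simple as a few sign and symmetry checks on the observation-to-action mapping. In the sketch below, `toy_policy` is a hand-written stand-in for a learned policy, and the convention that a positive bearing means "goal to the left" with positive angular velocity turning left is an assumption; substitute the robot's own convention.

```python
def toy_policy(goal_bearing_rad, front_clearance_m):
    """Stand-in for a learned policy: returns (v [m/s], omega [rad/s])."""
    v = min(0.5, 0.5 * front_clearance_m)
    omega = max(-1.0, min(1.0, 1.5 * goal_bearing_rad))
    return v, omega


def check_semantics(policy):
    """Cheap sign and symmetry checks on the observation-action mapping."""
    failures = []
    # Goal to the left (positive bearing) should give a left turn.
    if policy(0.4, 2.0)[1] <= 0.0:
        failures.append("turn_sign")
    # Mirroring the goal should mirror the turn and keep the speed.
    v_l, w_l = policy(0.4, 2.0)
    v_r, w_r = policy(-0.4, 2.0)
    if abs(w_l + w_r) > 1e-6 or abs(v_l - v_r) > 1e-6:
        failures.append("mirror_symmetry")
    # Less clearance ahead should never increase forward speed.
    if policy(0.0, 0.2)[0] > policy(0.0, 2.0)[0]:
        failures.append("speed_vs_clearance")
    return failures


print(check_semantics(toy_policy))  # []
```

A learned policy will rarely be exactly symmetric, so in practice the tolerance in the mirror check would be loosened; the point is that a swapped axis or flipped sign shows up before any simulator run.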
Replay domain shift, lighting change, localization jump, blocked route, and actuator saturation. Learned navigation should be evaluated by recovery and safety, not only success.
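Two of these perturbations, the localization jump and actuator saturation, are easy to inject into a logged run. The sketch below assumes poses are `(x, y)` tuples in meters and commands are `(v, omega)` tuples; the helper names and the jump and limit sizes are hypothetical.

```python
def localization_jump(poses, at_index, dx=0.5, dy=0.0):
    """Shift every pose from at_index onward, mimicking a relocalization jump."""
    return [(x + dx, y + dy) if i >= at_index else (x, y)
            for i, (x, y) in enumerate(poses)]


def saturate(commands, v_max=0.3):
    """Clip linear velocity to +/- v_max, mimicking actuator saturation."""
    return [(max(-v_max, min(v_max, v)), w) for v, w in commands]


# Illustrative logged data, not taken from a real robot.
poses = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0)]
commands = [(0.2, 0.0), (0.5, 0.1), (0.4, -0.1)]

print(localization_jump(poses, at_index=2))
print(saturate(commands))
```

Feeding the perturbed sequences back through the policy and its safety layer shows whether recovery triggers at the jump and whether the policy keeps requesting speed the actuators cannot deliver.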
Log observation frames, policy logits or action distribution, chosen command, value estimate if present, costmap or memory state, and failure label. Without those fields, a learned route is hard to debug.
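Those fields map naturally onto one record per control step. The following is a minimal sketch using only the Python standard library; the class name `StepLog`, the field names, and the example paths are assumptions for illustration.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class StepLog:
    """One replayable log record per control step (field names assumed)."""
    t: float                         # timestamp in seconds
    observation_ref: str             # path or id of the stored frame
    action_probs: List[float]        # policy action distribution
    command: List[float]             # chosen command, e.g. [v, omega]
    value_estimate: Optional[float]  # None if the policy has no value head
    memory_ref: Optional[str]        # costmap or memory-state snapshot id
    failure_label: Optional[str]     # None while the step is nominal


record = StepLog(
    t=12.4,
    observation_ref="frames/000124.png",
    action_probs=[0.1, 0.7, 0.2],
    command=[0.3, 0.0],
    value_estimate=None,
    memory_ref="costmaps/000124.npz",
    failure_label="stale_observation",
)
line = json.dumps(asdict(record))   # one JSON object per line
print(line)
```

Storing references to frames and costmaps rather than the arrays themselves keeps each line small while still letting a replay tool load the exact inputs the policy saw.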
Freeze observation stack, action space, reward, dataset split, simulator settings, robot limits, and evaluation seeds before comparing learned policies.
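One lightweight way to enforce the freeze is to fingerprint the evaluation configuration and store the fingerprint with every result. The sketch below uses the standard library's `hashlib` and `json`; every key and value in `eval_config` is a placeholder, and `config_fingerprint` is a hypothetical helper.

```python
import hashlib
import json

# Every key and value below is a placeholder for illustration.
eval_config = {
    "observation_stack": ["depth", "odometry"],
    "action_space": "v_omega_continuous",
    "reward": "sparse_goal_v1",
    "dataset_split": "val_unseen",
    "simulator": {"dt": 0.05, "physics_substeps": 4},
    "robot_limits": {"v_max": 0.5, "w_max": 1.0},
    "seeds": [0, 1, 2, 3, 4],
}


def config_fingerprint(config):
    """Hash a canonical JSON dump so any change yields a new id."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


frozen_id = config_fingerprint(eval_config)
changed = dict(eval_config, seeds=[0, 1, 2])
print(frozen_id != config_fingerprint(changed))  # True
```

Two results are then comparable only if their fingerprints match, which turns an accidental change of seeds or robot limits into a visible mismatch.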
Research directions include vision-language navigation, foundation-model affordance maps, offline robot navigation datasets, and policy distillation into safety-monitored controllers.
Can you state the search space, cost function, constraints, replanning trigger, controller interface, and failure metric for learned navigation policies? If not, the planner is not specified enough to deploy.
A learned navigation policy is ready for embodied use when route quality, dynamic feasibility, local control, and recovery behavior are measured in the same replay.
Run the panel with train-like route, blocked route, and visual distractor. Report success, collision margin, intervention count, recovery behavior, and whether the policy uses stale observations.
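The panel report can be sketched as a per-scenario aggregation over episode records. The records, field names, and the staleness threshold `STALE_AFTER_S` below are hypothetical and illustrative, not measured data.

```python
from collections import defaultdict

# Hypothetical episode records; values are illustrative, not measured.
episodes = [
    {"scenario": "train_like", "success": True, "min_clearance_m": 0.42,
     "interventions": 0, "recovered": None, "max_obs_age_s": 0.05},
    {"scenario": "blocked_route", "success": True, "min_clearance_m": 0.18,
     "interventions": 1, "recovered": True, "max_obs_age_s": 0.06},
    {"scenario": "blocked_route", "success": False, "min_clearance_m": 0.05,
     "interventions": 2, "recovered": False, "max_obs_age_s": 0.40},
    {"scenario": "visual_distractor", "success": True, "min_clearance_m": 0.30,
     "interventions": 0, "recovered": None, "max_obs_age_s": 0.05},
]

STALE_AFTER_S = 0.2  # assumed threshold for calling an observation stale


def panel_report(records):
    """Aggregate the per-scenario metrics named in the panel description."""
    groups = defaultdict(list)
    for r in records:
        groups[r["scenario"]].append(r)
    report = {}
    for scenario, rs in groups.items():
        # Episodes where no recovery was needed carry recovered=None.
        attempts = [r for r in rs if r["recovered"] is not None]
        report[scenario] = {
            "success_rate": sum(r["success"] for r in rs) / len(rs),
            "worst_clearance_m": min(r["min_clearance_m"] for r in rs),
            "interventions": sum(r["interventions"] for r in rs),
            "recoveries": f"{sum(r['recovered'] for r in attempts)}/{len(attempts)}",
            "stale_obs_used": any(r["max_obs_age_s"] > STALE_AFTER_S for r in rs),
        }
    return report


for scenario, row in panel_report(episodes).items():
    print(scenario, row)
```

Reporting the worst clearance rather than the mean is a deliberate choice in this sketch: a single near-collision matters more for deployment than an average that hides it.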
What's Next?
Continue to Section 30.6: Language- and image-goal navigation, where this planning contract connects to the next embodied capability.
Section References
LaValle, S. M. "Planning Algorithms." Cambridge University Press, 2006. http://lavalle.pl/planning/
Open textbook reference for graph search, sampling-based planning, configuration spaces, and kinodynamic planning.
OMPL Project. "Open Motion Planning Library." Official documentation. https://ompl.kavrakilab.org/
Primary tool reference for sampling-based planners such as RRT, RRT*, PRM, and kinodynamic variants.
ROS 2 Navigation Project. "Nav2 documentation." Official documentation. https://navigation.ros.org/
Primary documentation for global planners, controllers, costmaps, behavior trees, and recovery behaviors.