Chapter 42: Robotic Manipulation | Building Embodied AI: From Perception to Autonomous Action

"Manipulation begins where motion planning meets consequence."
A Careful Embodied Systems Builder

Big Picture

This chapter treats robotic manipulation as object-state control under geometry, contact, friction, and uncertainty. The core teaching move is to keep object outcome, not arm motion alone, at the center of every method.

Remember This Chapter

Good manipulation stacks expose explicit contracts for object state, contact mode, verifier logic, and recovery. A motion that looks smooth but leaves those contracts vague is not yet an embodied system.
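One way to make such a contract concrete is a small record that bundles the four parts named above. The following is a minimal sketch, not an interface from any of the chapter's tools: the class name ManipulationContract, its field names, the within_tolerance helper, and the numeric values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# All names here are illustrative assumptions, not an API from any library.
ObjectState = Dict[str, float]  # e.g. {"x": 0.40, "y": 0.10}


@dataclass(frozen=True)
class ManipulationContract:
    target: ObjectState                      # desired object state
    contact_mode: str                        # e.g. "push", "grasp", "insert"
    verifier: Callable[[ObjectState], bool]  # object-level success check
    recovery_route: Tuple[str, ...]          # ordered, finite list of fallbacks


def within_tolerance(target: ObjectState, tol: float) -> Callable[[ObjectState], bool]:
    """Build a verifier that passes when every target variable is within tol."""
    def check(observed: ObjectState) -> bool:
        return all(abs(observed[k] - v) <= tol for k, v in target.items())
    return check


target = {"x": 0.40, "y": 0.10}  # placeholder target position in meters
contract = ManipulationContract(
    target=target,
    contact_mode="push",
    verifier=within_tolerance(target, tol=0.01),
    recovery_route=("re-observe", "re-push", "abort"),
)

print(contract.verifier({"x": 0.405, "y": 0.098}))  # within 1 cm on both axes
print(contract.verifier({"x": 0.45, "y": 0.10}))    # 5 cm off in x
```

The verifier takes an observed object state rather than an arm pose, so a smooth trajectory that leaves the object in the wrong place still fails the check.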

Chapter Overview

Chapter 42 moves from the simplest object-state changes, reaching and pushing, into staged pick-and-place pipelines, contact-rich control, perception for action, learned policies, recovery, and finally whole-body mobile manipulation.

The practical stack emphasizes MoveIt 2, cuRobo or cuMotion, Drake, MuJoCo, ManiSkill, Nav2, and BehaviorTree.CPP. The theory thread stays grounded in object-state change, contact residuals, and same-panel evidence, meaning nominal and perturbed episodes compared on one shared evaluation panel.

Prerequisites

Readers should already be comfortable with frames, Jacobians, control loops, simulation, and basic policy-learning ideas. This chapter shows how those ingredients become a manipulation system that can be audited and repaired.

Chapter Roadmap

42.1 What manipulation is; reaching and pushing
Manipulation starts when the robot changes object state on purpose and can explain the geometry, contact, and feedback path that made the change happen.

42.2 Pick-and-place pipelines
Pick and place is the canonical manipulation pipeline because it exposes the full stack: perception, grasp generation, motion planning, force-limited execution, placement verification, and recovery.

42.3 Contact-rich interaction
Contact-rich tasks such as insertion, wiping, scraping, opening, and assembly expose the limits of open-loop pose control because the environment must help shape the motion.

42.4 Perception for manipulation
Manipulation perception is not generic scene understanding. It is perception tuned to the action question: what can be reached, grasped, pushed, inserted, or recovered from now?

42.5 Learning manipulation policies (IL, RL, VLA)
Learning-based manipulation policies sit on top of the same physics and interface contracts as analytic pipelines. Their promise is adaptation and generalization, not exemption from contact, safety, or evaluation.

42.6 Failure detection and recovery
Manipulation systems fail for ordinary reasons: missing the object, colliding, slipping, drifting, timing out, or entering an unrecoverable contact mode. Good systems detect those states early and route to bounded recovery.

42.7 Mobile Manipulation: Base, Arm, Perception, And Recovery
Mobile manipulation couples navigation, perception, reachability, grasping, and recovery into one long-horizon control problem. The base pose is part of the grasp plan, not just a precondition for it.

Tooling Note

Use maintained planners and simulators early, but keep the manipulation contract explicit: object state, action interface, verifier, and recovery route must survive any library swap.

Hands-On Lab: Build the Chapter System

Duration: about 90 to 150 minutes
Difficulty: Intermediate to Advanced

Objective

Build a small manipulation benchmark that includes at least one reach-push task, one pick-and-place task, and one failure-recovery branch, all logged with the same evidence schema.

Steps

Define the object-state variables, action interfaces, and success metrics for all tasks.
Implement a transparent baseline and one maintained-tool route for each task family.
Log object outcomes, controller signals, and failure labels in one artifact per run.
Compare nominal and perturbed episodes on the same panel.
Write a short postmortem that explains one failure and one successful recovery.
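The logging and comparison steps above can be sketched with one JSON record per run and a single summary keyed by task and condition. This is a hedged illustration only: the field names, the make_record and success_panel helpers, and every number in the sample records are placeholders invented for the example, not a schema or results from the chapter.

```python
import json
from collections import defaultdict


# Hypothetical evidence schema: field names are assumptions for illustration.
def make_record(task, condition, success, final_error_m, failure_label=None):
    return {
        "task": task,                    # e.g. "reach_push" or "pick_place"
        "condition": condition,          # "nominal" or "perturbed"
        "success": bool(success),        # object-level outcome
        "final_error_m": float(final_error_m),
        "failure_label": failure_label,  # None when the run succeeded
    }


def success_panel(records):
    """Success rate per (task, condition), so both conditions share one table."""
    counts = defaultdict(lambda: [0, 0])  # key -> [successes, total]
    for r in records:
        key = (r["task"], r["condition"])
        counts[key][0] += int(r["success"])
        counts[key][1] += 1
    return {key: s / n for key, (s, n) in counts.items()}


# Placeholder runs; the values are made up to exercise the code.
records = [
    make_record("reach_push", "nominal", True, 0.004),
    make_record("reach_push", "perturbed", False, 0.031, "object_drift"),
    make_record("pick_place", "nominal", True, 0.006),
    make_record("pick_place", "perturbed", True, 0.009),
]

# One JSON line per run keeps each artifact self-describing.
lines = [json.dumps(r, sort_keys=True) for r in records]
panel = success_panel([json.loads(line) for line in lines])
for key in sorted(panel):
    print(key, panel[key])
```

Because every task writes the same fields, the nominal and perturbed rows can be read side by side without per-task parsing code.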

What's Next?

Continue with Section 42.1: What manipulation is; reaching and pushing, where the chapter moves from framing to the first concrete system contract.

Read each section as a system contract. Ask what the robot observes, which object state changes, which contact assumptions are active, and how the system proves success rather than merely animating the arm convincingly.

Chapter Tool Map

Tool or Library	Where It Pays Off
MoveIt 2	Motion planning, staging, and execution for arm-level manipulation
cuMotion or cuRobo	Fast collision-free motion generation and replanning
Drake and MuJoCo	Contact-aware simulation, control analysis, and residual inspection
ManiSkill	Manipulation benchmarks and high-throughput policy training
Nav2 and BehaviorTree.CPP	Whole-body staging and recovery for mobile manipulation

Chapter Lab Extension

Extend the lab by adding one perturbation, one recovery behavior, and one failure taxonomy. Save configuration, logs, metrics, and two representative traces in the same folder.
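A failure taxonomy and a bounded recovery loop can start very small. In the sketch below, the failure labels come from the failure types listed in the chapter roadmap. The routing table, the action names, and the run_with_recovery helper are assumptions for illustration, and a scripted iterator stands in for a real robot or simulator.

```python
from enum import Enum


class Failure(Enum):
    # Labels drawn from the failure types named in the chapter roadmap.
    MISSED_OBJECT = "missed_object"
    COLLISION = "collision"
    SLIP = "slip"
    TIMEOUT = "timeout"


# Hypothetical routing table: which recovery to try for each failure label.
RECOVERY = {
    Failure.MISSED_OBJECT: "re_observe_and_regrasp",
    Failure.SLIP: "regrasp",
    Failure.TIMEOUT: "replan",
    Failure.COLLISION: "stop_and_escalate",
}


def run_with_recovery(attempt, max_retries=2):
    """Call attempt() until it returns None (success) or the budget is spent.

    attempt is a callable returning a Failure or None.
    Returns (succeeded, trace), where trace lists each failure and its route.
    """
    trace = []
    for _ in range(max_retries + 1):
        failure = attempt()
        if failure is None:
            trace.append("success")
            return True, trace
        action = RECOVERY[failure]
        trace.append(f"{failure.value}->{action}")
        if action == "stop_and_escalate":
            break  # some failures should not be retried
    trace.append("abort")
    return False, trace


# Scripted episode standing in for a real robot: slip once, then succeed.
outcomes = iter([Failure.SLIP, None])
ok, trace = run_with_recovery(lambda: next(outcomes))
print(ok, trace)
```

The retry budget is what makes the recovery bounded: the loop always ends in either a logged success or a logged abort, and the trace can be saved alongside the other run artifacts.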

The chapter works best when students build one inspectable artifact per section: a push residual plot, a pick-stage ledger, a contact-force trace, a perception uncertainty plot, a policy audit, a recovery branch table, and a mobile-manipulation base-pose scorecard.

A strong teaching move here is to force every manipulation claim through the same reporting frame: target object state, contact assumptions, failure taxonomy, perturbation panel, and the exact intervention or recovery branch taken when the plan stops matching the world. That extra bookkeeping adds overhead, but it makes the material auditable instead of theatrical.

Readiness Check

Before leaving the chapter, the reader should be able to state how manipulation success is measured at the object level, how contact is verified, and how failure is routed into bounded recovery.

Teaching Takeaway

Manipulation is the part of embodied AI where vague system boundaries are punished quickly. The teaching goal is to make those boundaries explicit enough that failure becomes localizable and repairable.

The chapter is also meant to make manipulation legible across academic and production settings. A research prototype, a warehouse cell, and a home-assistance stack may use different tools and sensors, but they all need the same core evidence contract: target object state, contact assumptions, action interface, timing trace, and bounded recovery route when the object or scene violates the plan.

Chapter Evidence Standard

A manipulation claim is ready only when it names the object-state target, contact assumption, action interface, verifier, perturbation panel, and recovery route on one shared evaluation script.
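The standard above can be enforced mechanically before any result is reported. The following is a minimal sketch under stated assumptions: the key names in REQUIRED_FIELDS are one possible spelling of the six items in the standard, and missing_fields and the sample claim are hypothetical, not part of any tool listed in this chapter.

```python
# Assumed key names for the six items in the evidence standard.
REQUIRED_FIELDS = (
    "object_state_target",
    "contact_assumption",
    "action_interface",
    "verifier",
    "perturbation_panel",
    "recovery_route",
)


def missing_fields(claim):
    """Return the required fields that are absent or empty in a claim dict."""
    return [name for name in REQUIRED_FIELDS if not claim.get(name)]


# Hypothetical draft claim that forgot its perturbation panel and recovery route.
draft_claim = {
    "object_state_target": "block within 1 cm of goal pose",
    "contact_assumption": "single-point planar push",
    "action_interface": "end-effector velocity commands",
    "verifier": "tracked object pose vs goal tolerance",
}

print(missing_fields(draft_claim))
```

An empty list means the claim names every item in the standard. It does not mean the claim is true, only that it is complete enough to be evaluated.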

Bibliography & Further Reading

Primary Sources, Tools, and References

MoveIt 2 Documentation

Official planning and execution stack for ROS 2 manipulation.

cuMotion integration

GPU motion generation integrated with MoveIt workflows.

Drake

Simulation, optimization, and manipulation-planning toolkit.

ManiSkill

Benchmark and simulator suite for generalizable manipulation skills.