Section 42.2: Pick-and-place pipelines | Building Embodied AI: From Perception to Autonomous Action

"Pipelines look boring until one missing state estimate breaks the whole warehouse."
A Builder's Planning Notebook

Figure 42.2A: A pick-and-place pipeline earns trust only when every stage has a verifier and a way to hand off failure to recovery.

Big Picture

Pick and place is the canonical manipulation pipeline because it exposes the full stack: perception, grasp generation, motion planning, force-limited execution, placement verification, and recovery.

This section breaks the classical pick-and-place stack into inspectable contracts: segmentation, 6D pose estimation, grasp proposal, approach trajectory, gripper closure, lift test, transfer, and place verification.

The payoff is practical. Once the stages are explicit, builders can swap MoveIt, cuRobo, Dex-Net, or a learned VLA policy into one stage without turning the full system into a debugging fog bank.

Action Is The Test

Most industrial pick-and-place failures are not mysterious. They are stage failures that were never isolated: bad pose proposals, invalid grasps, planner dead ends, premature gripper closure, or unverified placement.

Figure 42.2.1: A pick-and-place pipeline earns trust only when every stage has a verifier and a way to hand off failure to recovery.

Theory

Pick and place is a hybrid system with discrete stages and continuous control inside each stage. The important modeling habit is to attach explicit preconditions and postconditions to every stage so silent transitions are impossible.
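Below is a minimal sketch of an explicit stage contract. It assumes a dictionary-based state and boolean pre- and postcondition checks. `Stage`, `run_stage`, and `StageViolation` are hypothetical names and do not come from any library. A stage can only be entered when its precondition holds, and it can only hand off output that satisfies its postcondition.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

State = Dict[str, Any]

@dataclass
class Stage:
    """Hypothetical pipeline stage with explicit pre- and postconditions."""
    name: str
    run: Callable[[State], State]
    pre: Callable[[State], bool]
    post: Callable[[State], bool]

class StageViolation(RuntimeError):
    def __init__(self, stage: str, kind: str):
        super().__init__(f"{stage}: {kind} violated")
        self.stage = stage
        self.kind = kind

def run_stage(stage: Stage, state: State) -> State:
    # Refuse to enter a stage whose precondition does not hold.
    if not stage.pre(state):
        raise StageViolation(stage.name, "precondition")
    # Pass a copy so a failing stage cannot mutate the caller's state.
    new_state = stage.run(dict(state))
    # Refuse to hand off output that fails the postcondition.
    if not stage.post(new_state):
        raise StageViolation(stage.name, "postcondition")
    return new_state

# Toy lift stage: requires a closed gripper and must end with the object raised.
lift = Stage(
    name="lift",
    run=lambda s: {**s, "object_z": s["object_z"] + 0.10},
    pre=lambda s: s["gripper_closed"],
    post=lambda s: s["object_z"] > 0.05,
)

state = run_stage(lift, {"gripper_closed": True, "object_z": 0.0})
print(state["object_z"])  # 0.1
```

Calling `run_stage(lift, {"gripper_closed": False, "object_z": 0.0})` raises `StageViolation` with `kind == "precondition"`. The transition therefore fails loudly and never passes silently.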

A clean factorization treats grasp quality and trajectory feasibility separately, then combines them. That separation matters because a geometrically good grasp may still be unreachable under joint limits or collision constraints.

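$$ g^\star = \arg\max_{g \in \mathcal{G}} Q_{\text{grasp}}(g)\,\mathbf{1}[\text{reachable}(g)]\,\mathbf{1}[\text{collision\_free}(g)],\qquad T = T_{\text{place}} \circ T_{\text{lift}} \circ T_{\text{approach}} $$

Here $\mathcal{G}$ is the set of candidate grasps, $Q_{\text{grasp}}$ is the grasp-quality score, and $T$ is the full motion. Composition applies right to left, so the approach segment executes first and placement executes last.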
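The argmax can be read as "filter, then maximize." The sketch below makes a few assumptions:

- Each candidate is a dict with a hypothetical `quality` field.
- Reachability and collision checks are supplied as predicates. In a real cell they would come from IK and collision queries.

Filtering first matters. A literal argmax over the product still returns *some* zero-score grasp when every candidate is infeasible, and that hides the failure.

```python
from typing import Callable, List, Optional

Grasp = dict  # hypothetical candidate record with at least "id" and "quality"

def select_grasp(candidates: List[Grasp],
                 reachable: Callable[[Grasp], bool],
                 collision_free: Callable[[Grasp], bool]) -> Optional[Grasp]:
    """Return the highest-quality grasp that passes both indicator terms, or None."""
    # The indicator factors act as hard filters. Applying them before the max
    # means an all-infeasible candidate set yields None, not a zero-score grasp.
    feasible = [g for g in candidates if reachable(g) and collision_free(g)]
    if not feasible:
        return None  # hand off to recovery (reobserve, regrasp, skip-bin)
    return max(feasible, key=lambda g: g["quality"])

candidates = [
    {"id": "a", "quality": 0.95, "in_workspace": False, "clear": True},
    {"id": "b", "quality": 0.80, "in_workspace": True, "clear": True},
    {"id": "c", "quality": 0.88, "in_workspace": True, "clear": False},
]
best = select_grasp(candidates, lambda g: g["in_workspace"], lambda g: g["clear"])
print(best["id"])  # b
print(select_grasp(candidates, lambda g: False, lambda g: True))  # None
```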

Mechanism

The pipeline observes the scene, proposes grasps, filters by reachability and collision, executes the pick with force-limited closure, verifies the lift, transports the object, and confirms final placement. A solid log stores failures at the stage boundary, not only at the episode boundary.

Algorithm: Pick-Place Stage Filter

Segment the scene and estimate object pose or graspable surfaces.
Generate candidate grasps and rank them by quality, reachability, and downstream placement compatibility.
Plan approach, closure, lift, and placement motions with explicit stage verifiers.
If lift or placement fails, route to a bounded recovery such as regrasp, reobserve, or skip-bin.
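The four steps above can be sketched as a loop with a bounded recovery budget. This minimal illustration reduces each stage to a callable that returns success or failure. The stage names, the routing table, and `max_recoveries` are hypothetical choices. A real cell would pass richer state between stages.

```python
from typing import Callable, List, Tuple

STAGES = ["perceive", "grasp", "approach", "close", "lift", "transfer", "place"]

# Hypothetical routing table: failed stage -> (recovery name, stage to restart from).
# Any failure not listed here goes straight to skip-bin.
RECOVERY = {
    "perceive": ("reobserve", "perceive"),
    "lift": ("regrasp", "grasp"),
    "place": ("regrasp", "grasp"),
}

def run_pick_place(execute: Callable[[str], bool],
                   max_recoveries: int = 2) -> Tuple[str, List[tuple]]:
    """Run the stages in order and route failures to a bounded recovery."""
    log: List[tuple] = []
    start, recoveries = 0, 0
    while True:
        for stage in STAGES[start:]:
            ok = execute(stage)
            log.append((stage, ok))  # record the outcome at every stage boundary
            if not ok:
                break
        else:
            return "success", log  # every stage passed its verifier
        # `stage` now names the stage that just failed.
        if stage not in RECOVERY or recoveries >= max_recoveries:
            log.append(("recovery", "skip-bin"))
            return "skip-bin", log
        name, restart = RECOVERY[stage]
        log.append(("recovery", name))
        recoveries += 1
        start = STAGES.index(restart)

# Simulated cell: the first lift slips, every other stage succeeds.
lift_calls = {"n": 0}

def fake_execute(stage: str) -> bool:
    if stage == "lift":
        lift_calls["n"] += 1
        return lift_calls["n"] > 1
    return True

result, log = run_pick_place(fake_execute)
print(result)                                  # success
print([e for e in log if e[0] == "recovery"])  # [('recovery', 'regrasp')]
```

The budget is the design point. A lift that keeps failing triggers two regrasps and then exits to skip-bin, so the cell never retries forever.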

Worked Example

```python
# Filter grasp candidates by score and downstream feasibility.
grasps = [
    {"id": "g1", "quality": 0.91, "reachable": True, "place_ok": False},
    {"id": "g2", "quality": 0.84, "reachable": True, "place_ok": True},
    {"id": "g3", "quality": 0.73, "reachable": False, "place_ok": True},
]

ranked = []
for g in grasps:
    score = g["quality"] * float(g["reachable"]) * float(g["place_ok"])
    ranked.append((g["id"], round(score, 2)))

ranked.sort(key=lambda row: row[1], reverse=True)
print(ranked)
print("selected", ranked[0][0])
```

```text
[('g2', 0.84), ('g1', 0.0), ('g3', 0.0)]
selected g2
```

Code Fragment 42.2.1 keeps the highest raw grasp score from dominating when the grasp cannot support the later place stage.

Expected output: the selector picks g2, the slightly weaker but fully feasible grasp. A robust pipeline prefers reachable, place-compatible grasps over visually impressive candidates that lead to a dead end.

Library Shortcut

MoveIt 2 and cuMotion can own the arm-motion stages, while Dex-Net-style scoring or learned grasp heads own the proposal stage. The pipeline stays legible only if every stage's outputs are serialized into one manifest.
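One way to keep stage outputs in a single manifest is a small set of dataclasses serialized to JSON. This is a sketch, and its field names (`owner`, `t_start`, `recovery_route`, and so on) are hypothetical. The `owner` values are free-form labels, not identifiers from MoveIt 2, cuMotion, or Dex-Net.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

@dataclass
class StageRecord:
    stage: str      # e.g. "grasp", "approach", "place"
    owner: str      # free-form label for the component that produced the output
    ok: bool        # did the stage verifier pass?
    t_start: float  # seconds since episode start
    t_end: float
    output: dict = field(default_factory=dict)  # stage-specific payload

@dataclass
class PickPlaceManifest:
    episode_id: str
    selected_grasp: Optional[str] = None
    recovery_route: Optional[str] = None
    stages: List[StageRecord] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested list of StageRecord objects.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

manifest = PickPlaceManifest(episode_id="ep-0001", selected_grasp="g2")
manifest.stages.append(StageRecord("grasp", "grasp_scorer", True, 0.00, 0.12, {"candidates": 3}))
manifest.stages.append(StageRecord("approach", "motion_planner", True, 0.12, 0.95))
text = manifest.to_json()
print(json.loads(text)["stages"][0]["output"])  # {'candidates': 3}
```

Because every backend writes into the same record shape, you can swap the planner or the grasp scorer without changing how failures are inspected.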

Practical Recipe

Define stage-level inputs and outputs before writing the first planner callback.
Score grasps with downstream placement feasibility included, not as an afterthought.
Verify the lift with object motion, gripper width, and force history together.
After placement, measure object pose relative to the target bin or support surface, not only whether the gripper opened.
Save one replay artifact with stage timestamps, selected grasp id, and recovery route.
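The lift and placement checks from the recipe can be sketched as two small verifiers. Everything here is an illustrative assumption:

- The signal names and units (metres, newtons, radians) are hypothetical.
- The thresholds would need tuning for each gripper and object set.
- The place check compares a planar pose only.

```python
import math
from typing import Sequence, Tuple

def verify_lift(object_dz: float, gripper_width: float, force_history: Sequence[float],
                min_dz: float = 0.03, min_width: float = 0.005,
                min_force: float = 2.0, max_rel_drop: float = 0.5) -> bool:
    """Pass only if object motion, gripper width, and force history all agree."""
    rose = object_dz >= min_dz            # object actually moved up (e.g. from re-perception)
    holding = gripper_width >= min_width  # fingers did not close on nothing
    if not force_history:
        return False                      # no force evidence means no verified lift
    peak = max(force_history)
    # Grip force stayed above a floor and did not fall far from its peak (a slip signature).
    stable = min(force_history) >= min_force and (peak - force_history[-1]) <= max_rel_drop * peak
    return rose and holding and stable

def verify_place(obj_xy: Tuple[float, float], obj_yaw: float,
                 target_xy: Tuple[float, float], target_yaw: float,
                 pos_tol: float = 0.01, yaw_tol: float = math.radians(5.0)) -> bool:
    """Compare the measured object pose with the target, not just whether the gripper opened."""
    pos_err = math.dist(obj_xy, target_xy)
    # Wrap the yaw difference into [-pi, pi] before comparing.
    d = obj_yaw - target_yaw
    yaw_err = abs(math.atan2(math.sin(d), math.cos(d)))
    return pos_err <= pos_tol and yaw_err <= yaw_tol

print(verify_lift(0.08, 0.03, [10.0, 10.2, 9.9]))               # True
print(verify_lift(0.08, 0.03, [10.0, 10.0, 3.0]))               # False: force collapsed mid-lift
print(verify_place((0.500, 0.200), 0.02, (0.505, 0.200), 0.0))  # True
```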

Common Failure Mode

Teams often celebrate grasp success and miss that their place stage succeeds largely by luck. If the chosen grasp makes placement infeasible, the upstream score is misleading by construction.

Practical Example

Sorting cells for e-commerce fulfillment frequently fail on the transition from lift to transport, where swinging payloads or poor suction seals only become visible once the box clears the tote.

Memory Hook

Pick-and-place demos love the moment of lift. Production robots earn their salary during the far less cinematic moments of handoff, transport, and final pose verification.

Research Frontier

Modern pipelines increasingly combine learned grasp proposals with optimization-based placement and GPU motion generation. The reliable systems are still the ones that keep stage boundaries explicit enough to audit.

Self Check

Could you explain why your chosen grasp is compatible with both the pick and the place stage, or are you hoping the planner will rescue a bad upstream decision?

The subtle systems question is whether a grasp preserves future optionality while solving the immediate pickup. A bin-picking grasp that rules out the object's target orientation, or that obstructs a second arm, may be locally strong and globally poor.
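One way to make future optionality measurable is to score each grasp by how many placement poses it keeps available. A grasp that rules out the required target pose gets a hard zero. This heuristic is hypothetical: the discrete pose labels and the linear weighting are assumptions, and the sketch does not model the second-arm case.

```python
from typing import List, Set, Tuple

def rank_by_optionality(grasps: List[dict], target_pose: str,
                        place_poses: Set[str]) -> List[Tuple[str, float]]:
    """Rank grasps by quality weighted by the fraction of placement poses they keep feasible."""
    ranked = []
    for g in grasps:
        compatible = g["place_poses"] & place_poses
        if target_pose not in compatible:
            score = 0.0  # locally strong grasp that blocks the required placement
        else:
            score = g["quality"] * len(compatible) / len(place_poses)
        ranked.append((g["id"], round(score, 3)))
    return sorted(ranked, key=lambda row: row[1], reverse=True)

poses = {"yaw0", "yaw90", "yaw180", "yaw270"}
grasps = [
    {"id": "top", "quality": 0.95, "place_poses": {"yaw90"}},
    {"id": "side", "quality": 0.80, "place_poses": {"yaw0", "yaw90", "yaw180"}},
    {"id": "edge", "quality": 0.70, "place_poses": {"yaw0"}},
]
ranking = rank_by_optionality(grasps, target_pose="yaw0", place_poses=poses)
print(ranking[0][0])  # side
```

The highest-quality grasp ("top") scores zero because it cannot reach the required orientation. "side" beats "edge" because it leaves more placement poses open.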

For instruction, this section is a good place to contrast open-loop pipeline charts with evidence-backed pipeline manifests. The latter let students debug stage interactions rather than searching across the whole stack blindly.

Practical Tool Choices For This Section

| Tool or Library | Role in the Topic | Builder Advice |
|---|---|---|
| MoveIt 2 | Stage planning and execution | Use MoveIt Task Constructor stages, or at least separate planning requests, for approach, lift, and place. |
| cuMotion (MoveIt plugin) | High-throughput replanning | Useful when the scene changes quickly; requires an NVIDIA GPU in the cell. |
| Dex-Net or GQ-CNN | Grasp scoring | Use it to rank candidates, but always filter through reachability and place constraints. |

Mini Lab

Implement a two-object pick-and-place benchmark where one object is easier to grasp but impossible to place without collision. Show that your selector avoids it.

When a full cycle fails, label the first violated postcondition: pose estimate, chosen grasp, approach path, closure, lift, transfer, or place. The first broken stage is usually the most informative one.
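Labeling the first violated postcondition is a short scan over the stages in pipeline order. This minimal sketch assumes each episode log is a dict mapping stage name to a boolean postcondition result. The stage keys mirror the list above but are hypothetical identifiers. A missing entry before the first failure is reported as a logging gap rather than guessed.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

STAGE_ORDER = ["pose_estimate", "chosen_grasp", "approach_path",
               "closure", "lift", "transfer", "place"]

def first_violation(record: Dict[str, bool]) -> Optional[Tuple[str, str]]:
    """Return (stage, reason) for the earliest broken stage, or None if all passed."""
    for stage in STAGE_ORDER:
        if stage not in record:
            return stage, "not logged"  # a gap in the log is itself worth flagging
        if not record[stage]:
            return stage, "postcondition failed"
    return None

episodes = [
    # Closure failed; the lift failing afterwards is a downstream symptom, not the cause.
    {"pose_estimate": True, "chosen_grasp": True, "approach_path": True,
     "closure": False, "lift": False},
    {"pose_estimate": False},
    {"pose_estimate": True, "chosen_grasp": True, "approach_path": True,
     "closure": True, "lift": True, "transfer": True, "place": True},
]
labels = Counter(v[0] for v in map(first_violation, episodes) if v is not None)
print(labels.most_common())  # [('closure', 1), ('pose_estimate', 1)]
```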

Section References

MoveIt 2 Documentation

Official documentation for planning, kinematics plugins, and execution interfaces in ROS 2 manipulation.

cuMotion for MoveIt

Official integration of GPU motion generation into MoveIt workflows.

Dex-Net project

Dex-Net ties grasp datasets, robust grasp metrics, and learned scoring into deployable pick pipelines.

Key Takeaway

A pick-and-place pipeline is trustworthy when every stage exposes a typed handoff and a verifier, not when the end-to-end demo looks smooth from far away.

Exercise 42.2.1

Write a stage manifest for a tote-to-shelf pick-and-place task, including one failure branch for bad perception and one for failed lift verification. Explain what metric each branch should log.