Section 27.5: Affordances and graspable regions | Building Embodied AI: From Perception to Autonomous Action

"Affordance is the promise that perception can name what the body can actually do."
A Patient Embodied AI Agent

Scene shows several candidate contact regions on an object, with one region rejected by reach and collision constraints before grasping. — **Figure 27.5A**: An affordance is a possibility, not a permission slip; the robot still has to reach it safely.

Big Picture

Affordances and graspable regions asks what the robot can do with a visible region. Category names are optional; action possibilities such as grasp, push, pull, pour, step, wipe, or avoid are the core output.

Problem First: Why This Representation Exists

Affordance prediction must bind perception to embodiment. A graspable region, traversable gap, or pushable surface is meaningful only relative to gripper geometry, base footprint, force limits, and task goal.

The contract here maps visual evidence to executable possibilities: affordance label, contact frame, body constraint, confidence, action primitive, and success predicate.

Action Is The Unit Of Meaning

An affordance becomes embodied knowledge when it narrows the action set to options that the current robot can execute under the current constraints.

Figure 27.5.1 should be read as an affordance contract: candidate region, contact frame, gripper constraint, confidence, and verifier decide whether the grasp or push is admissible.

Figure 27.5.1: Affordance scoring from region evidence to grasp action. The dashed feedback path reminds the reader that perception quality is judged by action consequences and replayable diagnostics.

Mathematical Core

An affordance map scores actions over image or 3D regions under robot constraints.

Formal Object

$A(r,a)=P(\mathrm{success}\mid \phi(r),a,\theta_{\mathrm{robot}}),\quad a^*=\arg\max_{a\in\mathcal A} A(r,a)-\lambda C(a)$

The feature vector $\phi(r)$ may include mask shape, depth, surface normal, material cues, and semantic features. The cost term $C(a)$ penalizes collision risk, reach limits, force limits, or task time.

Affordance-to-skill handoff

Extract candidate regions from masks, depth discontinuities, or learned heatmaps.
Estimate local geometry, including surface normals and approach directions.
Score each region-action pair for success probability and execution cost.
Send the selected region, pose, and uncertainty to the skill controller.

Affordance Representations

Design Choice	Use When	Control Risk
Heatmap	Fast image-space grasp or push proposals	Must be lifted to metric pose before control.
Object-centric affordance	Reusable skills such as open, pour, wipe	Object identity can hide local contact constraints.
3D contact affordance	Dexterous manipulation and humanoid hands	Requires reliable geometry and force-aware execution.

Worked Miniature

Code Fragment 27.5.1 scores candidate grasp regions by combining learned affordance probability with reach and collision costs. The small table-like arrays stand in for a vision model output.

# Choose a graspable region from affordance and cost terms.
# The selected region must be promising and executable by the robot.
import numpy as np

affordance = np.array([0.88, 0.74, 0.67])
reach_cost = np.array([0.35, 0.08, 0.05])
collision_cost = np.array([0.10, 0.06, 0.30])
score = affordance - 0.6 * reach_cost - 0.8 * collision_cost
print(np.round(score, 3))
print(int(score.argmax()))

[0.590 0.644 0.400] 1

The expected output array tells you region 1 is best only after embodiment costs are included, not because it has the strongest raw affordance score. The index 1 therefore means "most executable grasp region under current geometry," not simply "most visually grasp-like patch."

Code Fragment 27.5.1: The `score` calculation prevents the highest affordance heatmap value from winning automatically. Region 1 wins because its lower `reach_cost` and `collision_cost` make it more executable for the embodied system.

Library Shortcut

In practice, grasp pipelines often combine learned heatmaps, depth processing in Open3D, and robot-specific inverse kinematics. The libraries shorten perception and geometry work, but the system still needs an explicit region-to-skill contract.

Failure Mode To Test

An affordance is not a permission slip. A mug handle may be visually graspable but unreachable from the current arm pose, blocked by clutter, or unsafe under the current force limit.

Practical Example

A service robot loading a dishwasher should score regions by grasp success, collision with nearby dishes, wrist clearance, and whether the chosen contact leaves the object in a stable orientation for placement.

Memory Hook

For Affordances and graspable regions, the perception result must answer what action changed, what uncertainty changed, and what log would reproduce the decision. Otherwise the output is still visualization, not embodied evidence.

Debugging And Evaluation

Evaluate affordances through attempted actions: record candidate region, body model, predicted primitive, force or clearance margin, execution result, and affordance failure label.

Perturb object pose, gripper size, surface friction, occlusion, and task goal, then check whether the affordance changes when the executable action should change.

Research Frontier

Affordance learning is moving from category-specific grasp detection toward open-vocabulary, language-conditioned, and 3D contact-aware skill grounding. The hard robotics problem is making those affordances executable under real robot kinematics and contact limits.

What's Next

Section 27.6 asks a deeper question: rather than passively scoring what is already visible, the robot can choose where to look next, turning perception itself into a deliberate action with an information cost.

Section References

Open3D. Geometry documentation. https://www.open3d.org/docs/release/tutorial/geometry/index.html

Provides practical primitives for normals, point clouds, and geometry processing used in affordance pipelines.

Meta AI. Segment Anything Model 2. https://ai.meta.com/research/sam2/

Promptable masks can provide candidate regions, but robotics still needs affordance and constraint scoring.

Self Check

Can you name the representation, the consuming action, the uncertainty or freshness field, and the failure label for Affordances and graspable regions? If any one is missing, the section is not yet ready for a robot replay log.

Key Takeaway

Affordances are action-conditioned predictions over regions, and they become useful only after reach, collision, contact, and uncertainty constraints are attached.

Exercise 27.5.1

Pick one household object and list three visible regions. For each region, score grasp, push, and avoid actions, then name the robot constraint that could veto the top score.