Section 4.5: 2D and 3D transformations; transform trees (tf in ROS)

Figure 4.5A: A ROS-style transform tree for a mobile manipulator: world-to-odom, odom-to-base, base-to-arm, arm-to-gripper, and camera frames, with tf lookups resolving any pair at a given timestamp.
Big Picture

A transform tree (tf in ROS) turns many local frame relationships into one queryable, robot-wide spatial memory. A mobile robot may have frames for map, odom, base_link, lidar, camera, wrist, and gripper. The transform tree lets any subsystem ask, "Where is this point in the frame I control?" without every subsystem having to know every calibration edge.

This section treats 2D and 3D transforms as a graph problem. First we define a transform tree as a directed acyclic frame graph in which every child frame has exactly one parent. Then we show how a lookup composes edges along a path. Finally we connect the math to the ROS tf2 discipline of stamped transforms, buffer windows, and explicit lookup times.

The key question is practical: when a camera detects an obstacle, which chain of transforms converts that obstacle into the planning frame at the time the planner needs it?

Theory

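A transform tree stores edges such as $T_{\text{map},\text{odom}}$, $T_{\text{odom},\text{base}}$, and $T_{\text{base},\text{camera}}$, where $T_{a,b}$ is the pose of child frame $b$ in parent frame $a$ and therefore maps coordinates expressed in $b$ into $a$. A lookup from camera to map follows the unique path through the tree and composes the transforms in path order: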

$$T_{\text{map},\text{camera}} = T_{\text{map},\text{odom}}T_{\text{odom},\text{base}}T_{\text{base},\text{camera}}.$$
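
Reading the same path in the opposite direction may help fix the convention. Assuming $T_{a,b}$ maps coordinates expressed in frame $b$ into frame $a$ (the reading that the composition above implies), the reverse lookup is the inverse of the product, and inverting a product reverses the order of its factors:

$$T_{\text{camera},\text{map}} = T_{\text{map},\text{camera}}^{-1} = T_{\text{base},\text{camera}}^{-1}\,T_{\text{odom},\text{base}}^{-1}\,T_{\text{map},\text{odom}}^{-1}.$$

For a rigid transform with rotation $R$ and translation $t$, each inverse has the closed form

$$\begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} R^{\top} & -R^{\top}t \\ 0 & 1 \end{bmatrix},$$

so walking an edge from parent to child costs no more than walking it from child to parent.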

This is why tf2 insists on parent and child frame names. Without the names, a transform matrix is only a 4 by 4 array. With the names and timestamp, it becomes a claim about where one coordinate convention sits relative to another at a specific time.
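
One way to see what the names buy is to attach them to the matrix and let composition check them. The sketch below is illustrative only: `StampedTransform` is a hypothetical container, not the tf2 message type, and taking the older of two stamps on composition is an assumption made for the example.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True, eq=False)
class StampedTransform:
    """Hypothetical container: pose of `child` expressed in `parent` at `stamp`."""
    parent: str
    child: str
    stamp: float        # seconds on an illustrative clock
    matrix: np.ndarray  # 4x4 homogeneous transform

    def __matmul__(self, other):
        # Chaining is legal only when the inner frame names agree.
        if self.child != other.parent:
            raise ValueError(
                f"cannot chain {self.parent}->{self.child} "
                f"with {other.parent}->{other.child}"
            )
        # Assumption: the composite is only as fresh as its oldest edge.
        stamp = min(self.stamp, other.stamp)
        return StampedTransform(self.parent, other.child, stamp,
                                self.matrix @ other.matrix)


def translation(x, y, z):
    matrix = np.eye(4)
    matrix[:3, 3] = [x, y, z]
    return matrix


map_odom = StampedTransform("map", "odom", 10.0, translation(2.0, 0.0, 0.0))
odom_base = StampedTransform("odom", "base_link", 10.0, translation(0.5, 1.0, 0.0))

map_base = map_odom @ odom_base
print(map_base.parent, map_base.child, map_base.matrix[:3, 3].tolist())

# Multiplying in the wrong order is rejected instead of silently computed.
try:
    odom_base @ map_odom
    wrong_order_rejected = False
except ValueError as error:
    wrong_order_rejected = True
    print("rejected:", error)
```

With bare 4 by 4 arrays, both multiplication orders would run without complaint; the names are what turn a direction bug into an immediate error.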

2D transforms are the same idea with fewer degrees of freedom. A planar robot often uses $(x, y, \theta)$ and the group SE(2). A flying robot, manipulator, or camera-bearing humanoid needs SE(3), because roll, pitch, yaw, and vertical translation are load-bearing state variables.
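
The relationship between the two groups can be sketched in a few lines. The helper names `se2` and `se2_to_se3` are hypothetical, and the pose values are chosen only for illustration: a planar pose $(x, y, \theta)$ becomes a 3 by 3 homogeneous matrix, and lifting it into SE(3) places the same rotation about the z axis with zero vertical offset.

```python
import numpy as np


def se2(x, y, theta):
    """3x3 homogeneous matrix for a planar pose (x, y, theta in radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0.0, 0.0, 1.0]])


def se2_to_se3(pose):
    """Embed a planar pose in SE(3): yaw-only rotation, z translation of zero."""
    lifted = np.eye(4)
    lifted[:2, :2] = pose[:2, :2]
    lifted[:2, 3] = pose[:2, 2]
    return lifted


# Illustrative pose: robot at (0.5, 1.0), turned 90 degrees to the left.
odom_from_base = se2(0.5, 1.0, np.pi / 2)

# A point one metre ahead of the robot, in homogeneous 2D coordinates.
point_base = np.array([1.0, 0.0, 1.0])
point_odom = odom_from_base @ point_base
print(point_odom[:2].round(3).tolist())  # [0.5, 2.0]

# The lifted transform agrees with the planar one for points on the ground plane.
point_odom_3d = se2_to_se3(odom_from_base) @ np.array([1.0, 0.0, 0.0, 1.0])
print(point_odom_3d[:3].round(3).tolist())  # [0.5, 2.0, 0.0]
```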

Mechanism

A tf buffer is a time-indexed graph. Static edges store calibration, such as base to camera. Dynamic edges store motion estimates, such as map to odom or odom to base. A correct lookup must choose both a path and a time; spatial correctness and temporal correctness are inseparable.
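
A toy version of one buffered edge makes the time dimension concrete. This is a sketch under simplifying assumptions, not the tf2 implementation: `EdgeBuffer` is a hypothetical class, it stores translations only, and it interpolates linearly between the two nearest samples. Like tf2, it refuses to extrapolate beyond the stored window, while a static edge answers for any time.

```python
import bisect

import numpy as np


class EdgeBuffer:
    """Hypothetical time-indexed store for one parent->child edge (translation only)."""

    def __init__(self, is_static=False):
        self.is_static = is_static
        self.stamps = []
        self.translations = []

    def insert(self, stamp, translation):
        # Keep samples sorted by time so lookups can bisect.
        index = bisect.bisect(self.stamps, stamp)
        self.stamps.insert(index, stamp)
        self.translations.insert(index, np.asarray(translation, dtype=float))

    def lookup(self, stamp):
        if not self.stamps:
            raise LookupError("edge has no samples")
        if self.is_static:
            # Calibration does not change, so any query time is valid.
            return self.translations[-1]
        if stamp < self.stamps[0] or stamp > self.stamps[-1]:
            raise LookupError(f"time {stamp} is outside the buffer window")
        upper = bisect.bisect_left(self.stamps, stamp)
        if self.stamps[upper] == stamp:
            return self.translations[upper]
        lower = upper - 1
        weight = (stamp - self.stamps[lower]) / (self.stamps[upper] - self.stamps[lower])
        return (1.0 - weight) * self.translations[lower] + weight * self.translations[upper]


# Dynamic edge: odometry samples at two illustrative times.
odom_base = EdgeBuffer()
odom_base.insert(10.0, [0.0, 0.0, 0.0])
odom_base.insert(10.2, [0.2, 0.0, 0.0])

# Static edge: camera mount calibration, published once.
base_camera = EdgeBuffer(is_static=True)
base_camera.insert(0.0, [0.2, 0.0, 0.8])

# A camera image stamped 10.1 needs both edges evaluated at 10.1.
odom_camera = odom_base.lookup(10.1) + base_camera.lookup(10.1)
print(odom_camera.round(3).tolist())

try:
    odom_base.lookup(11.0)
    extrapolation_rejected = False
except LookupError as error:
    extrapolation_rejected = True
    print("rejected:", error)
```

Adding translations is valid here only because the example has no rotation; with rotation, each edge would need to be interpolated as a full pose and composed by matrix multiplication.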

Worked Example

Code Fragment 4.5.1 implements the smallest useful transform-tree lookup. It stores three edges, composes the path from map to camera, and applies the resulting transform to one point reported by the camera.

# Compose a tf-style path from map to camera and transform one point.
# Each edge is named by parent and child frame to prevent silent direction bugs.
# The example omits rotation so the path arithmetic is easy to inspect.
import numpy as np

def translate(x, y, z):
    transform = np.eye(4)
    transform[:3, 3] = [x, y, z]
    return transform

edges = {
    ("map", "odom"): translate(2.0, 0.0, 0.0),
    ("odom", "base_link"): translate(0.5, 1.0, 0.0),
    ("base_link", "camera"): translate(0.2, 0.0, 0.8),
}

path = [("map", "odom"), ("odom", "base_link"), ("base_link", "camera")]
map_from_camera = np.eye(4)
for edge in path:
    map_from_camera = map_from_camera @ edges[edge]

point_camera = np.array([1.0, 0.0, 0.0, 1.0])
point_map = map_from_camera @ point_camera
print(point_map[:3].round(3).tolist())
# Output: [3.7, 1.0, 0.8]
Code Fragment 4.5.1 composes the named edges map to odom, odom to base_link, and base_link to camera. The resulting point_map value shows how a camera measurement becomes planner-ready map-frame evidence.

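Expected output: because every edge is a pure translation, the camera point $(1.0, 0.0, 0.0)$ shifts by the sum of the three translations, $(2.7, 1.0, 0.8)$, and lands at $(3.7, 1.0, 0.8)$ in the map frame. If a real tf2 lookup returns the opposite direction, check whether the call requested source-to-target or target-to-source, and whether the lookup time matches the sensor timestamp.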
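
A lookup between two arbitrary frames has to walk some edges against their stored direction. The sketch below is one minimal way to do that, assuming each frame stores a pointer to its parent: both frames are composed up to the shared root, and the target side is inverted. The function names are hypothetical, the `lidar` offset is invented for illustration, and the argument order (target first, then source) is chosen to mirror tf2's `lookup_transform`.

```python
import numpy as np


def translate(x, y, z):
    transform = np.eye(4)
    transform[:3, 3] = [x, y, z]
    return transform


# child frame -> (parent frame, parent_from_child transform)
parent_of = {
    "odom": ("map", translate(2.0, 0.0, 0.0)),
    "base_link": ("odom", translate(0.5, 1.0, 0.0)),
    "camera": ("base_link", translate(0.2, 0.0, 0.8)),
    "lidar": ("base_link", translate(0.0, 0.0, 0.4)),  # hypothetical mount offset
}


def root_from(frame):
    """Compose edges from `frame` up to the root; return (root, root_from_frame)."""
    transform = np.eye(4)
    while frame in parent_of:
        parent, parent_from_child = parent_of[frame]
        transform = parent_from_child @ transform
        frame = parent
    return frame, transform


def lookup(target, source):
    """Return target_from_source, the transform taking source coordinates into target."""
    target_root, root_from_target = root_from(target)
    source_root, root_from_source = root_from(source)
    if target_root != source_root:
        raise LookupError(f"{target} and {source} are not in the same tree")
    return np.linalg.inv(root_from_target) @ root_from_source


map_from_camera = lookup("map", "camera")
camera_from_map = lookup("camera", "map")
lidar_from_camera = lookup("lidar", "camera")

print(map_from_camera[:3, 3].round(3).tolist())
print(np.allclose(map_from_camera @ camera_from_map, np.eye(4)))
```

Walking both frames all the way to the root is wasteful for siblings such as lidar and camera, whose shared ancestor is base_link; stopping at the nearest common ancestor gives the same answer with fewer multiplications.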

Library Shortcut

The hand-built fragment keeps frame semantics visible. In production, SciPy Rotation handles rotation representations, ROS 2 tf2 keeps a time-buffered frame tree, spatialmath-python gives compact pose algebra, Drake exposes typed rigid transforms, and OpenCV calibration anchors camera intrinsics and extrinsics. The shortcut removes boilerplate, but the hand-built version remains the debugging oracle.
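
As a small illustration of the first shortcut, the sketch below uses SciPy's `Rotation` class to build a 4 by 4 transform with a non-trivial rotation and to convert it to a quaternion. The rear-facing camera mount and its offset are hypothetical values chosen for the example; `make_transform` is a helper defined here, not a SciPy function.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def make_transform(translation, yaw_degrees):
    """4x4 homogeneous transform from a translation and a yaw about the z axis."""
    transform = np.eye(4)
    transform[:3, :3] = Rotation.from_euler("z", yaw_degrees, degrees=True).as_matrix()
    transform[:3, 3] = translation
    return transform


# Hypothetical rear-facing camera: mounted behind the base origin, yawed 180 degrees.
base_from_rear_camera = make_transform([-0.2, 0.0, 0.8], 180.0)

# Directions use w = 0 so translation does not affect them.
forward_camera = np.array([1.0, 0.0, 0.0, 0.0])
forward_in_base = base_from_rear_camera @ forward_camera
print(forward_in_base[:3].round(3).tolist())

# SciPy returns quaternions scalar-last, in (x, y, z, w) order.
quaternion = Rotation.from_matrix(base_from_rear_camera[:3, :3]).as_quat()
round_trip = Rotation.from_quat(quaternion).as_matrix()
print(np.allclose(round_trip, base_from_rear_camera[:3, :3]))
```

The library call replaces hand-written sines and cosines, but the frame names in the variable still carry the direction of the transform, exactly as in the hand-built fragment.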

Failure Modes

Transform-tree bugs often masquerade as weak perception or control. Check parent-child direction, timestamp, static-vs-dynamic classification, and buffer latency before changing the robot policy.

Memory Hook

The tf tree is implicit matrix multiplication made explicit, named, and time-stamped. Every silent frame-direction bug in robot code is really a silent matrix-order bug that the transform tree disciplines away.

Research Frontier

A tf tree assumes rigid bodies and stores a single deterministic transform per edge. Research on deformable robots, soft actuators, and contact-rich manipulation requires probabilistic or deformable frame representations. Factor-graph libraries such as GTSAM attach a covariance to each constraint so that a SLAM back-end can propagate uncertainty through the chain of poses. Neural implicit representations (NeRF-based SLAM) take a different approach: rather than maintaining a frame tree, they embed geometry directly in a continuous function and estimate poses by optimization. Both directions are active, and neither has displaced tf2 for real-time reactive control as of 2026.

Section References

Foote, T. "tf: The transform library." IEEE Conference on Technologies for Practical Robot Applications (TePRA), 2013.

The design document for the ROS tf system: frame naming, parent-child conventions, time-buffered lookup, and the motivation for separating static from dynamic edges.

Lynch, K. M., and Park, F. C. "Modern Robotics: Mechanics, Planning, and Control." Cambridge University Press, 2017. http://modernrobotics.org

Establishes the screw-theory view of SE(2) and SE(3) composition used throughout this chapter; the transform-tree lookup is Chapter 3 composition in graph form.

ROS 2 tf2 documentation. https://docs.ros.org/en/rolling/Concepts/Intermediate/About-Tf2.html

The authoritative reference for buffer windows, lookup API, static vs. dynamic broadcasters, and tf2 migration from ROS 1.

Exercise 4.5.1

Extend Code Fragment 4.5.1 with a rotation. Give the odom-to-base_link edge a 90° yaw rotation (a rotation about the z axis that maps the x axis to +y and the y axis to -x). Compose the full path from map to camera and verify: (a) the camera origin in map coordinates, (b) that a unit vector pointing forward in the camera frame maps to the correct direction in the map frame, and (c) that map_from_camera @ camera_from_map = I. Explain which intermediate transform is most likely to be wrong if the robot turns left when commanded to go forward.