A Careful Control Loop
Transform trees (tf in ROS) turn many local 2D and 3D frame relationships into one queryable, robot-wide spatial memory. A mobile robot may have frames for map, odom, base_link, lidar, camera, wrist, and gripper. The transform tree lets any subsystem ask, "Where is this point in the frame I control?" without every subsystem manually knowing every calibration edge.
This section develops the technical contract for 2D and 3D transforms as a graph problem. First we define a transform tree as a directed acyclic frame graph with one parent per child. Then we show how a lookup composes edges along a path. Finally we connect the math to the ROS tf2 discipline of stamped transforms, buffer windows, and explicit lookup times.
The key question is practical: when a camera detects an obstacle, which chain of transforms converts that obstacle into the planning frame at the time the planner needs it?
Theory
A transform tree stores edges such as $T_{\text{map},\text{odom}}$, $T_{\text{odom},\text{base}}$, and $T_{\text{base},\text{camera}}$. A lookup from camera to map follows the unique path through the tree and composes the transforms in path order:
$$T_{\text{map},\text{camera}} = T_{\text{map},\text{odom}}T_{\text{odom},\text{base}}T_{\text{base},\text{camera}}.$$
This is why tf2 insists on parent and child frame names. Without the names, a transform matrix is only a 4 by 4 array. With the names and timestamp, it becomes a claim about where one coordinate convention sits relative to another at a specific time.
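To make the "claim" concrete, the following is a minimal sketch of a matrix that carries its frame names and timestamp and refuses to compose when the inner frames do not match. The class `NamedTransform`, its field names, and the rule of stamping a composite with the older of the two times are all illustrative assumptions, not tf2 types or tf2 behavior.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class NamedTransform:
    """Hypothetical helper: a 4x4 matrix plus the claim it makes."""
    parent: str          # frame the output coordinates are expressed in
    child: str           # frame the input coordinates are expressed in
    stamp: float         # seconds; the time at which the claim holds
    matrix: np.ndarray   # homogeneous transform T_{parent,child}

    def __matmul__(self, other):
        # T_{a,b} @ T_{b,c} is only meaningful when the inner frames agree.
        if self.child != other.parent:
            raise ValueError(
                f"cannot compose {self.parent}<-{self.child} "
                f"with {other.parent}<-{other.child}"
            )
        # Assumed policy for this sketch: keep the older of the two stamps.
        stamp = min(self.stamp, other.stamp)
        return NamedTransform(self.parent, other.child, stamp,
                              self.matrix @ other.matrix)


def translation(x, y, z):
    matrix = np.eye(4)
    matrix[:3, 3] = [x, y, z]
    return matrix


map_from_odom = NamedTransform("map", "odom", 10.0, translation(2.0, 0.0, 0.0))
odom_from_base = NamedTransform("odom", "base_link", 10.0, translation(0.5, 1.0, 0.0))

map_from_base = map_from_odom @ odom_from_base
print(map_from_base.parent, "<-", map_from_base.child)

try:
    odom_from_base @ map_from_odom  # wrong order: base_link does not meet map
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)
```

With bare arrays the second product would run without complaint and return a plausible-looking matrix; the names are what turn the ordering mistake into an immediate error.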
2D transforms are the same idea with fewer degrees of freedom. A planar robot often uses $(x, y, \theta)$ and the group SE(2). A flying robot, manipulator, or camera-bearing humanoid needs SE(3), because roll, pitch, yaw, and vertical translation are load-bearing state variables.
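The relationship between the two groups can be checked numerically. The sketch below builds SE(2) poses as 3 by 3 homogeneous matrices and embeds them in SE(3) as yaw-only transforms with zero height; the helper names `se2` and `se2_to_se3` and the pose values are made up for illustration.

```python
import numpy as np


def se2(x, y, theta):
    """Planar pose (x, y, theta) as a 3x3 homogeneous matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0.0, 0.0, 1.0]])


def se2_to_se3(planar):
    """Embed a planar pose in SE(3): rotation about z only, z translation 0."""
    spatial = np.eye(4)
    spatial[:2, :2] = planar[:2, :2]   # yaw block
    spatial[:2, 3] = planar[:2, 2]     # x, y translation
    return spatial


a = se2(1.0, 0.0, np.pi / 2)   # move 1 m along x, then face +y
b = se2(0.5, 0.0, 0.0)         # move 0.5 m "forward" in the new heading

planar = a @ b
spatial = se2_to_se3(a) @ se2_to_se3(b)

# Composing in the plane and then embedding matches composing in 3D.
assert np.allclose(se2_to_se3(planar), spatial)
print(planar[:2, 2].round(3).tolist())  # -> [1.0, 0.5]
```

The planar case is a special case of the spatial one, so code written for SE(3) also serves a ground robot at the cost of carrying three unused degrees of freedom.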
A tf buffer is a time-indexed graph. Static edges store calibration, such as base to camera. Dynamic edges store motion estimates, such as map to odom or odom to base. A correct lookup must choose both a path and a time; spatial correctness and temporal correctness are inseparable.
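A toy buffer shows why the lookup time matters. This sketch is not tf2: `EdgeBuffer` is a hypothetical class that stores translation-only samples for a single edge and interpolates linearly between them, whereas a real buffer must also interpolate rotation. The speeds and timestamps are invented for the example.

```python
import bisect

import numpy as np


class EdgeBuffer:
    """Hypothetical time-indexed store for one translation-only edge."""

    def __init__(self, static=False):
        self.static = static
        self.stamps = []
        self.translations = []

    def insert(self, stamp, xyz):
        # Assumes samples arrive in increasing time order.
        self.stamps.append(stamp)
        self.translations.append(np.asarray(xyz, dtype=float))

    def lookup(self, stamp):
        if self.static:
            return self.translations[-1]   # calibration is valid at any time
        if not self.stamps[0] <= stamp <= self.stamps[-1]:
            raise LookupError(f"time {stamp} is outside the buffer window")
        i = bisect.bisect_left(self.stamps, stamp)
        if self.stamps[i] == stamp:
            return self.translations[i]
        t0, t1 = self.stamps[i - 1], self.stamps[i]
        w = (stamp - t0) / (t1 - t0)       # linear interpolation weight
        return (1.0 - w) * self.translations[i - 1] + w * self.translations[i]


# Dynamic edge: the base moves along x at 1 m/s in the odom frame.
odom_from_base = EdgeBuffer()
for stamp in (0.0, 0.1, 0.2):
    odom_from_base.insert(stamp, [1.0 * stamp, 0.0, 0.0])

# Static edge: camera mounting offset on the base.
base_from_camera = EdgeBuffer(static=True)
base_from_camera.insert(0.0, [0.2, 0.0, 0.8])

image_stamp, query_now = 0.10, 0.15   # 50 ms between capture and use

# Translation-only edges compose by addition.
at_image = odom_from_base.lookup(image_stamp) + base_from_camera.lookup(image_stamp)
at_now = odom_from_base.lookup(query_now) + base_from_camera.lookup(query_now)

error = float(np.linalg.norm(at_now - at_image))
print(round(error, 3))
```

The same path gives two different answers depending on the requested time, and a request outside the stored window raises rather than extrapolating.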
Worked Example
Code Fragment 4.5.1 implements the smallest useful transform-tree lookup. It stores three edges, composes the path from map to camera, and applies the resulting transform to one point reported by the camera.
# Compose a tf-style path from map to camera and transform one point.
# Each edge is named by parent and child frame to prevent silent direction bugs.
# The example omits rotation so the path arithmetic is easy to inspect.
import numpy as np


def translate(x, y, z):
    transform = np.eye(4)
    transform[:3, 3] = [x, y, z]
    return transform


edges = {
    ("map", "odom"): translate(2.0, 0.0, 0.0),
    ("odom", "base_link"): translate(0.5, 1.0, 0.0),
    ("base_link", "camera"): translate(0.2, 0.0, 0.8),
}

path = [("map", "odom"), ("odom", "base_link"), ("base_link", "camera")]

map_from_camera = np.eye(4)
for edge in path:
    map_from_camera = map_from_camera @ edges[edge]

point_camera = np.array([1.0, 0.0, 0.0, 1.0])
point_map = map_from_camera @ point_camera
print(point_map[:3].round(3).tolist())
Code Fragment 4.5.1: A three-edge path composed in order: map to odom, odom to base_link, and base_link to camera. The resulting point_map value shows how a camera measurement becomes planner-ready map-frame evidence.
Expected output: [3.7, 1.0, 0.8]. The camera-frame point (1, 0, 0) moves by the sum of the three translations, (2.7, 1.0, 0.8). If a real tf2 lookup gives a different direction, inspect whether the code requested source-to-target or target-to-source, and whether the lookup time matches the sensor timestamp.
The hand-built fragment keeps frame semantics visible. In production, SciPy Rotation handles rotation representations, ROS 2 tf2 keeps a time-buffered frame tree, spatialmath-python gives compact pose algebra, Drake exposes typed rigid transforms, and OpenCV calibration anchors camera intrinsics and extrinsics. These libraries remove boilerplate, but the hand-built version remains the debugging oracle.
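The fixed path in Code Fragment 4.5.1 can be generalized to arbitrary frame pairs. The sketch below is one simple way to do it and is not a description of tf2 internals: each frame stores its parent edge, both frames are walked up to the root, and the two root-relative transforms are combined. Going through the root rather than the lowest common ancestor is a simplification, and the `lidar` frame and its offset are invented for the example.

```python
import numpy as np


def translate(x, y, z):
    transform = np.eye(4)
    transform[:3, 3] = [x, y, z]
    return transform


# child frame -> (parent frame, T_{parent,child}); "lidar" is a made-up frame.
tree = {
    "odom": ("map", translate(2.0, 0.0, 0.0)),
    "base_link": ("odom", translate(0.5, 1.0, 0.0)),
    "camera": ("base_link", translate(0.2, 0.0, 0.8)),
    "lidar": ("base_link", translate(0.0, 0.0, 0.4)),
}


def root_from(frame):
    """Walk parent pointers; return (root name, T_{root,frame})."""
    transform = np.eye(4)
    visited = {frame}
    while frame in tree:
        parent, parent_from_child = tree[frame]
        if parent in visited:
            raise LookupError(f"cycle detected at frame {parent!r}")
        visited.add(parent)
        transform = parent_from_child @ transform
        frame = parent
    return frame, transform


def lookup(target, source):
    """Return T_{target,source}, which maps source coordinates into target."""
    target_root, root_from_target = root_from(target)
    source_root, root_from_source = root_from(source)
    if target_root != source_root:
        raise LookupError(f"{target!r} and {source!r} are not connected")
    return np.linalg.inv(root_from_target) @ root_from_source


map_from_camera = lookup("map", "camera")
lidar_from_camera = lookup("lidar", "camera")   # sibling frames, via the root
print(map_from_camera[:3, 3].round(3).tolist())
print(lidar_from_camera[:3, 3].round(3).tolist())
```

Because every child has exactly one parent, the walk to the root is unambiguous, and the explicit `visited` check surfaces a cycle at the frame where it occurs.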
- Wrong lookup direction. Requesting tf.lookup("camera", "map") instead of tf.lookup("map", "camera") returns the inverse of the intended transform. In SE(3) the inverse is not the transpose: the rotation block is transposed, but the translation becomes $-R^\top p$. One-point sanity checks (does the camera appear in front of the robot?) are faster than reading quaternion signs.
- Timestamp mismatch. tf2 interpolates between buffered transforms. If you look up the camera-to-odom transform at wall time rather than the camera image timestamp, you introduce latency-proportional pose error. For a robot moving at 1 m/s and a 50 ms latency, that is 5 cm of systematic placement error.
- Static transform republished on every tick. Publishing a calibration edge (base to camera) as a dynamic transform causes every downstream subscriber to receive a duplicate. Use StaticTransformBroadcaster in ROS 2 for edges that never move.
- Cycle in the tree. tf silently fails if two nodes each claim to be the other's parent. The error appears far downstream as an impossible pose or a buffer timeout, not at the frame where the cycle was introduced.
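The lookup-direction pitfall can be checked numerically. In this sketch, reversing the direction of a rigid transform gives its inverse, and for a pose with nonzero translation that inverse differs from the matrix transpose. The helpers `rigid` and `rigid_inverse` and the pose values are illustrative assumptions, not tf2 API.

```python
import numpy as np


def rigid(yaw, xyz):
    """Rigid transform with rotation about z by yaw and translation xyz."""
    c, s = np.cos(yaw), np.sin(yaw)
    transform = np.eye(4)
    transform[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    transform[:3, 3] = xyz
    return transform


def rigid_inverse(transform):
    """Closed-form inverse of [R, p]: rotation R^T, translation -R^T p."""
    rotation, position = transform[:3, :3], transform[:3, 3]
    inverse = np.eye(4)
    inverse[:3, :3] = rotation.T
    inverse[:3, 3] = -rotation.T @ position
    return inverse


map_from_camera = rigid(np.pi / 2, [2.7, 1.0, 0.8])   # made-up pose
camera_from_map = rigid_inverse(map_from_camera)

# The reversed lookup is the inverse, and it agrees with the general inverse.
assert np.allclose(camera_from_map @ map_from_camera, np.eye(4))
assert np.allclose(camera_from_map, np.linalg.inv(map_from_camera))

# The full 4x4 transpose is a different matrix once translation is nonzero.
assert not np.allclose(camera_from_map, map_from_camera.T)

# One-point sanity check: a point 1 m ahead of the camera, seen from map.
point_camera = np.array([1.0, 0.0, 0.0, 1.0])
point_map = map_from_camera @ point_camera
print(point_map[:3].round(3).tolist())  # -> [2.7, 2.0, 0.8]
```

With a 90° yaw, the camera's forward axis points along map +y, so the test point lands 1 m further along y than the camera origin, which is easy to confirm by eye.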
The tf tree is implicit matrix multiplication made explicit, named, and time-stamped. Every silent frame-direction bug in robot code is really a silent matrix-order bug that the transform tree disciplines away.
Static tf trees assume rigid bodies. Research on deformable robots, soft actuators, and contact-rich manipulation requires probabilistic or deformable frame representations. The GTSAM factor graph attaches covariance to each edge so that a SLAM back-end can propagate uncertainty through the tree. Neural implicit representations (NeRF-based SLAM) take a different approach: rather than maintaining a frame tree, they embed geometry directly in a continuous function and query poses by optimization. Both directions are active, and neither has displaced tf2 for real-time reactive control as of 2026.
Transform-tree bugs look like weak perception or control. Check parent-child direction, timestamp, static-vs-dynamic classification, and buffer latency before changing the robot policy.
Section References
Foote, T. "tf: The transform library." IEEE Conference on Technologies for Practical Robot Applications (TePRA), 2013.
The design document for the ROS tf system: frame naming, parent-child conventions, time-buffered lookup, and the motivation for separating static from dynamic edges.
Lynch, K. M., and Park, F. C. "Modern Robotics: Mechanics, Planning, and Control." Cambridge University Press, 2017. http://modernrobotics.org
Establishes the screw-theory view of SE(2) and SE(3) composition used throughout this chapter; the transform-tree lookup is Chapter 3 composition in graph form.
ROS 2 tf2 documentation. https://docs.ros.org/en/rolling/Concepts/Intermediate/About-Tf2.html
The authoritative reference for buffer windows, lookup API, static vs. dynamic broadcasters, and tf2 migration from ROS 1.
Extend Code Fragment 4.5.1 with a rotation. Give the odom-to-base_link edge a 90° yaw rotation (a rotation about z that maps the x axis onto the y axis and the y axis onto the negative x axis). Compose the full path from map to camera and verify: (a) the camera origin in map coordinates, (b) that a unit vector pointing forward in the camera frame maps to the correct direction in the map frame, and (c) that map_from_camera @ camera_from_map = I. Explain which intermediate transform is most likely to be wrong if the robot turns left when commanded to go forward.