"The folder looked organized until the policy asked which camera was the wrist camera."
A Dataset Loader
LeRobotDataset turns robot demonstrations into a standardized multimodal time-series format with metadata, video, sensorimotor signals, and Hub integration. The goal is more than convenience: training, visualization, sharing, and reproduction should all rely on the same data contract.
Format Contract
A robot dataset must answer four questions before a model loads it: what is the observation, what is the action, what episode does this frame belong to, and what metadata defines the body and task? LeRobotDataset v3 organizes these answers around standardized feature names, Parquet tables, videos or images, and metadata files.
If two labs load the same dataset differently, they are not running the same experiment. Standardized formats reduce the chance that preprocessing becomes an invisible baseline difference.
| Field Family | Typical Contents | Why It Matters |
|---|---|---|
| Observation | Images, robot state, proprioception, tactile signals | Defines what the policy can know. |
| Action | Joint targets, end-effector deltas, gripper commands | Defines what the policy is trained to output. |
| Episode index | episode id, frame index, timestamp | Keeps temporal structure intact. |
| Metadata | fps, robot type, features, splits, license | Makes loading and comparison reproducible. |
Code Fragment 1 validates a tiny feature schema before conversion. This catches a common mistake: an action or timestamp field that exists in prose but not in the actual files.
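A minimal sketch of that check follows, assuming a hypothetical list of required feature names built from the features this section mentions. The dtypes and shapes are illustrative placeholders, not values taken from a real dataset.

```python
# Hypothetical required features; the dotted names follow this section's examples.
REQUIRED_FEATURES = (
    "episode_index",
    "frame_index",
    "timestamp",
    "observation.images.front",
    "observation.state",
    "action",
)


def missing_features(features, required=REQUIRED_FEATURES):
    """Return the required feature names absent from `features`, in order."""
    return [name for name in required if name not in features]


# Illustrative schema for one converted dataset (dtypes and shapes are made up).
schema = {
    "episode_index": {"dtype": "int64", "shape": (1,)},
    "frame_index": {"dtype": "int64", "shape": (1,)},
    "timestamp": {"dtype": "float32", "shape": (1,)},
    "observation.images.front": {"dtype": "video", "shape": (480, 640, 3)},
    "observation.state": {"dtype": "float32", "shape": (7,)},
    "action": {"dtype": "float32", "shape": (7,)},
}

missing = missing_features(schema)
print("schema ok:", not missing)
print("missing:", missing)
```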
The expected output is intentionally boring: schema ok: True and an empty missing-field list. That boring result is valuable because it means every later training script can assume episode identity, temporal order, timestamp alignment, observation image, robot state, and action are present. If observation.state or timestamp is missing, do not patch the trainer; repair the conversion pipeline so every downstream policy receives the same scientific object.
After the schema is clear, use LeRobot's dataset tooling to create, push, visualize, and train from the dataset. The maintained stack handles storage layout, Hub metadata, video indexing, and PyTorch access that would otherwise become fragile custom glue.
Pipeline Recipe
- Collect raw logs with hardware timestamps and calibration versions.
- Normalize feature names and units, especially action units and camera names.
- Convert frames and states into a standardized dataset directory.
- Run a loader smoke test that reads random frames across episodes.
- Publish metadata, dataset card, split manifest, and license before reporting results.
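The normalization step in this recipe can be sketched as below. The raw key names (cam0, cam_wrist_left, joint_pos_deg, joint_cmd_deg, stamp_ns) are hypothetical stand-ins for whatever a vendor log provides, and the sketch assumes the raw joints are logged in degrees and the timestamps in nanoseconds.

```python
import math

# Hypothetical mapping from vendor camera names to canonical feature names.
CAMERA_RENAMES = {
    "cam0": "observation.images.front",
    "cam_wrist_left": "observation.images.wrist",
}


def normalize_frame(raw):
    """Rename cameras and convert units for one raw frame.

    Assumes joints are logged in degrees and timestamps in nanoseconds.
    """
    frame = {}
    for key, value in raw.items():
        if key in CAMERA_RENAMES:
            frame[CAMERA_RENAMES[key]] = value
    frame["observation.state"] = [math.radians(d) for d in raw["joint_pos_deg"]]
    frame["action"] = [math.radians(d) for d in raw["joint_cmd_deg"]]
    frame["timestamp"] = raw["stamp_ns"] / 1e9  # nanoseconds to seconds
    return frame


raw_frame = {
    "cam0": "front_000001.png",
    "cam_wrist_left": "wrist_000001.png",
    "joint_pos_deg": [0.0, 90.0, -180.0],
    "joint_cmd_deg": [0.0, 45.0, -90.0],
    "stamp_ns": 1_500_000_000,
}
normalized = normalize_frame(raw_frame)
print(sorted(normalized))
```

Keeping the rename table and the unit conversions in one function makes the normalized layer easy to regenerate from the raw logs when a camera name or calibration changes.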
Conversion And Verification
A robust conversion pipeline keeps three layers separate. The raw layer preserves the original robot logs, including vendor-specific messages and leader-device signals. The normalized layer exposes canonical features such as observation.images.front, observation.state, and action. The training layer may add cached tensors, resized videos, or model-specific transforms, but those derived artifacts should be reproducible from the normalized layer.
The most important verification step is random-access replay. Sample episodes from the beginning, middle, and end of the dataset; render the camera frame; print the aligned robot state; and overlay the action that follows. This catches off-by-one frame shifts, stale calibration, swapped wrist cameras, and action-unit mistakes that a shape-only validator will miss.
- Open the dataset through the same loader used by training.
- Sample ten frames across at least three episodes.
- Verify that timestamps increase monotonically and that frame spacing matches the declared fps.
- Render camera frames with robot state and action summaries beside them.
- Fail the conversion if any feature is missing, temporally shifted, or unit-ambiguous.
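Two of the checks above, frame sampling and timestamp verification, can be sketched in plain Python. This is an illustration under stated assumptions rather than LeRobot's own tooling: the function names, the tolerance of one millisecond, and the episode-length dictionary are all hypothetical.

```python
import random


def sample_frames(episode_lengths, num_frames=10, min_episodes=3, seed=0):
    """Pick (episode, frame) pairs covering at least `min_episodes` episodes."""
    rng = random.Random(seed)
    episode_ids = sorted(episode_lengths)
    # Guarantee coverage first, then fill the remaining picks at random.
    covered = rng.sample(episode_ids, min_episodes)
    picks = [(ep, rng.randrange(episode_lengths[ep])) for ep in covered]
    while len(picks) < num_frames:
        ep = rng.choice(episode_ids)
        picks.append((ep, rng.randrange(episode_lengths[ep])))
    return picks


def check_episode_timestamps(timestamps, fps, tol=1e-3):
    """Return a list of problems found in one episode's timestamps (seconds)."""
    problems = []
    expected_dt = 1.0 / fps
    for i in range(1, len(timestamps)):
        dt = timestamps[i] - timestamps[i - 1]
        if dt <= 0:
            problems.append(f"frame {i}: non-monotonic timestamp (dt={dt:.4f}s)")
        elif abs(dt - expected_dt) > tol:
            problems.append(f"frame {i}: dt={dt:.4f}s, expected {expected_dt:.4f}s")
    return problems


# Hypothetical dataset with five episodes of different lengths.
lengths = {0: 120, 1: 95, 2: 140, 3: 80, 4: 110}
picks = sample_frames(lengths)

good = [i / 30 for i in range(5)]
bad = [0.0, 1 / 30, 1 / 30, 3 / 30]  # one repeated stamp, one dropped frame
print(check_episode_timestamps(good, fps=30))
for problem in check_episode_timestamps(bad, fps=30):
    print(problem)
```

A conversion script could treat any non-empty problem list as a hard failure, in line with the last item of the checklist.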
A dataset can pass shape checks while failing semantics. Joint radians, joint degrees, end-effector meters, normalized gripper width, and binary gripper state must not share a vague field called action.
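One way to keep units out of a vague action field is to attach an explicit name and unit to every action dimension. The sketch below is a hypothetical convention, not part of the LeRobotDataset format: the ActionDim class, the unit vocabulary, and the dimension names are assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical unit vocabulary covering the cases named in this section.
ALLOWED_UNITS = {"rad", "deg", "m", "normalized", "binary"}


@dataclass(frozen=True)
class ActionDim:
    """One named action dimension with an explicit unit."""

    name: str
    unit: str

    def __post_init__(self):
        if self.unit not in ALLOWED_UNITS:
            raise ValueError(f"{self.name}: unknown unit {self.unit!r}")


# Illustrative spec for a two-joint arm with a gripper.
ACTION_SPEC = (
    ActionDim("joint_1.pos", "rad"),
    ActionDim("joint_2.pos", "rad"),
    ActionDim("gripper.width", "normalized"),
)


def describe(spec):
    """Render each dimension as 'name [unit]' for a dataset card or log line."""
    return [f"{dim.name} [{dim.unit}]" for dim in spec]


print(describe(ACTION_SPEC))
```

Rejecting an unknown or empty unit at construction time turns a unit-ambiguous action into a conversion error instead of a silent training difference.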
A lab converting GELLO demonstrations should store both the raw leader joint stream and the follower action target. The raw stream helps debug interface failures; the follower target is usually the training label.
The next dataset-format frontier is not only larger storage. It is queryable robot experience: find episodes by task language, failure mechanism, embodiment, camera geometry, object category, and action representation without writing a custom parser for every lab.
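As a small illustration of what such queries might look like, the sketch below filters per-episode metadata records by exact match. The record fields (task, robot_type, failure) and their values are hypothetical, and a real system would likely need richer matching than equality.

```python
# Hypothetical per-episode metadata records.
EPISODES = [
    {"episode_index": 0, "task": "pick up the cube", "robot_type": "arm_a", "failure": None},
    {"episode_index": 1, "task": "pick up the cube", "robot_type": "arm_b", "failure": "slip"},
    {"episode_index": 2, "task": "open the drawer", "robot_type": "arm_a", "failure": None},
    {"episode_index": 3, "task": "pick up the cube", "robot_type": "arm_a", "failure": "slip"},
]


def find_episodes(episodes, **criteria):
    """Return episodes whose metadata equals every given key/value pair."""
    return [
        ep
        for ep in episodes
        if all(ep.get(key) == value for key, value in criteria.items())
    ]


slips_on_arm_a = find_episodes(EPISODES, robot_type="arm_a", failure="slip")
print([ep["episode_index"] for ep in slips_on_arm_a])
```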
Can a reader load one episode, recover every camera frame, align it to robot state and action, identify the task instruction, and know the license? That is the minimum bar for reusable robot data.
Standard formats turn teleoperation logs into scientific artifacts. The policy is only as reproducible as the dataset loader, metadata, and split manifest that feed it.
Take a raw demonstration folder and draft a LeRobotDataset-style feature schema. Mark which fields need unit conversion and which fields must remain raw for debugging.
What's Next
Chapter 24 builds on this format contract by comparing major robot datasets and the scaling laws that motivate pooling data across robots.
Zhao, T. Z. et al. (2023). Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware.
Introduces ALOHA and ACT, making the connection between low-cost bimanual teleoperation, action chunking, and real-world manipulation data explicit.
A kinematically matched leader device study that directly compares teleoperation ergonomics and reliability against other low-cost interfaces.
Defines the handheld gripper approach, latency matching, and relative-trajectory action interface used in portable demonstration collection.
Cheng, X. et al. (2024). Open-TeleVision: Teleoperation with Immersive Active Visual Feedback.
A current reference for immersive visual feedback, active perception, and VR-style operator embodiment in data collection.
Hugging Face LeRobot Documentation.
Documents dataset conversion, policy training, and robot-control utilities that turn teleoperation logs into reusable learning artifacts.