"An agent becomes interesting at the exact moment the world refuses to be a dataset."
A Patient Embodied AI Agent
Environments built with Gymnasium and PettingZoo matter because embodied intelligence is a closed loop. The agent must turn partial observations into useful state, choose actions under uncertainty, and learn from the consequences in a physical or simulated world.
This section turns the interface from Chapter 2: The Agent-Environment Interface into runnable Gymnasium and PettingZoo practice. It prepares the reinforcement-learning algorithms in Chapter 14: Reinforcement Learning Refresher and the benchmark protocols in Chapter 12: Benchmarks and Task Suites by making spaces, seeds, wrappers, and termination semantics explicit.
The core move is to connect Gymnasium and PettingZoo environments to action. A static model can be accurate and still be useless if it cannot support timely, safe, and recoverable behavior.
Chapter Overview
Chapter 10 develops Environments with Gymnasium (and PettingZoo) as a working piece of the embodied AI stack. The chapter starts with the role this topic plays in the sense, represent, predict, decide, act, observe, and learn loop, then turns that role into a concrete implementation pattern.
The practical thread uses MuJoCo, MJX, Isaac Lab, Genesis, Newton, Drake, ROS 2, and modern Gazebo where appropriate, while the theory thread keeps the mechanism visible. The reader should leave with both a mental model and a build path.
Prerequisites
Readers should be comfortable with Python, tensors, and the perception-action loop. When the chapter uses geometry, control, or probability, the relevant appendices provide a compact refresher.
Chapter Roadmap
- 10.1 Gym is dead; Gymnasium is the standard. Build the concept, inspect the assumptions, and connect it to tools and evaluation.
- 10.2 Observation and action spaces. Build the concept, inspect the assumptions, and connect it to tools and evaluation.
- 10.3 Reward design and termination. Build the concept, inspect the assumptions, and connect it to tools and evaluation.
- 10.4 Vectorized environments; wrappers. Build the concept, inspect the assumptions, and connect it to tools and evaluation.
- 10.5 Rendering, logging, and debugging. Build the concept, inspect the assumptions, and connect it to tools and evaluation.
- 10.6 Evaluation protocol and seeding. Build the concept, inspect the assumptions, and connect it to tools and evaluation.
- 10.7 PettingZoo for multi-agent. Build the concept, inspect the assumptions, and connect it to tools and evaluation.
This chapter uses the right-tool principle. Build the mechanism once, then reach for maintained tools such as MuJoCo, MJX, Isaac Lab, Genesis, Newton, Drake, ROS 2, and modern Gazebo when the task moves from learning exercise to working system.
A usable environment wrapper records observation and action spaces, reset seeds, termination and truncation rules, wrappers, render modes, and info dictionaries. For multi-agent work, PettingZoo also needs the agent order, parallel versus AEC API choice, and per-agent reward semantics.
Hands-On Lab: Build the Chapter System
Objective
Turn the chapter concept into a small working artifact: define the interface, run a baseline, inspect failure modes, then replace the hand-built part with a library shortcut.
Steps
- Define observations, actions, state, and evaluation metrics.
- Implement the smallest useful version from scratch.
- Run the maintained library version and compare behavior.
- Log success, failure, latency, and robustness.
- Write a short postmortem explaining what changed between the simple version and the practical version.
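The first, second, and fourth steps can be sketched without any library at all. The example below is a hypothetical from-scratch environment, not code from the chapter: `CorridorEnv`, its `slip` parameter, and `run_episode` are invented for illustration. It only mimics the `reset` and `step` return shapes used by Gymnasium, so the later swap to the maintained library changes little in the rollout loop.

```python
import random
import time


class CorridorEnv:
    """Hypothetical 1-D corridor: start at cell 0, goal at the last cell."""

    def __init__(self, length: int = 5, max_steps: int = 20, slip: float = 0.0):
        self.length = length
        self.max_steps = max_steps
        self.slip = slip              # probability that an action is flipped
        self.rng = random.Random()
        self.pos = 0
        self.steps = 0

    def reset(self, seed=None):
        if seed is not None:
            self.rng.seed(seed)       # same seed, same slip sequence
        self.pos = 0
        self.steps = 0
        return self.pos, {}

    def step(self, action: int):
        # Perturbation: with probability `slip` the opposite action is applied.
        if self.rng.random() < self.slip:
            action = 1 - action
        move = 1 if action == 1 else -1   # 0 moves left, 1 moves right
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        self.steps += 1
        terminated = self.pos == self.length - 1                      # goal reached
        truncated = not terminated and self.steps >= self.max_steps   # time limit
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {"steps": self.steps}


def run_episode(env, policy, seed):
    """Roll out one episode and log success, length, return, and latency."""
    obs, info = env.reset(seed=seed)
    start = time.perf_counter()
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return {
        "seed": seed,
        "success": terminated,
        "steps": info["steps"],
        "return": total,
        "latency_s": time.perf_counter() - start,
    }


def always_right(obs):
    return 1


def always_left(obs):
    return 0


baseline = run_episode(CorridorEnv(), always_right, seed=0)
failure = run_episode(CorridorEnv(), always_left, seed=0)
perturbed = run_episode(CorridorEnv(slip=0.2), always_right, seed=7)
print(baseline)
print(failure)
print(perturbed)
```

The `always_left` run never reaches the goal and ends by truncation, which gives the log one clear failure case. The `slip` setting is one simple way to build a perturbation test; a real robot task would perturb sensing, dynamics, or timing instead.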
What's Next?
Continue with Section 10.1: Gym is dead; Gymnasium is the standard, where the chapter moves from motivation to the first concrete idea.
This chapter is written for readers who want theory and a working build path in the same pass. Read each section twice: first for the mechanism, then for the artifact you would save if you had to reproduce the result six months later.
| Tool or Library | Where It Pays Off |
|---|---|
| Gymnasium | Use for a concrete lab, comparison, or extension in this chapter. |
| PettingZoo | Use for a concrete lab, comparison, or extension in this chapter. |
| ROS 2 | Use for a concrete lab, comparison, or extension in this chapter. |
| MuJoCo | Use for a concrete lab, comparison, or extension in this chapter. |
| LeRobot | Use for a concrete lab, comparison, or extension in this chapter. |
Extend the lab by adding one baseline, one maintained-library implementation, and one perturbation test. Save the result as a single folder containing configuration, logs, summary metrics, and two representative failure cases.
The chapter can be used as a self-contained reading unit or as the basis for an undergraduate or graduate teaching week. The recommended pattern is concept, minimal implementation, library shortcut, diagnostic exercise, then reflection on failure modes. This keeps the mathematical idea attached to a concrete system artifact rather than letting it float as notation.
For Gymnasium and PettingZoo environments, the practical stack should be introduced as a set of choices rather than a shopping list. The relevant tools include Gymnasium, PettingZoo, ROS 2, MuJoCo, and LeRobot. Each tool earns its place only when it shortens a working path, improves reproducibility, or exposes a standard interface that students will meet in real embodied systems.
Before leaving the chapter, the reader should be able to state one theory claim, one implementation claim, one evaluation claim, and one realistic failure mode. If any of those four are missing, the chapter should be revisited through the lab.
A strong chapter session ends with an artifact: a small script, a plotted trace, a simulator run, a data card, or a reproducible evaluation panel. The artifact is what turns reading into embodied-system-building practice.
Bibliography & Further Reading
Foundational Papers, Tools, and References
Sutton, R. S., and Barto, A. G. "Reinforcement Learning: An Introduction." (2018). http://incompleteideas.net/book/the-book-2nd.html
A foundation for value functions, policy gradients, exploration, and the RL framing used throughout the book.
Todorov, E., Erez, T., and Tassa, Y. "MuJoCo: A physics engine for model-based control." (2012). https://mujoco.org/
The simulator lineage behind much modern robot learning, now extended through MJX and Warp workflows.
Brohan, A. et al. "RT-1: Robotics Transformer for real-world control at scale." (2022). https://arxiv.org/abs/2212.06817
A landmark in large-scale robot policy learning with transformer policies.
Brohan, A. et al. "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control." (2023). https://arxiv.org/abs/2307.15818
A central reference for connecting web-scale VLM knowledge to robot actions.
Open X-Embodiment Collaboration. "Open X-Embodiment: Robotic Learning Datasets and RT-X Models." (2023). https://arxiv.org/abs/2310.08864
The cross-embodiment data and transfer reference used by the data chapters.
Chi, C. et al. "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion." (2023). https://arxiv.org/abs/2303.04137
The practical diffusion policy reference for imitation learning and continuous action generation.
Hafner, D. et al. "Mastering Diverse Domains through World Models." (2023). https://arxiv.org/abs/2301.04104
DreamerV3, a modern reference for latent world models and imagination-based control.
Hugging Face. "LeRobot." (2024). https://github.com/huggingface/lerobot
The open robot-learning stack used for datasets, policies, demos, and low-cost embodied AI workflows.
Official documentation and source repositories for Gymnasium and PettingZoo.
Use the official docs to check install commands, current APIs, and version caveats before applying Gymnasium or PettingZoo in a lab or project.