Section 37.4: Imagination rollouts | Building Embodied AI: From Perception to Autonomous Action

Imagination helps only while the imagined data still resembles a world the policy will actually visit.
A Budget-Conscious MPC Loop

**Figure 37.4A**: Real transitions branch into short model-generated rollouts that stay close to trusted states. Imagination rollouts usually work best when they branch from real states and stay short enough that the model remains locally credible.

Big Picture

Imagination rollouts reuse real data by letting the learner branch short synthetic trajectories from trusted states. The gain is sample efficiency. The danger is that long synthetic rollouts can poison value learning with model fantasy.

Key Insight

Synthetic data is useful only while it stays tethered to states the model understands. Horizon control is what keeps imagination from turning into dataset corruption.

Short Rollouts, Big Consequences

In MBPO-style learning, real states from the replay buffer seed short model-generated rollouts. Those imagined transitions augment policy learning while limiting compounding error. The core trade-off is simple: more imagined data can accelerate learning, but only if rollout length stays inside the model's trusted region.

One useful mental model is

$$ \mathcal{D}_{\text{train}} = \mathcal{D}_{\text{real}} \cup \mathcal{D}_{\text{model}}^{(h)}, $$

where the imagination horizon $h$ is deliberately small. This keeps model-generated states near the support of real experience.

That support argument is the mechanism, not a stylistic preference. The learner is allowed to recycle real states into nearby imagined futures because the model has seen enough neighboring transitions to stay locally coherent. Once synthetic states start seeding further synthetic states, the training set drifts toward parts of state space that were never grounded by real interaction, and value estimates can become systematically optimistic.
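To make the union $\mathcal{D}_{\text{real}} \cup \mathcal{D}_{\text{model}}^{(h)}$ concrete, here is a minimal sketch of branched rollout generation. The names `toy_model_step`, `toy_policy`, and `branch_rollouts` are hypothetical, and the one-dimensional linear model is only a stand-in for a learned dynamics network. This is not the MBPO reference implementation.

```python
import random

def toy_model_step(state, action):
    """Hypothetical learned model: a 1-D linear system standing in for a network."""
    next_state = 0.9 * state + 0.1 * action
    reward = -abs(next_state)
    return next_state, reward

def toy_policy(state):
    """Hypothetical policy: push the state toward zero."""
    return -state

def branch_rollouts(real_states, horizon, model_step=toy_model_step, policy=toy_policy):
    """Roll the model forward `horizon` steps from each real start state."""
    imagined = []
    for start in real_states:
        state = start  # every rollout is seeded by a real state, never a synthetic one
        for _ in range(horizon):
            action = policy(state)
            next_state, reward = model_step(state, action)
            imagined.append((state, action, reward, next_state))
            state = next_state
    return imagined

rng = random.Random(0)
d_real_states = [rng.uniform(-1.0, 1.0) for _ in range(128)]  # stand-in replay batch
d_model = branch_rollouts(d_real_states, horizon=3)
print(len(d_real_states), len(d_model))  # 128 384
```

The design choice that matters is the reset `state = start` at the top of each rollout: no synthetic state ever becomes the seed of a new rollout, so every imagined transition is at most `horizon` model steps away from real data.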

Why Branching Helps

Branching from real buffer states is a bias-control trick. It keeps the synthetic rollout close to regions where the model has at least some evidence.
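A toy calculation shows why the starting point matters. The sketch below compares a hypothetical "true" system with a model whose gain is slightly wrong. Both step functions and their coefficients are invented for illustration. Each rollout starts from the same real state, so the error is zero at the branch point and grows with the number of model steps.

```python
def true_step(state):
    """Stand-in for the real environment dynamics (illustrative coefficients)."""
    return 0.95 * state + 0.5

def model_step(state):
    """Stand-in for a learned model whose gain is slightly off."""
    return 0.97 * state + 0.5

def rollout_error(start, horizon):
    """Absolute gap between model and reality after `horizon` steps from `start`."""
    real, imagined = start, start
    for _ in range(horizon):
        real = true_step(real)
        imagined = model_step(imagined)
    return abs(imagined - real)

errors = {h: rollout_error(1.0, h) for h in (1, 3, 5, 10)}
for h, err in errors.items():
    print(f"horizon={h:2d}  abs_error={err:.3f}")
```

In this toy system the one-step error is small, and each additional model step adds to it. Re-branching from a real state resets the gap to zero, which is the sense in which branching controls bias.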

Worked Probe

The probe below logs how many synthetic transitions are produced from a replay batch under different imagination horizons. It shows why horizon choice changes dataset composition so quickly.

```python
# Count imagined transitions produced from one replay batch.
# Assumes every rollout runs for its full horizon (no early termination).
replay_batch = 128
horizons = [1, 3, 5]
imagined = {h: replay_batch * h for h in horizons}
ratio_to_real = {h: round(imagined[h] / replay_batch, 1) for h in horizons}
print({"imagined_transitions": imagined, "ratio_to_real": ratio_to_real})
```

```text
{'imagined_transitions': {1: 128, 3: 384, 5: 640}, 'ratio_to_real': {1: 1.0, 3: 3.0, 5: 5.0}}
```

Read the counts and ratios as a dataset-composition signal: at horizon 1 the synthetic set is the same size as the real batch, and at horizon 5 it is five times larger. That ratio shows how much weight model-generated data would carry if the two sets were simply pooled, before any explicit mixing ratio is set. Horizon is therefore a direct control on how much model bias enters the learner.

Code Fragment 37.4.1: The synthetic dataset grows linearly with the rollout horizon, so even short horizons multiply it several times over. Rollout length is not a cosmetic hyperparameter; it controls how much model bias enters training.

Library Shortcut

When you implement imagination rollouts, log the real-to-model transition ratio, the rollout branching source, and the maximum horizon. These three quantities are often the first place to look when an imagination pipeline succeeds or fails. mbrl-lib and Dreamer-style codebases are useful references because they make the replay-to-imagination contract visible rather than hiding it inside one giant trainer.

Common Failure Modes

A short-horizon imagination pipeline can still go wrong in three ways. The model may be locally biased around precisely the states that matter for reward improvement. The policy may overfit to synthetic states that look easy under the model but are rarely visited in reality. Or the training loop may silently let synthetic transitions dominate the replay mixture. All three failures create the same surface symptom: a policy that looks data-efficient during training but degrades sharply under real rollouts.

The fix is not to abandon imagination, but to instrument it. Save the source state for each imagined rollout, the horizon used, the ratio of synthetic to real updates, and at least one replayed failure trajectory where the imagined branch misled the learner. That turns a vague trust problem into a concrete evidence trail.
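One way to keep that evidence trail is a small per-rollout record. The sketch below is an assumption about what such a record could look like. The class `ImaginedRolloutRecord`, its field names, and the sample values are all hypothetical and are not taken from any particular library.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class ImaginedRolloutRecord:
    source_index: int                 # replay-buffer index of the real seed state
    horizon: int                      # number of model steps taken from that seed
    synthetic_to_real_updates: float  # update ratio in effect when it was generated
    misled_learner: bool = False      # set later if a real rollout contradicts the branch

def summarize(records):
    """Collapse per-rollout records into the quantities worth plotting over time."""
    return {
        "rollouts": len(records),
        "max_horizon": max(r.horizon for r in records),
        "mean_update_ratio": sum(r.synthetic_to_real_updates for r in records) / len(records),
        "misleading_sources": [r.source_index for r in records if r.misled_learner],
    }

# Illustrative values only, not measurements from a real run.
records = [
    ImaginedRolloutRecord(source_index=17, horizon=3, synthetic_to_real_updates=4.0),
    ImaginedRolloutRecord(source_index=42, horizon=5, synthetic_to_real_updates=4.0,
                          misled_learner=True),
    ImaginedRolloutRecord(source_index=88, horizon=1, synthetic_to_real_updates=4.0),
]
summary = summarize(records)
print(json.dumps(summary))
print(json.dumps(asdict(records[1])))  # the flagged branch, ready to replay
```

Because each record keeps the source index, a flagged branch can be traced back to the exact real state it came from and replayed for inspection.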

Trust Rule

Seed model rollouts from real states, keep the horizon short, monitor held-out model error, and reduce or stop imagination when calibration deteriorates or synthetic data overwhelms the real buffer.
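The rule can be written as a small controller that runs between training iterations. This is a sketch under stated assumptions: the function `next_horizon` is hypothetical, and the thresholds `error_limit` and `ratio_limit` are placeholders that would need tuning per task.

```python
def next_horizon(current_horizon, heldout_error, synthetic_to_real,
                 error_limit=0.05, ratio_limit=10.0):
    """Return the imagination horizon for the next iteration (0 means stop imagining).

    All thresholds are illustrative placeholders, not recommended values.
    """
    # Stop when the model is badly off or synthetic data overwhelms the real buffer.
    if heldout_error > 2 * error_limit or synthetic_to_real > ratio_limit:
        return 0
    # Shorten, but keep at least one step, when held-out error starts to degrade.
    if heldout_error > error_limit:
        return max(1, current_horizon - 1)
    # Otherwise leave the horizon unchanged.
    return current_horizon

print(next_horizon(3, heldout_error=0.01, synthetic_to_real=3.0))   # 3
print(next_horizon(3, heldout_error=0.07, synthetic_to_real=3.0))   # 2
print(next_horizon(3, heldout_error=0.20, synthetic_to_real=3.0))   # 0
print(next_horizon(3, heldout_error=0.01, synthetic_to_real=20.0))  # 0
```

The controller only ever shortens or stops. Lengthening the horizon again is a separate decision that this sketch deliberately leaves out.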

Warning

Synthetic transitions can quietly dominate training and pull the learner toward impossible states. If your synthetic-to-real ratio climbs without a corresponding held-out model audit, you may be optimizing on fantasy data.
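One guard against silent domination is to fix the share of real transitions in each training batch instead of sampling from a pooled buffer. The sketch below assumes a hypothetical helper `sample_mixed_batch` and a hypothetical `real_fraction` parameter. The buffers hold integers as placeholders for transitions.

```python
import random

def sample_mixed_batch(real_buffer, model_buffer, batch_size, real_fraction, rng):
    """Draw a batch with a fixed share of real transitions, whatever the buffer sizes."""
    n_real = round(batch_size * real_fraction)
    n_model = batch_size - n_real
    batch = [("real", t) for t in rng.choices(real_buffer, k=n_real)]
    batch += [("model", t) for t in rng.choices(model_buffer, k=n_model)]
    rng.shuffle(batch)
    return batch

rng = random.Random(0)
real_buffer = list(range(128))   # placeholder real transitions
model_buffer = list(range(640))  # placeholder imagined transitions (horizon 5)

batch = sample_mixed_batch(real_buffer, model_buffer, batch_size=256,
                           real_fraction=0.25, rng=rng)
n_real = sum(1 for source, _ in batch if source == "real")
n_model = len(batch) - n_real
print("buffer synthetic-to-real:", len(model_buffer) / len(real_buffer))
print("batch synthetic-to-real:", n_model / n_real)
```

The batch ratio is now set by `real_fraction` rather than by how large the model buffer has grown, so it can be logged and audited alongside held-out model error.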

Practical Example

For a tabletop pushing task, two or three imagined steps branched from real states may be enough to accelerate value learning. For long-horizon autonomous driving, naive long synthetic rollouts can easily invent lane states or contact events the real car would never produce.

Cross-References

This section connects directly to the rollout-horizon caution in Section 36.3 and to MBPO in the bibliography below.

Research Frontier

Modern imagination-based agents increasingly mix short synthetic rollouts with strong value models or latent planners. The open research problem is adaptive trust: deciding rollout length from confidence rather than from a fixed schedule.
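As one illustration of confidence-based rollout length, the sketch below truncates a rollout when the members of a model ensemble disagree by more than a threshold. The three-member ensemble `ensemble_step`, its gains, and the `disagreement_limit` value are hypothetical, and ensemble spread is only one possible proxy for model confidence.

```python
def ensemble_step(state):
    """Hypothetical 3-member ensemble whose predictions diverge as |state| grows."""
    gains = (0.9, 1.0, 1.1)
    return [gain * state + 0.5 for gain in gains]

def confident_rollout_length(start, max_horizon, disagreement_limit):
    """Count model steps taken before ensemble spread exceeds the limit."""
    state, steps = start, 0
    while steps < max_horizon:
        predictions = ensemble_step(state)
        spread = max(predictions) - min(predictions)
        if spread > disagreement_limit:
            break  # the model is no longer trusted here; end the rollout early
        state = sum(predictions) / len(predictions)  # continue from the ensemble mean
        steps += 1
    return steps

print(confident_rollout_length(0.1, max_horizon=5, disagreement_limit=0.25))  # 3
print(confident_rollout_length(2.0, max_horizon=5, disagreement_limit=0.25))  # 0
```

Here the effective horizon depends on where the rollout starts: a seed in a region of high disagreement produces no imagined steps at all, while a seed in a well-modeled region is allowed to run further.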

Self Check

Why is branching from replay-buffer states safer than initializing long synthetic rollouts from synthetic states created by earlier imagination?

Memory Hook

Imagination helps when it stays tethered to reality. Cut the tether, and the learner starts studying its own fiction.

Key Takeaway

Imagination rollouts are valuable because they multiply data use, but only when the rollout horizon is kept inside the model's trusted neighborhood.

Exercise

Design an MBPO-style training loop for a robot task. What states seed imagination, what horizon would you start with, and what metric would trigger shortening the rollout?

Bibliography & Further Reading

Primary References And Tools

Janner, M. et al. "When to Trust Your Model: Model-Based Policy Optimization." (2019). https://arxiv.org/abs/1906.08253

The essential reference for short trusted imagination rollouts.

Hafner, D. et al. "Mastering Diverse Domains through World Models." (2023). https://arxiv.org/abs/2301.04104

DreamerV3 is a broad latent imagination baseline worth contrasting with explicit MBPO-style branching.

Hansen, N. et al. "TD-MPC2: Scalable, Robust World Models for Continuous Control." (2023). https://arxiv.org/abs/2310.16828

Useful for comparing latent short-horizon planning with synthetic-data augmentation approaches.