Long rollouts are persuasive until their first small bias compounds into a fake future.
A Horizon-Aware Predictor
Compounding error is the main reason predictive control trusts short horizons first. Every rollout step consumes the model's own previous output, so small one-step mistakes can grow geometrically or drift into regions with no data support.
Rollout horizon is a trust budget. Each extra imagined step spends more of that budget, and eventually the model starts paying for action with fiction instead of evidence.
Why Rollout Error Grows
If the true dynamics are $f$ and the learned model is $\hat f$, then a one-step state error of size $\varepsilon$ can grow after $H$ rollout steps roughly like
$$ \lVert s_{t+H} - \hat s_{t+H} \rVert \lesssim \sum_{k=0}^{H-1} L^k \varepsilon, $$
where $L$ is an effective sensitivity constant of the dynamics and controller. If $L > 1$, the planner cannot assume that a small local fit implies a good long imagined future.
In practice, three mechanisms usually dominate this blow-up. First, state-estimation noise enters the first prediction and is then recycled as if it were clean state. Second, actuator delay or rate limits make the model optimistic about how quickly the robot can correct itself. Third, contact mode switches, wheel slip, or near-singular arm poses create local dynamics with much larger sensitivity than the average transition error suggests. That is why a model that looks excellent on one-step loss can still rank action sequences badly after only a few imagined steps.
Longer horizon is valuable only until model bias dominates. After that point, more imagination produces less trustworthy control.
Worked Probe
The next probe logs how a small underestimation of acceleration error drifts over five rollout steps. The pattern is simple enough to inspect by eye, which is exactly what a first debugging panel should allow.
# Roll out a biased model and compare cumulative error by horizon.
true_pos = 0.0
true_vel = 1.0
model_pos = 0.0
model_vel = 0.98
dt = 0.1
horizon_error = []
for step in range(1, 6):
true_pos += true_vel * dt
model_pos += model_vel * dt
horizon_error.append(round(true_pos - model_pos, 4))
true_vel *= 0.99
model_vel *= 0.97
print({"horizon_error": horizon_error, "final_error": horizon_error[-1]})
{'horizon_error': [0.002, 0.0049, 0.0086, 0.013, 0.0181], 'final_error': 0.0181}
Read the horizon-error list as a compounding signal: each entry is larger than the last because the model's velocity underestimate accumulates step by step. The final error is roughly nine times the one-step error, which is exactly the drift pattern that makes long-horizon planning unreliable even when the one-step fit looks acceptable.
In production, save horizon-conditioned metrics explicitly. A single scalar MSE hides whether the model is excellent at one to three steps and unusable after ten, which is often the real control-relevant story. In PyTorch or JAX training loops, log per-horizon error tables and sequence-ranking accuracy alongside loss. In mujoco_mpc, Isaac Lab, or Drake rollouts, log whether the same horizon cap preserves closed-loop success under the same seed panel.
Failure Analysis That Actually Helps
When a long-horizon planner fails, the first useful artifact is not another scalar plot. It is an overlay of real and imagined trajectories aligned by time, with the first meaningful divergence marked explicitly. On a manipulator, that marker is often the first contact frame where normal force or slip onset differs. On a drone, it may be the first step where unmodeled drag or attitude lag changes the commanded recovery. A good debugging ledger therefore stores the horizon at which divergence becomes decision-relevant, not just numerically visible.
Tool support matters here. wandb or TensorBoard can store horizon-tagged traces, but the deeper point is methodological: keep the rollout panel fixed while comparing horizon caps, model versions, and controller variants. If the panel changes, the claim about horizon trust stops being defensible.
Fit one-step transitions first, evaluate multi-step rollouts on held-out episodes, then cap planner horizon where task performance still improves. If longer horizons help only on paper, shorten the rollout and rely more on feedback or terminal values.
Do not compare a short-rollout method and a long-rollout method using metrics produced by different seed panels or different reset logic. Horizon claims are extremely sensitive to data support and termination conditions.
A highway-driving planner may want a multi-second horizon in free space, but a manipulator inserting a connector often trusts only a few contact-relevant steps before replanning. The correct horizon is the one that keeps model bias below the action-threshold the robot cares about.
This section reinforces the rollout caution that appears in Section 37.4 on imagination rollouts and complements the stability discussion in Chapter 7.
MBPO is a key modern lesson here: short branched model rollouts often beat naive long hallucinations. Current work on trust-aware model usage and value-guided rollouts continues pushing the idea that rollout horizon should be adaptive, not fixed by taste.
Suppose your model is excellent for two steps and unreliable after eight. What planning strategy would still exploit it, and what evidence would you save to defend that choice?
Rollout horizon is like credit on a shaky map: spend only as much as the map deserves.
The longest imagined future is rarely the best one. Strong model-based systems plan only across horizons the model has earned.
Design a held-out evaluation that reports one-step, three-step, and ten-step rollout error for a robot task of your choice. Which horizon would you trust for planning, and why?
Bibliography & Further Reading
Primary References And Tools
Janner, M. et al.. "When to Trust Your Model: Model-Based Policy Optimization." (2019). https://arxiv.org/abs/1906.08253
The practical reference for short trusted rollouts instead of blindly long model usage.
Chua, K. et al.. "Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models." (2018). https://arxiv.org/abs/1805.12114
PETS is a canonical baseline for short-horizon planning under learned dynamics.
Hansen, N., Wang, X., and Su, H.. "Temporal Difference Learning for Model Predictive Control." (2022). https://arxiv.org/abs/2203.04955
TD-MPC is a strong illustration of combining short-horizon planning with a learned terminal value.