A Careful Control Loop
Isaac Lab with SKRL, rl_games, and RSL-RL is a workflow pattern, not three unrelated integrations. Isaac Lab owns the robot scene, task randomization, observations, actions, rewards, and termination logic; the RL runner owns rollout storage, PPO updates, logging, and checkpointing.
GPU-based RL with Isaac Lab and SKRL, rl_games, or RSL-RL is reproducible only when simulator fidelity settings, PPO rollout semantics, reward terms, and the reset distribution are versioned together in the same training artifact.
This section develops the contract between an Isaac Lab environment and a training runner. The contract is concrete: observation groups, action clipping, reward terms, reset buffers, device, runner config, and evaluation script.
The key question is practical: if you switch from RSL-RL to rl_games or SKRL, which parts of the experiment are allowed to change, and which parts must remain identical for the comparison to mean anything?
An Isaac Lab wrapper is not neutral glue. It decides how observations are packed, whether actions are clipped, where tensors live, and how privileged states reach an asymmetric critic.
Theory
Isaac Lab tasks commonly expose named observation groups. A policy actor may receive proprioception, commands, and history, while the critic may receive privileged simulator state such as terrain height, contact flags, or object poses. In asymmetric actor-critic training, those groups must be routed explicitly so privileged information helps value learning without leaking into the deployed actor.
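The routing described above can be sketched in a few lines. This is a minimal illustration, not Isaac Lab's or any runner's API: the function `route_observations`, the group names, and the choice to give the critic the actor's input plus the privileged groups are all assumptions made for the example, and plain lists stand in for batched GPU tensors.

```python
# Hypothetical sketch: route named observation groups to actor and critic.
# Plain Python lists stand in for batched GPU tensors.


def route_observations(obs_groups, actor_keys, critic_only_keys):
    """Return (actor_input, critic_input) built from named groups."""
    missing = [k for k in (*actor_keys, *critic_only_keys) if k not in obs_groups]
    if missing:
        raise KeyError(f"missing observation groups: {missing}")
    # The actor sees only deployable groups, in a fixed, recorded order.
    actor_input = [x for key in actor_keys for x in obs_groups[key]]
    # Assumption: the critic sees the actor's input plus the privileged groups.
    privileged = [x for key in critic_only_keys for x in obs_groups[key]]
    return actor_input, actor_input + privileged


obs_groups = {
    "proprioception": [0.1, 0.2],
    "commands": [1.0],
    "terrain_heights": [0.05, 0.07],
}
actor_in, critic_in = route_observations(
    obs_groups,
    actor_keys=("proprioception", "commands"),
    critic_only_keys=("terrain_heights",),
)
print(len(actor_in), len(critic_in))  # 3 5
```

Because the actor input is assembled only from `actor_keys`, a privileged group can reach the deployed policy only if someone lists it there, which is exactly the decision the experiment record should capture.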
The runner boundary also controls performance. rl_games can work directly with GPU buffers, RSL-RL expects its own rollout storage conventions, and SKRL emphasizes transparent algorithm configuration across backends. The task is the scientific object; the runner is the optimizer and storage implementation.
The mechanism is: create the Isaac Lab task, wrap it for the runner, map observation groups into the runner's expected input format, collect rollouts, update the policy, then evaluate the exported checkpoint through the same task contract. The dangerous step is silent conversion, especially when a wrapper changes clipping, device movement, or the meaning of obs and states.
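One way to make a silent conversion visible is to count how often it changes the data. The sketch below does this for action clipping only. The helper `clip_actions` and the symmetric limit are assumptions for illustration; a real wrapper may clip per dimension, clip on the GPU, or not clip at all.

```python
# Hypothetical sketch: clip actions and report how many components changed.
# Logging this count turns a silent wrapper conversion into a visible one.


def clip_actions(actions, limit):
    """Clip each action to [-limit, limit]; return (clipped, n_changed)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    clipped = [max(-limit, min(limit, a)) for a in actions]
    n_changed = sum(1 for a, c in zip(actions, clipped) if a != c)
    return clipped, n_changed


raw = [0.5, -1.7, 2.3]
clipped, n_changed = clip_actions(raw, limit=1.0)
print(clipped, n_changed)  # [0.5, -1.0, 1.0] 2
```

If two runners report very different clip counts on the same task, the policies are not acting through the same action interface, and the comparison is already confounded.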
Worked Example
Code Fragment 17.3.1 makes the runner choice explicit. It is a small manifest, but it captures the fields that are often hidden in launcher commands and YAML files.
```python
# Compare Isaac Lab runner wrappers by the contract they must preserve.
# The task stays fixed while storage, observation packing, and logging vary.
from dataclasses import asdict, dataclass


@dataclass
class RunnerContract:
    runner: str
    wrapper_role: str
    actor_input: str
    critic_input: str

    def as_row(self) -> dict[str, object]:
        # Flatten the contract into a plain dict for logging or tabulation.
        return asdict(self)


contracts = [
    RunnerContract("RSL-RL", "rollout storage for locomotion PPO", "obs", "privileged_obs"),
    RunnerContract("rl_games", "GPU buffer bridge and clipping", "obs", "states"),
    RunnerContract("SKRL", "readable algorithm and memory config", "states", "state_values"),
]

# Print which tensor group reaches the actor and which reaches the critic.
for contract in contracts:
    print(f"{contract.runner}: actor={contract.actor_input}, critic={contract.critic_input}")
```
Expected output: one line per runner, naming the tensor group that reaches the actor and the group that reaches the critic. If the manifest cannot answer that question, an asymmetric training result is not reproducible.
Isaac Lab provides runner scripts and wrappers for RL libraries, including rl_games, RSL-RL, SKRL, and Stable-Baselines3. The shortcut is valuable because it reuses task definitions while adapting data formats, but the experiment should still record the wrapper, runner version, and observation-group mapping.
Practical Recipe
- Define the Isaac Lab task first: robot asset, scene, observations, actions, rewards, terminations, curriculum, and randomization.
- Choose the runner based on the experiment goal: speed, readable algorithm research, recurrent policies, or compatibility with existing locomotion configs.
- Record actor observation groups and critic-only privileged groups before training.
- Keep train and play scripts separate so evaluation uses deterministic actions and held-out seeds.
- Export the checkpoint, normalization statistics, runner config, task config, and exact Isaac Lab commit or release together.
The common mistake is to switch runners and also change reward weights, observation groups, normalization, action scaling, or evaluation seeds. That comparison measures a new experiment bundle, not the runner.
A robotics team comparing RSL-RL and SKRL on the same Isaac Lab task should keep terrain seeds, reward terms, action scale, command distribution, and evaluation script identical. The result artifact should include both runner configs plus one shared evaluation table.
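A fairness check of this kind can be automated by diffing the two run records. The sketch below is a hypothetical helper, not part of Isaac Lab or any runner: the field names, the set of fields allowed to differ, the wrapper label `skrl_wrapper`, and the numeric values are all assumptions chosen for illustration.

```python
# Hypothetical sketch: flag fields that differ between two runs but should not.
# Only runner-level fields are allowed to change in a runner comparison.

ALLOWED_TO_DIFFER = frozenset({"runner", "wrapper", "runner_config"})


def unfair_differences(run_a, run_b, allowed=ALLOWED_TO_DIFFER):
    """Return sorted names of fields that differ outside the allowed set."""
    keys = set(run_a) | set(run_b)
    return sorted(
        k for k in keys
        if k not in allowed and run_a.get(k) != run_b.get(k)
    )


shared = {
    "task": "Isaac-Velocity-Rough-Anymal-D-v0",
    "action_scale": 0.5,                      # illustrative value
    "eval_seeds": (0, 1, 2),                  # illustrative seed panel
    "reward_weights": {"track_lin_vel": 1.0}, # illustrative reward term
}
run_rsl = {**shared, "runner": "rsl_rl", "wrapper": "RslRlVecEnvWrapper"}
run_skrl = {**shared, "runner": "skrl", "wrapper": "skrl_wrapper", "action_scale": 0.25}

print(unfair_differences(run_rsl, run_skrl))  # ['action_scale']
```

An empty list means the two runs differ only at the runner layer. Anything else names the confound that must be removed before the result table is read as a runner comparison.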
The wrapper is the adapter plug on the robot-learning workbench. Label it, or the next debugger will spend an afternoon asking why the critic knew the terrain and the actor did not.
The Isaac Lab frontier is moving toward richer sensor tasks, kit-less workflows, multi-backend physics, reusable task registries, and cleaner policy export. The research challenge is to keep these workflows modular without making the wrapper layer a hidden source of experimental variation.
Can you name the Isaac Lab task, runner, wrapper, actor observation keys, critic-only keys, action clipping rule, normalization file, and evaluation script? If not, the runner result is not yet portable.
The idea in this section becomes useful when the runner boundary is explicit. Isaac Lab gives you a task graph; the runner gives you a learner. The wrapper is where the two meet, so it must be part of the experiment record.
The graduate-level habit is to separate task validity from runner performance. A runner can update faster without improving the task definition, and a better task curriculum can improve every runner. A fair comparison changes one of those layers at a time.
| Tool or Library | Role in the Topic | Builder Advice |
|---|---|---|
| Isaac Lab task config | Robot, scene, rewards, resets, and observations | Treat it as the fixed task contract when comparing learners. |
| RSL-RL wrapper | Locomotion-oriented rollout and PPO storage | Use it for fast legged locomotion baselines with common privileged-observation patterns. |
| rl_games wrapper | GPU buffer conversion, clipping, and runner format | Use it when direct GPU buffer handling and mature PPO configs are the priority. |
| SKRL wrapper | Modular memory and algorithm interface | Use it when algorithm readability and multi-backend experimentation matter. |
| Play or evaluation script | Deterministic held-out rollout | Use it as the single source for success, fall, and command-tracking metrics. |
A robust implementation starts with a manifest that binds task, wrapper, runner, and evaluation. The manifest is small enough to review in a pull request and concrete enough to reproduce a run later.
- Record the task entry point and runner wrapper in the same config artifact.
- List actor observations and critic-only observations separately.
- Store action clipping, observation clipping, and normalization settings.
- Save train and play commands with explicit seed panels.
- Compare runners only through the same play script and metric exporter.
```python
# Record the Isaac Lab task, wrapper, and evaluation contract together.
# This prevents runner comparisons from hiding observation or seed changes.
from dataclasses import dataclass, asdict


@dataclass
class IsaacLabRunManifest:
    task: str
    runner: str
    wrapper: str
    actor_obs: tuple[str, ...]
    critic_obs: tuple[str, ...]
    eval_panel: str

    def as_row(self) -> dict[str, object]:
        return asdict(self)


manifest = IsaacLabRunManifest(
    task="Isaac-Velocity-Rough-Anymal-D-v0",
    runner="rsl_rl",
    wrapper="RslRlVecEnvWrapper",
    actor_obs=("proprioception", "commands", "history"),
    critic_obs=("terrain_heights", "contact_flags", "base_velocity"),
    eval_panel="rough_terrain_holdout_256",
)
print(manifest.as_row())
```
When an Isaac Lab run fails, inspect the wrapper contract before changing rewards. Common faults include critic-only state leaking into actor inputs, train-time randomization missing from evaluation, action clipping differing across runners, and normalization files not loaded during play.
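The four faults listed above can be turned into a quick pre-flight check on the run record. This is a sketch under stated assumptions: the function `wrapper_faults` and every field name in the `run` dictionary are hypothetical, and the check inspects only what the record claims, not what the wrapper actually does at runtime.

```python
# Hypothetical sketch: check a run record for common wrapper-contract faults.
# Field names are illustrative; they are not an Isaac Lab or runner schema.


def wrapper_faults(run):
    """Return a list of human-readable contract faults found in the record."""
    faults = []
    leaked = sorted(set(run["actor_obs"]) & set(run["critic_only_obs"]))
    if leaked:
        faults.append(f"critic-only groups in actor input: {leaked}")
    if run["train_randomization"] != run["eval_randomization"]:
        faults.append("train-time randomization missing from evaluation")
    if run["action_clip"] != run["reference_action_clip"]:
        faults.append("action clipping differs from the reference runner")
    if not run["normalization_loaded_in_play"]:
        faults.append("normalization statistics not loaded during play")
    return faults


run = {
    "actor_obs": ("proprioception", "commands", "terrain_heights"),
    "critic_only_obs": ("terrain_heights", "contact_flags"),
    "train_randomization": "friction+push",
    "eval_randomization": "friction+push",
    "action_clip": 1.0,
    "reference_action_clip": 1.0,
    "normalization_loaded_in_play": False,
}
for fault in wrapper_faults(run):
    print(fault)
```

Here the record reports a leaked privileged group and a missing normalization file, both of which can change evaluation results without any change to the reward terms.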
For Isaac Lab runner comparisons, compare only metrics that measure the same quantity and are computed in a single evaluation pass on one shared configuration: same task config, same reward terms, same randomization panel, same evaluation seeds, same checkpoint selection rule, and the same success definition. Save runner config, wrapper name, observation groups, normalization state, logs, videos, and metrics as one artifact.
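One lightweight way to show that two result rows share the same evaluation contract is to fingerprint the shared fields and store the fingerprint next to each row. The sketch below uses only the Python standard library. The helper name `contract_fingerprint`, the field names, and the 12-character truncation are assumptions for illustration.

```python
# Hypothetical sketch: fingerprint the shared evaluation contract so that
# every runner's result row can be shown to use the same configuration.
import hashlib
import json


def contract_fingerprint(shared_contract):
    """Return a short, key-order-independent hash of the shared contract."""
    canonical = json.dumps(shared_contract, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


contract_a = {
    "task": "Isaac-Velocity-Rough-Anymal-D-v0",
    "eval_seeds": [0, 1, 2],                # illustrative seed panel
    "success_definition": "no_fall_20s",    # illustrative label
}
# Same content, different key order: the fingerprint must not change.
contract_b = {
    "success_definition": "no_fall_20s",
    "eval_seeds": [0, 1, 2],
    "task": "Isaac-Velocity-Rough-Anymal-D-v0",
}
# Changed evaluation seeds: the fingerprint must change.
contract_c = {**contract_a, "eval_seeds": [3, 4, 5]}

print(contract_fingerprint(contract_a) == contract_fingerprint(contract_b))  # True
print(contract_fingerprint(contract_a) == contract_fingerprint(contract_c))  # False
```

Rows whose fingerprints differ were not evaluated under the same contract and should not appear in the same comparison table.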
Isaac Lab runners are interchangeable only after the wrapper contract is explicit. Reproducible comparisons keep the task fixed, document observation routing, and evaluate every checkpoint through the same held-out play script.
Write a manifest for one Isaac Lab locomotion task trained with two runners. Specify actor observations, critic-only observations, action clipping, train seeds, evaluation seeds, and the single play script used to compute both result rows.
What's Next?
This section turned Isaac Lab runner choice into a wrapper contract: task config, observation routing, device behavior, normalization, and held-out evaluation must be visible. Next, continue with Section 17.4, where the same contract is expressed in MJX, Brax, and JAX-native RL loops.
Isaac Gym explains the lineage behind Isaac Lab's GPU-resident training pattern. It is useful here for understanding why runner wrappers must preserve device placement and rollout-buffer semantics.
Brax offers a contrasting design where simulator and learner are already JAX-native. Reading it beside Isaac Lab clarifies which responsibilities belong to the simulator stack and which belong to the runner.
NVIDIA Isaac Lab documentation.
Isaac Lab is the primary reference for this section. Its RL wrapper and script documentation show how SKRL, rl_games, RSL-RL, and Stable-Baselines3 receive task data through runner-specific interfaces.
Google DeepMind MuJoCo MJX documentation.
MJX is not an Isaac Lab runner, but it helps readers compare wrapper-heavy integration with a more JAX-native simulator interface. The contrast sharpens the section's focus on boundaries.
Rudin et al. motivate why the runner layer matters for locomotion. The paper's training pattern is the kind of workload that RSL-RL and related Isaac Lab runners are meant to operationalize.
RSL-RL is the runner readers should inspect for locomotion-oriented PPO storage, normalization, and checkpoint conventions. It anchors the section's warning that the wrapper contract is part of the experiment.