Section 17.3: Isaac Lab with SKRL / rl_games / RSL-RL | Building Embodied AI: From Perception to Autonomous Action

A Careful Control Loop

Figure 17.3A: Isaac Lab is most useful when the task definition stays stable while wrappers adapt observations, actions, and buffers to the learner you choose. (Illustration: one Isaac Lab robot task feeds three labeled training desks for SKRL, rl_games, and RSL-RL, showing wrappers that convert the same environment into runner-specific interfaces.)

Big Picture

Isaac Lab with SKRL, rl_games, and RSL-RL is a workflow pattern, not three unrelated integrations. Isaac Lab owns the robot scene, task randomization, observations, actions, rewards, and termination logic; the RL runner owns rollout storage, PPO updates, logging, and checkpointing.

Whichever of the three runners you use, a GPU RL result depends on simulator fidelity, PPO rollout semantics, reward terms, and the reset distribution, so all four should be versioned together in the same training artifact.

This section develops the contract between an Isaac Lab environment and a training runner. The contract is concrete: observation groups, action clipping, reward terms, reset buffers, device, runner config, and evaluation script.

The key question is practical: if you switch from RSL-RL to rl_games or SKRL, which parts of the experiment are allowed to change, and which parts must remain identical for the comparison to mean anything?

The Wrapper Is Part Of The Experiment

An Isaac Lab wrapper is not neutral glue. It decides how observations are packed, whether actions are clipped, where tensors live, and how privileged states reach an asymmetric critic.

Theory

Isaac Lab tasks commonly expose named observation groups. A policy actor may receive proprioception, commands, and history, while the critic may receive privileged simulator state such as terrain height, contact flags, or object poses. In asymmetric actor-critic training, those groups must be routed explicitly so privileged information helps value learning without leaking into the deployed actor.
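The routing rule can be sketched in plain Python. This is a minimal illustration under stated assumptions: the group names and the `route_observations` helper are hypothetical and are not Isaac Lab or runner API, and flat lists stand in for GPU tensors.

```python
# Hypothetical sketch: route named observation groups to actor and critic.
# Group names and the helper are illustrative, not Isaac Lab API.

ACTOR_GROUPS = ("proprioception", "commands", "history")
PRIVILEGED_GROUPS = ("terrain_heights", "contact_flags")


def route_observations(obs_groups: dict[str, list[float]]) -> tuple[list[float], list[float]]:
    """Return (actor_input, critic_input) as flat lists.

    The actor sees only deployable groups; the critic sees those groups
    plus privileged simulator state.
    """
    leaked = set(ACTOR_GROUPS) & set(PRIVILEGED_GROUPS)
    if leaked:
        raise ValueError(f"privileged groups routed to actor: {sorted(leaked)}")
    actor_input = [x for name in ACTOR_GROUPS for x in obs_groups[name]]
    privileged = [x for name in PRIVILEGED_GROUPS for x in obs_groups[name]]
    return actor_input, actor_input + privileged


obs = {
    "proprioception": [0.1, -0.2],
    "commands": [1.0],
    "history": [0.0, 0.05],
    "terrain_heights": [0.3, 0.31, 0.29],
    "contact_flags": [1.0, 0.0],
}
actor_input, critic_input = route_observations(obs)
print(len(actor_input), len(critic_input))
```

Keeping the two group lists as explicit constants means the actor's input dimension at deployment can be read off without consulting the simulator.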

The runner boundary also controls performance. rl_games can work directly with GPU buffers, RSL-RL expects its own rollout storage conventions, and SKRL emphasizes transparent algorithm configuration across backends. The task is the scientific object; the runner is the optimizer and storage implementation.

Mechanism

The mechanism is: create the Isaac Lab task, wrap it for the runner, map observation groups into the runner's expected input format, collect rollouts, update the policy, then evaluate the exported checkpoint through the same task contract. The dangerous step is silent conversion, especially when a wrapper changes clipping, device movement, or the meaning of obs and states.
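One way to make a silent conversion visible is to count how often the wrapper boundary alters what the policy produced. The sketch below is an assumption-laden illustration: `clip_actions`, the wrapper labels, and the clip limit are hypothetical, and real wrappers operate on batched tensors rather than Python lists.

```python
# Hypothetical sketch: expose action clipping at the wrapper boundary.
# Names and limits are illustrative, not taken from any runner.
from typing import Optional


def clip_actions(actions: list[float], limit: Optional[float]) -> tuple[list[float], int]:
    """Clip to [-limit, limit]; return the clipped actions and how many values changed."""
    if limit is None:
        return list(actions), 0
    clipped = [max(-limit, min(limit, a)) for a in actions]
    changed = sum(1 for a, c in zip(actions, clipped) if a != c)
    return clipped, changed


policy_actions = [0.4, -1.7, 2.5, 0.9]
for wrapper_label, limit in (("no_clip_wrapper", None), ("clip_wrapper", 1.0)):
    sent, changed = clip_actions(policy_actions, limit)
    print(f"{wrapper_label}: sent={sent}, clipped_values={changed}")
```

Logging the changed count per rollout is a cheap signal: if it is nonzero under one wrapper and zero under another, the two runs did not send the same actions to the simulator.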

Worked Example

Code Fragment 17.3.1 makes the runner choice explicit. It is a small manifest, but it captures the fields that are often hidden in launcher commands and YAML files.

# Compare Isaac Lab runner wrappers by the contract they must preserve.
# The task stays fixed while storage, observation packing, and logging vary.
from dataclasses import asdict, dataclass

@dataclass
class RunnerContract:
    runner: str
    wrapper_role: str
    actor_input: str
    critic_input: str

    def as_row(self) -> dict[str, object]:
        return asdict(self)

contracts = [
    RunnerContract("RSL-RL", "rollout storage for locomotion PPO", "obs", "privileged_obs"),
    RunnerContract("rl_games", "GPU buffer bridge and clipping", "obs", "states"),
    RunnerContract("SKRL", "readable algorithm and memory config", "states", "state_values"),
]

for contract in contracts:
    print(f"{contract.runner}: actor={contract.actor_input}, critic={contract.critic_input}")

RSL-RL: actor=obs, critic=privileged_obs
rl_games: actor=obs, critic=states
SKRL: actor=states, critic=state_values

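Code Fragment 17.3.1 shows why runner wrappers must be documented, not treated as interchangeable. The actor and critic keys differ across conventions, which matters when privileged simulator state is allowed for value learning but not for deployment. The key names in the fragment are shorthand for each runner's convention, so check them against the wrapper version you actually run.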

Expected output: the trace should reveal which tensor group reaches the actor and which reaches the critic. If the manifest cannot answer that question, an asymmetric training result is not reproducible.

Library Shortcut

Isaac Lab provides runner scripts and wrappers for RL libraries, including rl_games, RSL-RL, SKRL, and Stable-Baselines3. The shortcut is valuable because it reuses task definitions while adapting data formats, but the experiment should still record the wrapper, runner version, and observation-group mapping.

Practical Recipe

Define the Isaac Lab task first: robot asset, scene, observations, actions, rewards, terminations, curriculum, and randomization.
Choose the runner based on the experiment goal: speed, readable algorithm research, recurrent policies, or compatibility with existing locomotion configs.
Record actor observation groups and critic-only privileged groups before training.
Keep train and play scripts separate so evaluation uses deterministic actions and held-out seeds.
Export the checkpoint, normalization statistics, runner config, task config, and exact Isaac Lab commit or release together.

Common Failure Mode

The common mistake is to switch runners and also change reward weights, observation groups, normalization, action scaling, or evaluation seeds. That comparison measures a new experiment bundle, not the runner.
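A small diff over two experiment bundles can catch this before training starts. The sketch below assumes each bundle is a flat dictionary. The field names, reward term names, the `skrl_wrapper` label, and the `ALLOWED_TO_DIFFER` set are all hypothetical choices for illustration.

```python
# Hypothetical sketch: flag fields that differ between two experiment bundles
# but are not runner-level. All field names and values are illustrative.

ALLOWED_TO_DIFFER = {"runner", "wrapper", "runner_config"}


def unexpected_changes(a: dict[str, object], b: dict[str, object]) -> list[str]:
    """Return sorted keys whose values differ and are not runner-level."""
    keys = set(a) | set(b)
    return sorted(k for k in keys if a.get(k) != b.get(k) and k not in ALLOWED_TO_DIFFER)


run_rsl = {
    "runner": "rsl_rl",
    "wrapper": "RslRlVecEnvWrapper",
    "reward_weights": {"track_lin_vel": 1.0, "torque_penalty": -1e-5},
    "action_scale": 0.5,
    "eval_seeds": (101, 102, 103),
}
run_skrl = {
    "runner": "skrl",
    "wrapper": "skrl_wrapper",  # placeholder label, not a confirmed class name
    "reward_weights": {"track_lin_vel": 1.5, "torque_penalty": -1e-5},
    "action_scale": 0.5,
    "eval_seeds": (101, 102, 103),
}
print(unexpected_changes(run_rsl, run_skrl))
```

Here the check reports `reward_weights`, so the pair compares two experiment bundles rather than two runners.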

Practical Example

A robotics team comparing RSL-RL and SKRL on the same Isaac Lab task should keep terrain seeds, reward terms, action scale, command distribution, and evaluation script identical. The result artifact should include both runner configs plus one shared evaluation table.

Memory Hook

The wrapper is the adapter plug on the robot-learning workbench. Label it, or the next debugger will spend an afternoon asking why the critic knew the terrain and the actor did not.

Research Frontier

The Isaac Lab frontier is moving toward richer sensor tasks, kit-less workflows, multi-backend physics, reusable task registries, and cleaner policy export. The research challenge is to keep these workflows modular without making the wrapper layer a hidden source of experimental variation.

Self Check

Can you name the Isaac Lab task, runner, wrapper, actor observation keys, critic-only keys, action clipping rule, normalization file, and evaluation script? If not, the runner result is not yet portable.

The idea in this section becomes useful when the runner boundary is explicit. Isaac Lab gives you a task graph; the runner gives you a learner. The wrapper is where the two meet, so it must be part of the experiment record.

The graduate-level habit is to separate task validity from runner performance. A runner can update faster without improving the task definition, and a better task curriculum can improve every runner. A fair comparison changes one of those layers at a time.

Practical Tool Choices For This Section

Tool or Library	Role in the Topic	Builder Advice
Isaac Lab task config	Robot, scene, rewards, resets, and observations	Treat it as the fixed task contract when comparing learners.
RSL-RL wrapper	Locomotion-oriented rollout and PPO storage	Use it for fast legged locomotion baselines with common privileged-observation patterns.
rl_games wrapper	GPU buffer conversion, clipping, and runner format	Use it when direct GPU buffer handling and mature PPO configs are the priority.
SKRL wrapper	Modular memory and algorithm interface	Use it when algorithm readability and multi-backend experimentation matter.
Play or evaluation script	Deterministic held-out rollout	Use it as the single source for success, fall, and command-tracking metrics.

A robust implementation starts with a manifest that binds task, wrapper, runner, and evaluation. The manifest is small enough to review in a pull request and concrete enough to reproduce a run later.

Record the task entry point and runner wrapper in the same config artifact.
List actor observations and critic-only observations separately.
Store action clipping, observation clipping, and normalization settings.
Save train and play commands with explicit seed panels.
Compare runners only through the same play script and metric exporter.

# Record the Isaac Lab task, wrapper, and evaluation contract together.
# This prevents runner comparisons from hiding observation or seed changes.
from dataclasses import dataclass, asdict

@dataclass
class IsaacLabRunManifest:
    task: str
    runner: str
    wrapper: str
    actor_obs: tuple[str, ...]
    critic_obs: tuple[str, ...]
    eval_panel: str

    def as_row(self) -> dict[str, object]:
        return asdict(self)

manifest = IsaacLabRunManifest(
    task="Isaac-Velocity-Rough-Anymal-D-v0",
    runner="rsl_rl",
    wrapper="RslRlVecEnvWrapper",
    actor_obs=("proprioception", "commands", "history"),
    critic_obs=("terrain_heights", "contact_flags", "base_velocity"),
    eval_panel="rough_terrain_holdout_256",
)
print(manifest.as_row())

{'task': 'Isaac-Velocity-Rough-Anymal-D-v0', 'runner': 'rsl_rl', 'wrapper': 'RslRlVecEnvWrapper', 'actor_obs': ('proprioception', 'commands', 'history'), 'critic_obs': ('terrain_heights', 'contact_flags', 'base_velocity'), 'eval_panel': 'rough_terrain_holdout_256'}

Code Fragment 17.3.2 records the wrapper contract that makes an Isaac Lab run reproducible. The actor and critic observation tuples expose whether privileged simulator information is confined to training-time value estimation.

When an Isaac Lab run fails, inspect the wrapper contract before changing rewards. Common faults include critic-only state leaking into actor inputs, train-time randomization missing from evaluation, action clipping differing across runners, and normalization files not loaded during play.
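These checks can run as a pre-flight step before any reward tuning. The sketch below is a simplified illustration: the `contract_faults` helper and its field names are hypothetical, and it compares a train contract against a play contract rather than reading real Isaac Lab configs. It treats any train/play mismatch as something to review, even where a difference might be deliberate.

```python
# Hypothetical pre-flight check for the wrapper contract.
# Field names are illustrative; they are not Isaac Lab or runner API.

def contract_faults(train: dict[str, object], play: dict[str, object]) -> list[str]:
    """Return human-readable faults found in a train/play contract pair."""
    faults: list[str] = []

    # Privileged groups must never appear in the actor's input list.
    leaked = set(train["actor_obs"]) & set(train["critic_only_obs"])
    if leaked:
        faults.append(f"critic-only groups in actor input: {sorted(leaked)}")

    # Settings that should match between training and evaluation.
    for key in ("randomization", "action_clip", "normalization_file"):
        if train.get(key) != play.get(key):
            faults.append(f"{key} differs: train={train.get(key)!r}, play={play.get(key)!r}")

    return faults


train_contract = {
    "actor_obs": ("proprioception", "commands", "history"),
    "critic_only_obs": ("terrain_heights", "contact_flags"),
    "randomization": "rough_terrain_v1",
    "action_clip": 1.0,
    "normalization_file": "obs_norm.pt",
}
# A play config that forgot to load the normalization statistics.
play_contract = {**train_contract, "normalization_file": None}

for fault in contract_faults(train_contract, play_contract):
    print("FAULT:", fault)
```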

Evaluation Recipe

For Isaac Lab runner comparisons, compare only metrics that measure the same construct and are computed by one evaluation pass under one configuration: same task config, same reward terms, same randomization panel, same evaluation seeds, same checkpoint selection rule, and the same success definition. Save runner config, wrapper name, observation groups, normalization state, logs, videos, and metrics as one artifact.
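A shared evaluation table can enforce the matching-seed requirement mechanically. In the sketch below, the `evaluation_table` helper is hypothetical, and the per-seed outcomes are invented placeholders that only exercise the code; they are not measured results.

```python
# Hypothetical sketch: build one evaluation table and refuse mismatched seed panels.
# The per-seed outcomes below are made-up placeholders, not experimental results.

def evaluation_table(results: dict[str, dict[int, bool]]) -> list[dict[str, object]]:
    """results maps runner name -> {evaluation seed: episode succeeded}."""
    seed_panels = {frozenset(per_seed) for per_seed in results.values()}
    if len(seed_panels) > 1:
        raise ValueError("runners were evaluated on different seed panels")
    return [
        {
            "runner": runner,
            "episodes": len(per_seed),
            "success_rate": sum(1 for ok in per_seed.values() if ok) / len(per_seed),
        }
        for runner, per_seed in sorted(results.items())
    ]


placeholder_results = {
    "rsl_rl": {101: True, 102: True, 103: False, 104: True},
    "skrl": {101: True, 102: False, 103: False, 104: True},
}
for row in evaluation_table(placeholder_results):
    print(row)
```

Raising an error on mismatched panels is a deliberate design choice: a table that silently averages over different seeds looks valid but compares different evaluation conditions.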

Key Takeaway

Isaac Lab runners are interchangeable only after the wrapper contract is explicit. Reproducible comparisons keep the task fixed, document observation routing, and evaluate every checkpoint through the same held-out play script.

Exercise 17.3.1

Write a manifest for one Isaac Lab locomotion task trained with two runners. Specify actor observations, critic-only observations, action clipping, train seeds, evaluation seeds, and the single play script used to compute both result rows.

What's Next?

This section turned Isaac Lab runner choice into a wrapper contract: task config, observation routing, device behavior, normalization, and held-out evaluation must be visible. Next, continue with Section 17.4, where the same contract is expressed in MJX, Brax, and JAX-native RL loops.

References & Further Reading

Foundational Papers, Tools, and Practice References

Makoviychuk, V. et al. (2021). Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning. arXiv.

Isaac Gym explains the lineage behind Isaac Lab's GPU-resident training pattern. It is useful here for understanding why runner wrappers must preserve device placement and rollout-buffer semantics.

Paper

Freeman, C. D. et al. (2021). Brax: A Differentiable Physics Engine for Large Scale Rigid Body Simulation. arXiv.

Brax offers a contrasting design where simulator and learner are already JAX-native. Reading it beside Isaac Lab clarifies which responsibilities belong to the simulator stack and which belong to the runner.

Paper

NVIDIA Isaac Lab documentation.

Isaac Lab is the primary reference for this section. Its RL wrapper and script documentation show how SKRL, rl_games, RSL-RL, and Stable-Baselines3 receive task data through runner-specific interfaces.

Tool

Google DeepMind MuJoCo MJX documentation.

MJX is not an Isaac Lab runner, but it helps readers compare wrapper-heavy integration with a more JAX-native simulator interface. The contrast sharpens the section's focus on boundaries.

Tool

Rudin, N. et al. (2022). Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning. CoRL.

Rudin et al. motivate why the runner layer matters for locomotion. The paper's training pattern is the kind of workload that RSL-RL and related Isaac Lab runners are meant to operationalize.

Paper

RSL-RL repository.

RSL-RL is the runner readers should inspect for locomotion-oriented PPO storage, normalization, and checkpoint conventions. It anchors the section's warning that the wrapper contract is part of the experiment.

Tool