RLlib training loop
Know what changes during an RL experiment.
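RLlib provides distributed reinforcement learning algorithms and abstractions for environments, policies, learners, and evaluation. Before running anything, establish which part of the system changes during an experiment (the environment, the algorithm configuration, or the evaluation setup) so that a shift in results can be traced to a single cause.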
Configure the algorithm
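    from ray.rllib.algorithms.ppo import PPOConfig

    # Declare the environment, hyperparameters, and sampling workers in one place.
    config = (
        PPOConfig()
        .environment("CartPole-v1")
        .training(lr=3e-4)
        .env_runners(num_env_runners=4)
    )
    algo = config.build()

    # Each train() call runs one training iteration and returns a metrics dict.
    for _ in range(10):
        result = algo.train()
        print(result["env_runners"]["episode_return_mean"])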
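Knowing what changed between two experiments is easier when their settings are compared mechanically. The following is a minimal sketch, not part of RLlib: `flatten` and `diff_settings` are hypothetical helpers that work on plain dictionaries, and `run_a` and `run_b` are made-up settings for illustration. In practice the dictionaries could come from something like `config.to_dict()`, but the exact keys depend on the Ray version.

```python
def flatten(d: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in d.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def diff_settings(old: dict, new: dict) -> dict:
    """Return {key: (old_value, new_value)} for every setting that differs."""
    a, b = flatten(old), flatten(new)
    missing = "<unset>"  # marker for a key present in only one of the runs
    return {
        key: (a.get(key, missing), b.get(key, missing))
        for key in sorted(a.keys() | b.keys())
        if a.get(key, missing) != b.get(key, missing)
    }


# Hypothetical settings for two runs; only the learning rate differs.
run_a = {"env": "CartPole-v1", "training": {"lr": 3e-4}, "num_env_runners": 4}
run_b = {"env": "CartPole-v1", "training": {"lr": 1e-4}, "num_env_runners": 4}

print(diff_settings(run_a, run_b))
```

Logging this diff next to each run's metrics makes it clear whether a change in return follows from a hyperparameter, the environment, or the sampling setup.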
Evaluation discipline
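Training rewards alone can be misleading, because they are collected while the policy is still exploring and while its weights change between iterations. Keep a separate evaluation setup, and report both the training curve (learning progress) and the evaluation results (stable policy performance).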
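Reporting both numbers side by side can be sketched as below. This assumes evaluation has already been enabled on the config (RLlib exposes this through `AlgorithmConfig.evaluation(...)`), and that evaluation metrics appear under an `"evaluation"` key that mirrors the training layout used in the loop above. That layout is an assumption and varies across Ray versions; `mean_return`, `summarize`, and the `sample` dictionary are hypothetical and exist only for illustration.

```python
from typing import Optional


def mean_return(metrics: dict) -> Optional[float]:
    """Read env_runners.episode_return_mean, or None when it is absent."""
    return metrics.get("env_runners", {}).get("episode_return_mean")


def summarize(result: dict) -> dict:
    """Pair the training return with the evaluation return for one iteration."""
    return {
        "train_return": mean_return(result),
        # Assumed layout: evaluation metrics nested under "evaluation".
        "eval_return": mean_return(result.get("evaluation", {})),
    }


# Hypothetical result dictionary; the numbers are made up.
sample = {
    "env_runners": {"episode_return_mean": 182.5},
    "evaluation": {"env_runners": {"episode_return_mean": 240.0}},
}

print(summarize(sample))
```

Returning `None` instead of raising keeps the report usable on iterations where no evaluation ran.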
Team checklist
- Define the observation and action spaces.
- Version the environment.
- Track reward shaping changes.
- Save checkpoints for policies that pass evaluation.
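One way to make the checklist auditable is to record its items in a small manifest and to gate checkpointing on the evaluation result. The sketch below uses only the standard library. `ExperimentManifest`, `passes_evaluation`, the version labels, and the threshold of 475 are all hypothetical choices for illustration, and the actual save would go through whatever checkpoint call your RLlib version provides.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ExperimentManifest:
    env_id: str
    env_version: str        # hypothetical internal version tag for the environment
    observation_space: str  # human-readable description of the observation space
    action_space: str       # human-readable description of the action space
    reward_shaping: str     # label for the reward shaping variant in use

    def fingerprint(self) -> str:
        """Short stable hash that changes whenever any tracked field changes."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


def passes_evaluation(eval_return: Optional[float], threshold: float) -> bool:
    """Allow a checkpoint only when an evaluation return exists and meets the bar."""
    return eval_return is not None and eval_return >= threshold


manifest = ExperimentManifest(
    env_id="CartPole-v1",
    env_version="env-1.0",  # hypothetical
    observation_space="Box(shape=(4,), dtype=float32)",
    action_space="Discrete(2)",
    reward_shaping="none",
)

# Hypothetical evaluation return and threshold.
if passes_evaluation(480.0, threshold=475.0):
    print(f"checkpoint allowed for manifest {manifest.fingerprint()}")
```

Storing the fingerprint alongside each checkpoint ties a saved policy to the exact environment version and reward shaping it was trained with.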