PPO-LunarLander-v2: Open-Source Reinforcement Learning Model for the Lunar Landing Task

PPO LunarLander-v2

Developed by araffin

This is a reinforcement learning model based on the PPO algorithm, specifically designed to solve the landing task in the LunarLander-v2 environment.

Physics Model #Lunar Landing Control #Multi-environment Parallel Training #Reinforcement Learning Tuning

Downloads 65

Release Date: 5/4/2022

Model Overview

The model is trained using the PPO algorithm from the stable-baselines3 library and can achieve stable landing control in the LunarLander-v2 environment.

Model Features

High-performance Landing Control

Achieves stable landing control in the LunarLander-v2 environment with an average reward of 283.49.

Based on PPO Algorithm

Uses Proximal Policy Optimization (PPO), an on-policy policy gradient method that limits the size of each policy update, giving stable training and reasonable sample efficiency among policy gradient methods.
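
PPO's stability comes largely from clipping how far a single update can move the policy. The following is a minimal sketch of the clipped surrogate objective for one sample, in plain Python. The helper name `clipped_surrogate` and the clip range of 0.2 are assumptions made for illustration; stable-baselines3 works on batched tensors and its internals are not reproduced here.

```python
import math


def clipped_surrogate(log_prob_new, log_prob_old, advantage, clip_range=0.2):
    """PPO clipped surrogate objective for one (state, action) sample.

    Hypothetical helper for illustration; not part of stable-baselines3.
    """
    # Probability ratio between the updated policy and the data-collecting policy
    ratio = math.exp(log_prob_new - log_prob_old)
    # Ratio restricted to [1 - clip_range, 1 + clip_range]
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    # Taking the minimum removes the incentive to push the ratio past the clip range
    return min(ratio * advantage, clipped_ratio * advantage)


# Positive advantage: the gain is capped once the ratio exceeds 1 + clip_range
capped = clipped_surrogate(math.log(1.5), math.log(1.0), advantage=2.0)
# Negative advantage: the unclipped (more pessimistic) term is kept
penalized = clipped_surrogate(math.log(1.5), math.log(1.0), advantage=-2.0)
print(capped, penalized)
```

With a ratio of 1.5, the positive-advantage case is scored as if the ratio were 1.2, while the negative-advantage case keeps the full penalty, so the objective acts as a pessimistic bound on the unclipped one.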

Multi-environment Parallel Training

Supports parallel training across multiple environments to accelerate the training process.
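
As a rough illustration of what a vectorized environment does, the sketch below steps several independent copies of a toy environment in lockstep and resets each copy automatically when its episode ends. `CountdownEnv` and `SimpleVecEnv` are hypothetical classes written for this example; they are not the stable-baselines3 implementation, which is what `make_vec_env` returns.

```python
class CountdownEnv:
    """Toy environment (hypothetical): an episode lasts `horizon` steps, reward 1.0 per step."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done


class SimpleVecEnv:
    """Steps a list of environments together and auto-resets finished ones."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        observations, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            obs, reward, done = env.step(action)
            if done:
                # Start the next episode immediately so the batch never stalls
                obs = env.reset()
            observations.append(obs)
            rewards.append(reward)
            dones.append(done)
        return observations, rewards, dones


vec_env = SimpleVecEnv([CountdownEnv(horizon=h) for h in (2, 3, 4)])
obs = vec_env.reset()
transitions = 0
for _ in range(4):
    obs, rewards, dones = vec_env.step([0] * len(vec_env.envs))
    transitions += len(rewards)
print(transitions, obs)
```

Each call to `step` yields one transition per environment copy, so more copies collect experience faster per call while the policy sees a single batched observation.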

Model Capabilities

Reinforcement Learning Control

Discrete Action Space Handling

Environment Interaction Learning

Use Cases

Game AI

Lunar Landing Game AI

Can serve as an AI controller for lunar landing games

Capable of stably controlling the lander for safe landing

Educational Demonstration

Reinforcement Learning Teaching Case

Used to demonstrate practical applications of reinforcement learning algorithms

Visually showcases the learning process of the PPO algorithm

🚀 PPO Agent for LunarLander-v2

This is a trained PPO agent for the LunarLander-v2 environment, leveraging the stable-baselines3 library to achieve high performance in reinforcement learning tasks.

Property	Details
Model Type	PPO
Training Data	LunarLander-v2

🚀 Quick Start

This is a trained model of a PPO agent playing LunarLander-v2 using the stable-baselines3 library.

💻 Usage Examples

Basic Usage

from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

# Download checkpoint
checkpoint = load_from_hub("araffin/ppo-LunarLander-v2", "ppo-LunarLander-v2.zip")
# Load the model
model = PPO.load(checkpoint)

env = make_vec_env("LunarLander-v2", n_envs=1)

# Evaluate
print("Evaluating model")
mean_reward, std_reward = evaluate_policy(
    model,
    env,
    n_eval_episodes=20,
    deterministic=True,
)
print(f"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}")

# Start a new episode
obs = env.reset()

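# Run the agent until interrupted with Ctrl+C;
# the vectorized env resets automatically when an episode ends
try: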
    while True:
        action, _states = model.predict(obs, deterministic=True)
        obs, rewards, dones, info = env.step(action)
        env.render()
except KeyboardInterrupt:
    pass
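
The two numbers printed by the evaluation step are the mean and standard deviation of the per-episode returns. A minimal sketch of that summary in plain Python follows, assuming a population standard deviation (what NumPy's `std` computes by default). The episode returns listed are made-up values for illustration, not results from this model, and `summarize_returns` is a hypothetical helper.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, population standard deviation) of per-episode returns."""
    mean_return = statistics.fmean(episode_returns)
    std_return = statistics.pstdev(episode_returns)
    return mean_return, std_return


# Made-up episode returns, for illustration only
example_returns = [270.0, 290.0, 280.0, 300.0, 260.0]
mean_reward, std_reward = summarize_returns(example_returns)
print(f"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}")
```

A small standard deviation relative to the mean indicates that the agent lands consistently rather than alternating between very good and very bad episodes.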

Advanced Usage

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.callbacks import EvalCallback

# Create the environment
env_id = "LunarLander-v2"
n_envs = 16
env = make_vec_env(env_id, n_envs=n_envs)

# Create the evaluation envs
eval_envs = make_vec_env(env_id, n_envs=5)

# Adjust evaluation interval depending on the number of envs
eval_freq = int(1e5)
eval_freq = max(eval_freq // n_envs, 1)

# Create evaluation callback to save best model
# and monitor agent performance
eval_callback = EvalCallback(
    eval_envs,
    best_model_save_path="./logs/",
    eval_freq=eval_freq,
    n_eval_episodes=10,
)

# Instantiate the agent
# Hyperparameters from https://github.com/DLR-RM/rl-baselines3-zoo
model = PPO(
    "MlpPolicy",
    env,
    n_steps=1024,
    batch_size=64,
    gae_lambda=0.98,
    gamma=0.999,
    n_epochs=4,
    ent_coef=0.01,
    verbose=1,
)

# Train the agent (you can stop it early with Ctrl+C)
try:
    model.learn(total_timesteps=int(5e6), callback=eval_callback)
except KeyboardInterrupt:
    pass

# Load the best model saved by the evaluation callback
model = PPO.load("logs/best_model.zip")
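
The numbers in the training script interact in ways that are easy to miss. The sketch below works through the arithmetic in plain Python, using only values that appear in the script. The derived variable names are introduced here for illustration, and the reading of `eval_freq` as a count of vectorized steps (rather than individual timesteps) follows the script's own comment about adjusting for the number of environments.

```python
n_envs = 16
n_steps = 1024
batch_size = 64
n_epochs = 4
total_timesteps = int(5e6)

# Transitions collected before each policy update (one rollout)
rollout_size = n_steps * n_envs
# Minibatches per pass over the rollout, and gradient steps per update
minibatches_per_epoch = rollout_size // batch_size
gradient_steps_per_update = minibatches_per_epoch * n_epochs
# Complete rollouts that fit inside the training budget
full_rollouts = total_timesteps // rollout_size

# Evaluation interval: every 100,000 timesteps, expressed in vectorized steps
eval_freq = max(int(1e5) // n_envs, 1)
timesteps_between_evals = eval_freq * n_envs

print(rollout_size, gradient_steps_per_update, full_rollouts, eval_freq, timesteps_between_evals)
```

Dividing by `n_envs` keeps the evaluation interval at roughly 100,000 environment timesteps regardless of how many copies run in parallel.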
