PPO-Pendulum-v1: Open-Source Reinforcement Learning Model for the Pendulum-v1 Control Task

ppo-Pendulum-v1

Developed by sb3

This is a reinforcement learning agent trained with the PPO algorithm to solve the pendulum swing-up and balance task in the Pendulum-v1 environment.

Physics Model #Inverted Pendulum Control #Continuous Action Space #Proximal Policy Optimization

Downloads 51

Release Time: 5/4/2022

Model Overview

The model was trained with the PPO implementation in the Stable Baselines3 library on the Pendulum-v1 environment, where it learns to swing the pendulum up and hold it in the upright position.

Model Features

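State-Dependent Exploration

Uses generalized State-Dependent Exploration (gSDE): the exploration noise is a function of the current state, and the noise parameters are resampled every 4 steps (sde_sample_freq=4) instead of at every step.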
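To make the idea concrete, here is a minimal sketch of state-dependent noise in plain Python. It is not the Stable Baselines3 implementation: the feature vector, the log_std value, and the helper names are hypothetical and chosen only for illustration. It shows that, for the same state, the noise stays fixed until the weights are resampled.

```python
import math
import random

def sample_noise_weights(n_features, n_actions, log_std, rng):
    """Draw one weight per (feature, action) pair from N(0, sigma^2)."""
    sigma = math.exp(log_std)
    return [[rng.gauss(0.0, sigma) for _ in range(n_actions)]
            for _ in range(n_features)]

def state_dependent_noise(features, noise_weights):
    """Exploration noise is a linear function of the state features."""
    n_actions = len(noise_weights[0])
    return [sum(f * noise_weights[i][j] for i, f in enumerate(features))
            for j in range(n_actions)]

rng = random.Random(0)
SDE_SAMPLE_FREQ = 4               # from the model's hyperparameters
features = [0.5, -1.0, 0.25]      # hypothetical policy features for one state

noise_per_step = []
noise_weights = None
for step in range(8):
    # Resample the weights only every SDE_SAMPLE_FREQ steps.
    if step % SDE_SAMPLE_FREQ == 0:
        noise_weights = sample_noise_weights(3, 1, log_std=-1.0, rng=rng)
    noise_per_step.append(state_dependent_noise(features, noise_weights)[0])
```

Within each block of 4 steps the same state yields the same perturbation, which tends to give smoother action sequences than drawing independent noise at every step.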

Stable Training

PPO clips each policy update (clip_range=0.2), which limits how far a single update can move the policy and helps keep training stable.
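As a rough illustration of the clipping idea, the sketch below evaluates PPO's clipped surrogate objective for a single sample. The function name is hypothetical, and the real library computes this over batches of tensors; only the clip range of 0.2 is taken from this model's hyperparameters.

```python
def clipped_surrogate(ratio, advantage, clip_range=0.2):
    """PPO clipped objective for one sample.

    ratio     -- new_policy_prob / old_policy_prob for the taken action
    advantage -- estimated advantage of that action
    """
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    # Taking the minimum removes any incentive to push the ratio
    # outside [1 - clip_range, 1 + clip_range].
    return min(ratio * advantage, clipped_ratio * advantage)

gain_capped = clipped_surrogate(ratio=1.5, advantage=1.0)    # capped at ratio 1.2
loss_kept = clipped_surrogate(ratio=0.5, advantage=-1.0)     # uses ratio 0.8
unclipped = clipped_surrogate(ratio=1.1, advantage=2.0)      # inside the range
```

Once the probability ratio leaves the clip range in the direction that would increase the objective, the objective stops growing, so there is nothing to gain from a larger policy change.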

Efficient Learning

Trains in 100,000 timesteps across 4 parallel environments, using the hyperparameters listed under Technical Details.

Model Capabilities

Inverted Pendulum Control

Continuous Action Space Handling

Reinforcement Learning Task Solving
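For readers new to continuous control, the sketch below shows what handling a continuous action space can look like for this task: a torque is sampled from a Gaussian and clipped to the Pendulum-v1 torque bounds of [-2, 2]. The mean and standard deviation are hypothetical values, not outputs of this model.

```python
import random

TORQUE_LOW, TORQUE_HIGH = -2.0, 2.0   # Pendulum-v1 action bounds

def sample_torque(mean, std, rng):
    """Sample a torque from N(mean, std^2) and clip it to the valid range."""
    raw = rng.gauss(mean, std)
    return max(TORQUE_LOW, min(raw, TORQUE_HIGH))

rng = random.Random(42)
# Hypothetical policy output: mean 1.5, standard deviation 1.0
torques = [sample_torque(1.5, 1.0, rng) for _ in range(1000)]
```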

Use Cases

Control Problems

Inverted Pendulum Balance Control

Controls the pendulum so that it stays in the upright position.

Mean episode reward: -230.42 ± 142.54. Rewards in Pendulum-v1 are never positive, so values closer to 0 are better.
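To put the reward figure in context, here is a sketch of the per-step reward as documented for Pendulum-v1, written in plain Python rather than copied from the environment source. It assumes the standard limits (angular speed up to 8, torque up to 2) and the default 200-step episode length.

```python
import math

def angle_normalize(theta):
    """Wrap an angle to the interval [-pi, pi)."""
    return ((theta + math.pi) % (2 * math.pi)) - math.pi

def pendulum_reward(theta, theta_dot, torque):
    """Negative cost: penalizes angle from upright, angular speed, and effort."""
    return -(angle_normalize(theta) ** 2
             + 0.1 * theta_dot ** 2
             + 0.001 * torque ** 2)

best_step = pendulum_reward(0.0, 0.0, 0.0)        # upright, at rest, no torque
worst_step = pendulum_reward(math.pi, 8.0, 2.0)   # hanging down, max speed and torque

# Average per-step reward implied by the reported mean, assuming 200-step episodes
mean_per_step = -230.42 / 200
```

Under these assumptions the per-step reward lies between roughly -16.27 and 0, and the reported mean corresponds to about -1.15 per step.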

Teaching Demonstrations

Reinforcement Learning Teaching Example

Serves as a worked example for teaching reinforcement learning algorithms.

🚀 Stable-Baselines3 PPO Agent for Pendulum-v1

This is a trained PPO agent for the Pendulum-v1 environment, built with the stable-baselines3 library and the RL Zoo. The RL Zoo is a training framework for Stable Baselines3 agents that provides hyperparameter optimization and pre-trained agents.

🚀 Quick Start

Usage (with SB3 RL Zoo)

RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo
SB3: https://github.com/DLR-RM/stable-baselines3
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

# Download the model and save it into the logs/ folder
python -m rl_zoo3.load_from_hub --algo ppo --env Pendulum-v1 -orga sb3 -f logs/
# Run the downloaded agent (enjoy.py is in the root of the RL Zoo repository)
python enjoy.py --algo ppo --env Pendulum-v1 -f logs/

Training (with the RL Zoo)

# Train the agent (train.py is in the root of the RL Zoo repository)
python train.py --algo ppo --env Pendulum-v1 -f logs/
# Upload the model and generate a video (when possible)
python -m rl_zoo3.push_to_hub --algo ppo --env Pendulum-v1 -f logs/ -orga sb3

💻 Usage Examples

Basic Usage

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Create the environment (4 parallel copies, matching n_envs in the hyperparameters)
env_id = "Pendulum-v1"
env = make_vec_env(env_id, n_envs=4)

# Instantiate the agent with the hyperparameters listed under Technical Details
model = PPO(
    "MlpPolicy",
    env,
    n_steps=1024,
    n_epochs=10,
    gamma=0.9,
    gae_lambda=0.95,
    clip_range=0.2,
    ent_coef=0.0,
    learning_rate=1e-3,
    # gSDE: https://proceedings.mlr.press/v164/raffin22a.html
    use_sde=True,
    sde_sample_freq=4,
    verbose=1,
)

# Train the agent
model.learn(total_timesteps=int(1e5))

🔧 Technical Details

Hyperparameters

OrderedDict([('clip_range', 0.2),
             ('ent_coef', 0.0),
             ('gae_lambda', 0.95),
             ('gamma', 0.9),
             ('learning_rate', 0.001),
             ('n_envs', 4),
             ('n_epochs', 10),
             ('n_steps', 1024),
             ('n_timesteps', 100000.0),
             ('policy', 'MlpPolicy'),
             ('sde_sample_freq', 4),
             ('use_sde', True),
             ('normalize', False)])
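The arithmetic below sketches what these settings imply for the training budget. One value is an assumption rather than part of the listing: batch_size is not given, so the sketch uses 64, the Stable Baselines3 PPO default.

```python
import math

# Values taken from the hyperparameter listing above
n_envs = 4
n_steps = 1024
n_epochs = 10
n_timesteps = 100_000
gamma = 0.9

batch_size = 64  # assumption: SB3 PPO default, not listed in the hyperparameters

# Transitions collected before each policy update
rollout_size = n_envs * n_steps                              # 4096
# Rollouts needed to reach the timestep budget (the last one overshoots)
n_updates = math.ceil(n_timesteps / rollout_size)            # 25
actual_timesteps = n_updates * rollout_size                  # 102400
# Minibatch gradient steps over the whole run
minibatches_per_epoch = rollout_size // batch_size           # 64
gradient_steps = n_updates * n_epochs * minibatches_per_epoch  # 16000
# Rough horizon over which discounted rewards still matter
effective_horizon = 1.0 / (1.0 - gamma)                      # about 10 steps
```

With gamma=0.9 the agent effectively weighs only about the next 10 steps, which is short relative to a 200-step episode.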

📚 Documentation

Model Performance

Property	Details
Model Type	PPO
Mean Reward	-230.42 +/- 142.54
Task	Reinforcement Learning
Dataset	Pendulum-v1
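A figure of the form "mean +/- std" is typically computed over a set of evaluation episodes. The sketch below shows that calculation with the standard library; the episode returns are hypothetical and are not this model's evaluation data.

```python
import statistics

def summarize_returns(episode_returns):
    """Format the mean and population standard deviation of episode returns."""
    mean = statistics.mean(episode_returns)
    std = statistics.pstdev(episode_returns)
    return f"{mean:.2f} +/- {std:.2f}"

# Hypothetical returns from four evaluation episodes
example_returns = [-120.0, -250.0, -400.0, -130.0]
summary = summarize_returns(example_returns)
```

A large standard deviation relative to the mean, as in the reported result, indicates that performance varies considerably from episode to episode.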
