PPO-CartPole-v1 Open-Source Reinforcement Learning Model - Solve the Balance Problem with Free Deployment!

PPO CartPole-v1

Developed by sb3

This is a reinforcement learning agent trained with the PPO algorithm to solve the pole-balancing task in the CartPole-v1 environment.

Tags: #Reinforcement Learning Control #CartPole Balancing #PPO Algorithm

Downloads 449

Release Time: 5/19/2022

Model Overview

The model is trained with the Proximal Policy Optimization (PPO) algorithm. It keeps the pole balanced for the full length of a CartPole-v1 episode, reaching the environment's maximum return of 500.

Model Features

High-performance PPO Algorithm

Uses PPO, whose clipped policy updates keep training stable while each batch of collected experience is reused over several epochs
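
As a rough illustration of what "clipped" means here, the following is a minimal sketch of the per-sample PPO clipped surrogate objective in plain Python. The function name is hypothetical and this is not the stable-baselines3 implementation; the default clip range of 0.2 matches the starting value listed in the hyperparameters.

```python
def clipped_surrogate(ratio: float, advantage: float, clip_range: float = 0.2) -> float:
    """Per-sample PPO clipped objective (a quantity to maximize).

    ratio is pi_new(a|s) / pi_old(a|s); advantage is the advantage estimate.
    """
    # Restrict the probability ratio to [1 - clip_range, 1 + clip_range].
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    # Taking the minimum removes any incentive to move the ratio outside the range.
    return min(ratio * advantage, clipped_ratio * advantage)


# Positive advantage: the gain stops growing once the ratio passes 1 + clip_range.
print(clipped_surrogate(ratio=1.5, advantage=2.0))
# Negative advantage: the objective stays at the clipped value once the ratio drops below 1 - clip_range.
print(clipped_surrogate(ratio=0.5, advantage=-1.0))
```

In both calls the clipped term is the one selected, which is what limits how far a single update can move the policy.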

Multi-environment Parallel Training

Trained with 8 parallel environments (n_envs = 8), which speeds up experience collection
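
To make the effect of the parallel environments concrete, here is a small arithmetic sketch using the values from the Hyperparameters section. The variable names on the left are illustrative, and the rollout count assumes that training stops at the first rollout boundary at or beyond n_timesteps.

```python
import math

# Values taken from the hyperparameter list of this model.
n_envs = 8
n_steps = 32
batch_size = 256
n_epochs = 20
n_timesteps = 100_000

# Each rollout gathers n_steps transitions from every parallel environment.
rollout_size = n_envs * n_steps
# Number of minibatches the rollout is split into for each epoch.
minibatches_per_epoch = rollout_size // batch_size
# Gradient steps performed on one rollout before new data is collected.
gradient_steps_per_rollout = n_epochs * minibatches_per_epoch
# Assumption: rollouts are collected until the step budget is reached.
n_rollouts = math.ceil(n_timesteps / rollout_size)

print(rollout_size, minibatches_per_epoch, gradient_steps_per_rollout, n_rollouts)
```

With these settings one rollout is exactly one minibatch of 256 transitions, so each rollout is followed by 20 full-batch gradient steps.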

Optimized Hyperparameters

Uses the tuned hyperparameter configuration listed under Technical Details, with which the agent reaches the maximum reward

Model Capabilities

CartPole Balancing Control

Reinforcement Learning Task Solving

Real-time Decision Making

Use Cases

Educational Demonstration

Reinforcement Learning Teaching Example

Serves as a classic case for introductory reinforcement learning education

Helps students understand the basic principles of reinforcement learning

Algorithm Research

PPO Algorithm Performance Research

Used as a reference point when studying how PPO performs on CartPole-v1 compared with other environments or algorithms

Provides benchmark performance references

🚀 Stable-Baselines3 PPO Agent for CartPole-v1

This project presents a trained PPO agent for the CartPole-v1 environment. It was trained with the stable-baselines3 library and the RL Zoo, a training framework for Stable-Baselines3 agents.

Model Information

Property	Details
Model Type	PPO
Environment	CartPole-v1
Mean Reward	500.00 +/- 0.00
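
The "500.00 +/- 0.00" figure reads as a mean and standard deviation over evaluation episodes. The sketch below shows how such a summary can be computed; the helper name and the number of episodes (10) are assumptions for illustration, not values from the model card.

```python
import statistics


def summarize_returns(episode_returns: list[float]) -> str:
    """Format episode returns as 'mean +/- std' (population standard deviation)."""
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)
    return f"{mean:.2f} +/- {std:.2f}"


# Hypothetical evaluation: every episode runs to the 500-step limit of CartPole-v1.
episode_returns = [500.0] * 10
print(summarize_returns(episode_returns))  # 500.00 +/- 0.00
```

A standard deviation of 0.00 means every evaluated episode reached the cap; a single early failure would show up as a non-zero spread.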

🚀 Quick Start

Prerequisites

RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo
SB3: https://github.com/DLR-RM/stable-baselines3
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

Installation

Install the RL Zoo (with SB3 and SB3-Contrib):

pip install rl_zoo3

Usage (with SB3 RL Zoo)

# Download model and save it into the logs/ folder
python -m rl_zoo3.load_from_hub --algo ppo --env CartPole-v1 -orga sb3 -f logs/
python -m rl_zoo3.enjoy --algo ppo --env CartPole-v1 -f logs/
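
For readers who prefer Python over the command line, the following is a hedged sketch of loading the checkpoint and running one episode. It assumes the huggingface_sb3, stable-baselines3 and gymnasium packages are installed, and that the repository and file follow the "orga/algo-env" and "algo-env.zip" naming pattern; neither name is confirmed by this page. Depending on the installed versions, loading an older checkpoint may need extra arguments.

```python
def hub_coordinates(algo: str, env_id: str, orga: str) -> tuple[str, str]:
    """Build (repo_id, filename). The naming pattern is an assumption."""
    model_name = f"{algo}-{env_id}"
    return f"{orga}/{model_name}", f"{model_name}.zip"


def run_episode(algo: str = "ppo", env_id: str = "CartPole-v1", orga: str = "sb3") -> float:
    """Download the checkpoint, play one episode and return its total reward."""
    # Imported lazily so hub_coordinates() works without the RL stack installed.
    import gymnasium as gym
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import PPO

    repo_id, filename = hub_coordinates(algo, env_id, orga)
    checkpoint_path = load_from_hub(repo_id=repo_id, filename=filename)
    model = PPO.load(checkpoint_path)

    env = gym.make(env_id)
    obs, _ = env.reset(seed=0)
    episode_return, done = 0.0, False
    while not done:
        # deterministic=True picks the most likely action instead of sampling.
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(int(action))
        episode_return += float(reward)
        done = terminated or truncated
    env.close()
    return episode_return


print(hub_coordinates("ppo", "CartPole-v1", "sb3"))
```

Calling run_episode() performs the download and needs network access, so it is left out of the top-level script.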

If you installed the RL Zoo via pip (pip install rl_zoo3), the same two commands can be run from any directory.

Training (with the RL Zoo)

python -m rl_zoo3.train --algo ppo --env CartPole-v1 -f logs/
# Upload the model and generate video (when possible)
python -m rl_zoo3.push_to_hub --algo ppo --env CartPole-v1 -f logs/ -orga sb3

💻 Usage Examples

Basic Usage

# Download and enjoy the pre-trained model
python -m rl_zoo3.load_from_hub --algo ppo --env CartPole-v1 -orga sb3 -f logs/
python -m rl_zoo3.enjoy --algo ppo --env CartPole-v1 -f logs/

Advanced Usage

# Train a new model from scratch and upload it
python -m rl_zoo3.train --algo ppo --env CartPole-v1 -f logs/
python -m rl_zoo3.push_to_hub --algo ppo --env CartPole-v1 -f logs/ -orga sb3

🔧 Technical Details

Hyperparameters

OrderedDict([('batch_size', 256),
             ('clip_range', 'lin_0.2'),
             ('ent_coef', 0.0),
             ('gae_lambda', 0.8),
             ('gamma', 0.98),
             ('learning_rate', 'lin_0.001'),
             ('n_envs', 8),
             ('n_epochs', 20),
             ('n_steps', 32),
             ('n_timesteps', 100000.0),
             ('policy', 'MlpPolicy'),
             ('normalize', False)])
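
The 'lin_0.2' and 'lin_0.001' entries denote linear schedules: the value starts at the given number and decays to 0 as training progresses. A minimal sketch of that idea, assuming the schedule is called with the fraction of training remaining (from 1.0 at the start to 0.0 at the end):

```python
from typing import Callable


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Return a schedule that decays linearly from initial_value to 0."""

    def schedule(progress_remaining: float) -> float:
        # progress_remaining goes from 1.0 (start of training) to 0.0 (end).
        return progress_remaining * initial_value

    return schedule


learning_rate = linear_schedule(0.001)  # corresponds to 'lin_0.001'
clip_range = linear_schedule(0.2)       # corresponds to 'lin_0.2'

for progress_remaining in (1.0, 0.5, 0.0):
    print(progress_remaining, learning_rate(progress_remaining), clip_range(progress_remaining))
```

Both the step size and the clipping range therefore shrink together, so updates become smaller and more conservative toward the end of the 100,000 training steps.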

Environment Arguments

{'render_mode': 'rgb_array'}
