PPO Hopper-v3

Developed by: sb3
This is a PPO reinforcement learning model trained with the stable-baselines3 library for continuous control tasks in the Hopper-v3 environment.
Downloads: 19
Released: 6/2/2022

Model Overview

This model is trained with the Proximal Policy Optimization (PPO) algorithm to solve the continuous control problem posed by the Hopper-v3 environment: teaching a simulated one-legged robot to hop forward without falling.
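Training a comparable policy with stable-baselines3 takes only a few lines. The sketch below is a minimal example, not the exact training script behind this model: the hyperparameters are library defaults, the timestep budget is illustrative, and a working MuJoCo installation is assumed.

```python
# Minimal training sketch (assumed setup: classic gym with mujoco-py,
# as required by the Hopper-v3 environment id).
import gym
from stable_baselines3 import PPO

env = gym.make("Hopper-v3")

# Default PPO hyperparameters; the published model uses a tuned
# configuration that is not reproduced here.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)  # illustrative budget

model.save("ppo-Hopper-v3")  # hypothetical local filename
```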

Model Features

High Performance: achieves an average reward of 2410.11 in the Hopper-v3 environment (see the evaluation sketch after this list)
Stable Training: uses the PPO algorithm, whose clipped objective promotes training stability
Parameter Optimization: carefully tuned hyperparameter configuration
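An average-reward figure like the one above is typically estimated with stable-baselines3's built-in evaluate_policy helper. A minimal sketch, assuming the hypothetical checkpoint saved in the training example; the episode count is illustrative:

```python
# Minimal evaluation sketch.
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("Hopper-v3")
model = PPO.load("ppo-Hopper-v3")  # hypothetical local checkpoint

# Returns mean and standard deviation of the episode reward.
mean_reward, std_reward = evaluate_policy(
    model, env, n_eval_episodes=10, deterministic=True
)
print(f"mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")
```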

Model Capabilities

Continuous Action Space Control (see the control-loop sketch below)
Robot Motion Control
Reinforcement Learning Task Solving
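At inference time the policy maps each Hopper-v3 observation to a continuous action vector (torques for the hopper's three actuated joints). A minimal control-loop sketch, again assuming the hypothetical checkpoint from the training example:

```python
# Minimal control-loop sketch using the classic gym step API
# (obs, reward, done, info), which matches the Hopper-v3 era.
import gym
from stable_baselines3 import PPO

env = gym.make("Hopper-v3")
model = PPO.load("ppo-Hopper-v3")  # hypothetical local checkpoint

obs = env.reset()
for _ in range(1000):
    # predict returns the continuous action and (for recurrent
    # policies) an internal state; deterministic=True disables
    # action-space sampling.
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```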

Use Cases

Robot Control
Hopping Robot Control: controls the robot to produce stable hopping movements, reaching an average reward of 2410.11 in Hopper-v3
Reinforcement Learning Research
Algorithm Benchmarking: serves as a benchmark reference for PPO on continuous control tasks