Sealswalker2d-v0 Open-source Reinforcement Learning Agent - Free Deployment to Control Walker2d Robot Walking

Sealswalker2d V0

Developed by ernestumorga

This is a reinforcement learning agent based on the PPO algorithm, specifically trained for the seals/Walker2d-v0 environment to control the walking task of the Walker2d robot.

Physics Model #Bipedal robot control #Deep reinforcement learning #Continuous action space

Release date: 5/27/2022

Model Overview

This model was trained with the PPO implementation in the Stable Baselines3 library and reaches a mean reward of 1429.13 +/- 411.75 in the seals/Walker2d-v0 environment.

Model Features

Efficient policy optimization

Uses PPO, a policy-gradient algorithm well suited to control tasks with continuous action spaces.

Custom network architecture

Uses separate policy and value networks, each a two-layer MLP with 256 units per layer and ReLU activations.

Parameter optimization

Uses tuned hyperparameters, including the learning rate (about 3.79e-05) and the discount factor (gamma = 0.98). The full set is listed under Technical Details.
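To give a sense of the size of the two-layer, 256-unit networks described above, the following minimal sketch counts the weights and biases of one policy MLP and one value MLP. The observation and action dimensions (18 and 6) are assumptions about seals/Walker2d-v0 and are not stated in this model card; the count also ignores any extra parameters the library may add, such as a learned action log-standard-deviation.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def mlp_params(n_in, hidden, n_out):
    """Total parameters of an MLP with the given hidden layer sizes."""
    sizes = [n_in, *hidden, n_out]
    return sum(linear_params(a, b) for a, b in zip(sizes, sizes[1:]))


# Assumed dimensions for seals/Walker2d-v0 (not taken from the model card).
OBS_DIM = 18
ACT_DIM = 6

# From the model card: two hidden layers of 256 units for both pi and vf.
HIDDEN = [256, 256]

actor_params = mlp_params(OBS_DIM, HIDDEN, ACT_DIM)   # policy network
critic_params = mlp_params(OBS_DIM, HIDDEN, 1)        # value network

print(f"actor:  {actor_params} parameters")
print(f"critic: {critic_params} parameters")
```

Under these assumptions both networks stay well below 100,000 parameters each, which is small enough to train on a CPU.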

Model Capabilities

Continuous action space control

Robot motion control

Reinforcement learning policy optimization

Use Cases

Robot control

Bipedal robot walking

Controls the Walker2d bipedal robot to produce a walking gait.

Mean reward: 1429.13 +/- 411.75

Reinforcement learning research

Algorithm performance comparison

Serves as a baseline for comparing the performance of other reinforcement learning algorithms on the same environment.

🚀 PPO Agent for seals/Walker2d-v0

This project provides a trained PPO agent for seals/Walker2d-v0, built with the stable-baselines3 library and the RL Zoo. The RL Zoo is a training framework for Stable Baselines3 reinforcement learning agents that includes hyperparameter optimization and pre-trained agents.

📦 Metadata

Property	Details
Library Name	stable-baselines3
Tags	seals/Walker2d-v0, deep-reinforcement-learning, reinforcement-learning, stable-baselines3
Model Name	PPO
Mean Reward	1429.13 +/- 411.75
Task	reinforcement-learning
Dataset	seals/Walker2d-v0
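The "Mean Reward" entry has the form mean +/- spread. A minimal sketch of how such a figure can be produced is shown below, assuming it is the mean and population standard deviation of per-episode returns over a set of evaluation episodes. The episode returns used here are hypothetical placeholders, not the evaluation data behind 1429.13 +/- 411.75.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, population standard deviation) of episode returns."""
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)  # population std (ddof = 0)
    return mean, std


def format_mean_reward(mean, std):
    """Format the summary in the same style as the metadata table."""
    return f"{mean:.2f} +/- {std:.2f}"


# Hypothetical returns for illustration only.
hypothetical_returns = [1000.0, 1500.0, 2000.0]

mean, std = summarize_returns(hypothetical_returns)
print(format_mean_reward(mean, std))
```

A spread that is large relative to the mean, as in the reported figure, indicates that returns vary considerably from episode to episode.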

🚀 Quick Start

💻 Usage (with SB3 RL Zoo)

You can use the following commands to download and run the model:

# Download model and save it into the logs/ folder
python -m utils.load_from_hub --algo ppo --env seals/Walker2d-v0 -orga ernestumorga -f logs/
# Run the downloaded agent
python enjoy.py --algo ppo --env seals/Walker2d-v0 -f logs/

Here are the relevant repositories:

RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo
SB3: https://github.com/DLR-RM/stable-baselines3
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

🔧 Training (with the RL Zoo)

To train the model and upload it, use the following commands:

# Train the agent and save it into the logs/ folder
python train.py --algo ppo --env seals/Walker2d-v0 -f logs/
# Upload the model and generate video (when possible)
python -m utils.push_to_hub --algo ppo --env seals/Walker2d-v0 -f logs/ -orga ernestumorga

🔧 Technical Details

Hyperparameters

OrderedDict([('batch_size', 8),
             ('clip_range', 0.4),
             ('ent_coef', 0.00013057334805552262),
             ('gae_lambda', 0.92),
             ('gamma', 0.98),
             ('learning_rate', 3.791707778339674e-05),
             ('max_grad_norm', 0.6),
             ('n_envs', 1),
             ('n_epochs', 5),
             ('n_steps', 2048),
             ('n_timesteps', 1000000.0),
             ('normalize', True),
             ('policy', 'MlpPolicy'),
             ('policy_kwargs',
              'dict(activation_fn=nn.ReLU, net_arch=[dict(pi=[256, 256], '
              'vf=[256, 256])])'),
             ('vf_coef', 0.6167177795726859),
             ('normalize_kwargs', {'norm_obs': True, 'norm_reward': False})])
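Several of these values interact. The sketch below works out what the listed settings imply for the training schedule: the rollout size, the number of minibatch updates per rollout, and the approximate number of rollouts. It assumes that each rollout collects n_steps * n_envs transitions, that every epoch sweeps the whole rollout in minibatches of batch_size, and that training runs whole rollouts until the step budget is reached. The 1 / (1 - gamma) horizon is a common rule of thumb, not something stated in the model card.

```python
import math

# Values copied from the hyperparameter listing above.
N_STEPS = 2048
N_ENVS = 1
BATCH_SIZE = 8
N_EPOCHS = 5
N_TIMESTEPS = 1_000_000
GAMMA = 0.98


def ppo_update_schedule(n_steps, n_envs, batch_size, n_epochs, n_timesteps):
    """Derive rollout and update counts from PPO hyperparameters."""
    rollout_size = n_steps * n_envs
    minibatches_per_epoch = math.ceil(rollout_size / batch_size)
    updates_per_rollout = minibatches_per_epoch * n_epochs
    # Assumption: whole rollouts are collected until the budget is met.
    n_rollouts = math.ceil(n_timesteps / rollout_size)
    return {
        "rollout_size": rollout_size,
        "minibatches_per_epoch": minibatches_per_epoch,
        "updates_per_rollout": updates_per_rollout,
        "n_rollouts": n_rollouts,
        "total_updates": updates_per_rollout * n_rollouts,
    }


schedule = ppo_update_schedule(N_STEPS, N_ENVS, BATCH_SIZE, N_EPOCHS, N_TIMESTEPS)

# Rule-of-thumb horizon (in steps) over which discounted rewards matter.
effective_horizon = 1.0 / (1.0 - GAMMA)

for name, value in schedule.items():
    print(f"{name}: {value}")
print(f"effective_horizon: {effective_horizon:.1f} steps")
```

With a batch size of 8, each 2048-step rollout is split into many small minibatches, so the agent performs a large number of gradient updates per rollout at the small learning rate listed above.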
