PPO-PongNoFrameskip-v4 Open-Source Agent - Play the Atari PongNoFrameskip-v4 Game for Free

ppo-PongNoFrameskip-v4

Developed by ThomasSimonini

This is a PPO agent trained using the stable-baselines3 library, specifically designed to play the Atari game PongNoFrameskip-v4.

Video Processing #Atari Game Control #Reinforcement Learning Training #Frame Stack Processing

Downloads 148

Release Time: 3/2/2022

Model Overview

The model is trained with the PPO algorithm and can compete as the green side in the PongNoFrameskip-v4 game, achieving an average reward of 21 points.

Model Features

High-performance Game AI

Achieves a mean reward of 21.00 in PongNoFrameskip-v4, the highest score a Pong game allows

Based on Stable Reinforcement Learning Framework

Implemented using the stable-baselines3 library, a widely recognized reinforcement learning framework

Frame Stack Processing

Uses a 4-frame stacking technique to process game screens, enhancing the model's understanding of dynamic environments
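To illustrate the idea behind frame stacking, here is a minimal standard-library sketch. The `FrameStack` class is a hypothetical helper written for this example, not the `VecFrameStack` wrapper from stable-baselines3, and padding with blank frames at reset is an assumption of the sketch. Strings stand in for image frames.

```python
from collections import deque


class FrameStack:
    """Keep the most recent `n_stack` frames (hypothetical helper for illustration)."""

    def __init__(self, n_stack=4, blank=0):
        self.n_stack = n_stack
        self.blank = blank
        # A bounded deque drops the oldest frame automatically on append
        self.frames = deque(maxlen=n_stack)

    def reset(self, first_frame):
        # Pad with blank frames so the observation always holds n_stack entries
        self.frames.clear()
        for _ in range(self.n_stack - 1):
            self.frames.append(self.blank)
        self.frames.append(first_frame)
        return list(self.frames)

    def step(self, new_frame):
        # The newest frame goes last; the oldest one falls out
        self.frames.append(new_frame)
        return list(self.frames)


stack = FrameStack(n_stack=4)
obs = stack.reset("f0")
for frame in ["f1", "f2", "f3", "f4"]:
    obs = stack.step(frame)
print(obs)  # ['f1', 'f2', 'f3', 'f4']
```

Because the observation contains several consecutive frames, the policy can infer the ball's direction and speed, which a single still frame does not reveal.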

Model Capabilities

Atari game PongNoFrameskip-v4 competition

Reinforcement learning environment interaction

Real-time game decision making

Use Cases

Game AI

Pong Game Competition

Acts as an AI player to compete against humans or other AIs in Pong

Average reward of 21 points

Reinforcement Learning Research

Serves as a benchmark model for reinforcement learning algorithm research

🚀 PPO Agent playing PongNoFrameskip-v4

This is a trained model of a PPO agent playing PongNoFrameskip-v4 using the stable-baselines3 library. Our agent is the 🟢 one.

The training report: https://wandb.ai/simoninithomas/HFxSB3/reports/Atari-HFxSB3-Benchmark--VmlldzoxNjI3NTIy

✨ Features

Trained Agent: A PPO agent trained to play PongNoFrameskip-v4.
Evaluation Metrics: Mean reward of 21.00.

📦 Installation

Use gym==0.19, since that version includes the Atari ROMs.
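Before loading the agent, it can help to confirm which gym version is installed. The following is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8). The helper name `check_gym_version` is hypothetical and is not part of gym or stable-baselines3.

```python
from importlib import metadata


def check_gym_version(required="0.19"):
    """Return (matches, installed_version) for the installed gym package.

    Hypothetical helper: `required` is compared against the major.minor
    part of the installed version string.
    """
    try:
        installed = metadata.version("gym")
    except metadata.PackageNotFoundError:
        # gym is not installed in this environment
        return False, None
    major_minor = ".".join(installed.split(".")[:2])
    return major_minor == required, installed


ok, version = check_gym_version()
if not ok:
    print(f"Expected gym 0.19.x, found: {version}")
```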

💻 Usage Examples

Basic Usage

# Import the libraries
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# Load the model
checkpoint = load_from_hub("ThomasSimonini/ppo-PongNoFrameskip-v4", "ppo-PongNoFrameskip-v4.zip")

# Colab runs Python 3.7, but this agent was trained with Python 3.8.
# Overriding these objects avoids pickle errors when loading the model.
custom_objects = {
    "learning_rate": 0.0,
    "lr_schedule": lambda _: 0.0,
    "clip_range": lambda _: 0.0,
}

model = PPO.load(checkpoint, custom_objects=custom_objects)

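# Recreate the training setup: Atari wrappers plus 4-frame stacking
env = make_atari_env("PongNoFrameskip-v4", n_envs=1)
env = VecFrameStack(env, n_stack=4)

# Run the agent; the vectorized environment resets itself when an episode ends
obs = env.reset()
while True:
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()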

Advanced Usage

import wandb
import gym

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack, VecVideoRecorder
from stable_baselines3.common.callbacks import CheckpointCallback

from wandb.integration.sb3 import WandbCallback

from huggingface_sb3 import load_from_hub, push_to_hub

config = {
    "env_name": "PongNoFrameskip-v4",
    "num_envs": 8,
    "total_timesteps": int(10e6),
    "seed": 4089164106,    
}

run = wandb.init(
    project="HFxSB3",
    config=config,
    sync_tensorboard=True,  # Auto-upload sb3's tensorboard metrics
    monitor_gym=True,  # Auto-upload the videos of agents playing the game
    save_code=True,  # Save the code to W&B
)

# make_atari_env creates the Atari environments and applies the standard wrappers.
# It also sets up multi-worker training (n_envs=8 => 8 parallel environments).
env = make_atari_env(config["env_name"], n_envs=config["num_envs"], seed=config["seed"])  # PongNoFrameskip-v4

print("ENV ACTION SPACE: ", env.action_space.n)

# Frame-stacking with 4 frames
env = VecFrameStack(env, n_stack=4)
# Video recorder
env = VecVideoRecorder(env, "videos", record_video_trigger=lambda x: x % 100000 == 0, video_length=2000)

# https://github.com/DLR-RM/rl-trained-agents/blob/10a9c31e806820d59b20d8b85ca67090338ea912/ppo/PongNoFrameskip-v4_1/PongNoFrameskip-v4/config.yml
model = PPO(
    policy="CnnPolicy",
    env=env,
    batch_size=256,
    clip_range=0.1,
    ent_coef=0.01,
    gae_lambda=0.9,
    gamma=0.99,
    learning_rate=2.5e-4,
    max_grad_norm=0.5,
    n_epochs=4,
    n_steps=128,
    vf_coef=0.5,
    tensorboard_log="runs",
    verbose=1,
)
    
model.learn(
    total_timesteps=config["total_timesteps"],
    callback=[
        WandbCallback(
            gradient_save_freq=1000,
            model_save_path=f"models/{run.id}",
        ),
        CheckpointCallback(
            save_freq=10000,
            save_path="./pong",
            name_prefix=config["env_name"],
        ),
    ],
)

model.save("ppo-PongNoFrameskip-v4.zip")
push_to_hub(
    repo_id="ThomasSimonini/ppo-PongNoFrameskip-v4",
    filename="ppo-PongNoFrameskip-v4.zip",
    commit_message="Added Pong trained agent",
)
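To make the training budget above easier to read, the following sketch works through the arithmetic implied by the hyperparameters. It is plain Python with no dependencies. Two points are assumptions rather than statements from this model card: that training runs whole rollouts until the timestep budget is reached, and that `save_freq` counts calls to `env.step()` on the vectorized environment (so each call advances `n_envs` timesteps).

```python
import math

# Values copied from the training script above
n_envs = 8
n_steps = 128
batch_size = 256
n_epochs = 4
total_timesteps = int(10e6)
checkpoint_save_freq = 10000

# Transitions collected before each policy update
rollout_size = n_envs * n_steps

# Each epoch splits the rollout into minibatches of batch_size samples
minibatches_per_epoch = rollout_size // batch_size
gradient_steps_per_update = minibatches_per_epoch * n_epochs

# Assumption: whole rollouts are collected until the budget is reached
num_updates = math.ceil(total_timesteps / rollout_size)

# Assumption: save_freq counts vectorized steps, each worth n_envs timesteps
timesteps_between_checkpoints = checkpoint_save_freq * n_envs

print(rollout_size)                   # 1024
print(gradient_steps_per_update)      # 16
print(num_updates)                    # 9766
print(timesteps_between_checkpoints)  # 80000
```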

📚 Documentation

Evaluation Results

Mean_reward: 21.00 +/- 0.0
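A figure of this form is typically the mean and standard deviation of the total reward over several evaluation episodes. The sketch below shows that calculation with the standard library only. The list of episode returns is hypothetical example data, not the actual evaluation log, and using the population standard deviation is an assumption of this sketch.

```python
from statistics import mean, pstdev


def summarize(episode_returns):
    """Return (mean, population standard deviation) of per-episode returns."""
    return mean(episode_returns), pstdev(episode_returns)


# Hypothetical data: ten evaluation episodes, each won 21 to 0
episode_returns = [21.0] * 10

mean_reward, std_reward = summarize(episode_returns)
print(f"Mean_reward: {mean_reward:.2f} +/- {std_reward:.1f}")
# Mean_reward: 21.00 +/- 0.0
```

Since a Pong game ends when one side reaches 21 points, a mean of 21.00 with zero spread corresponds to the agent winning every evaluation episode without conceding a point.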

Additional Information

The action space has 6 discrete actions, since only the actions that are possible in this game are used.
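As a reference for interpreting the agent's outputs, the sketch below maps each of the 6 action indices to a name. The names listed are the ones commonly reported for Pong's reduced action set in the Arcade Learning Environment and should be treated as an assumption here; on a plain gym Atari environment they can usually be checked with `env.unwrapped.get_action_meanings()`. The `describe_action` helper is hypothetical.

```python
# Assumed action meanings for Pong's reduced action set (verify in your own setup)
PONG_ACTIONS = ["NOOP", "FIRE", "RIGHT", "LEFT", "RIGHTFIRE", "LEFTFIRE"]


def describe_action(action_index):
    """Translate a discrete action index into its assumed name."""
    return PONG_ACTIONS[action_index]


for index, name in enumerate(PONG_ACTIONS):
    print(index, name)
```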

📄 License

No license information provided in the original document.

📦 Model Information

Property	Details
Model Type	PPO Agent
Training Data	PongNoFrameskip-v4
Mean Reward	21.00
