Open-source model of ppo-BreakoutNoFrameskip-v4 - Facilitating efficient learning in the Atari game Breakout

PPO BreakoutNoFrameskip-v4

Developed by ThomasSimonini

A deep reinforcement learning model trained using the PPO algorithm in the Atari Breakout environment

Video Processing #Atari Game AI #Deep Reinforcement Learning #Frame Stacking Training

Release Time : 3/2/2022

Model Overview

This model is built with the stable-baselines3 library and was trained with the PPO algorithm in the BreakoutNoFrameskip-v4 environment. It plays the classic Atari game Breakout.

Model Features

Based on PPO Algorithm

Uses Proximal Policy Optimization (PPO), a widely used policy gradient method in reinforcement learning that limits how far each update can move the policy.
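As a rough illustration of what "proximal" means, the sketch below computes PPO's clipped surrogate objective for a few samples in plain Python. The function name `clipped_surrogate` and its list-based inputs are made up for this example; the default `clip_range` of 0.1 is the value used in the training code on this page. stable-baselines3 computes the equivalent quantity on batched tensors, so treat this as a conceptual sketch rather than the library's implementation.

```python
import math


def clipped_surrogate(new_log_probs, old_log_probs, advantages, clip_range=0.1):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    Hypothetical helper for illustration; inputs are plain lists of floats.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_range, 1 + clip_range].
        clipped_ratio = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

With a positive advantage, a ratio above `1 + clip_range` earns no extra objective, so the update has no incentive to push the policy further in that direction.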

Frame Stacking Processing

Stacks the 4 most recent frames into each observation so the model can perceive motion, such as the ball's direction and speed, which a single frame cannot show.
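The idea behind frame stacking can be sketched with a fixed-length buffer. The class `FrameStacker` below is a hypothetical, simplified stand-in written for this page, not the `VecFrameStack` wrapper itself. In particular, padding the buffer with copies of the first frame on reset is a choice made for this sketch and is not a claim about how the library pads.

```python
from collections import deque


class FrameStacker:
    """Keeps the n_stack most recent frames, oldest first (illustrative only)."""

    def __init__(self, n_stack=4):
        self.n_stack = n_stack
        self.frames = deque(maxlen=n_stack)

    def reset(self, first_frame):
        # Assumption for this sketch: pad the buffer with copies of the first frame.
        self.frames.clear()
        for _ in range(self.n_stack):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, new_frame):
        # Appending to a full deque silently drops the oldest frame.
        self.frames.append(new_frame)
        return list(self.frames)
```

Each call to `step` returns the stacked observation the policy would see: the newest frame plus the three before it.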

Parallel Environment Training

Collects experience from 8 environments running in parallel, which speeds up sample collection.
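To make the effect of parallel collection concrete, the arithmetic below combines the number of environments with the `n_steps`, `batch_size`, `n_epochs`, and `total_timesteps` values from the training code on this page. The helper name `rollout_stats` is hypothetical, and the update count is an estimate that assumes training stops at the first rollout that reaches the timestep budget.

```python
import math


def rollout_stats(num_envs=8, n_steps=128, batch_size=256, n_epochs=4,
                  total_timesteps=int(10e6)):
    """Back-of-the-envelope training arithmetic (hypothetical helper)."""
    # Transitions gathered before each policy update: 8 * 128 = 1024.
    rollout_size = num_envs * n_steps
    # Minibatches per pass over the rollout: 1024 // 256 = 4.
    minibatches_per_epoch = rollout_size // batch_size
    # Gradient steps per update: 4 minibatches * 4 epochs = 16.
    gradient_steps_per_update = minibatches_per_epoch * n_epochs
    # Approximate number of collect-then-update cycles over the whole run.
    num_updates = math.ceil(total_timesteps / rollout_size)
    return {
        "rollout_size": rollout_size,
        "minibatches_per_epoch": minibatches_per_epoch,
        "gradient_steps_per_update": gradient_steps_per_update,
        "num_updates": num_updates,
    }


print(rollout_stats())
```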

Stable Training

Stabilizes training with gradient clipping (max_grad_norm = 0.5) and a weighted value-function loss (vf_coef = 0.5), alongside PPO's clipped policy update (clip_range = 0.1).
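A minimal sketch of these two stabilizers, using the coefficient values from the training code on this page. The function names `total_loss` and `clip_grad_norm` and the flat-list gradient are illustrative assumptions; a real implementation works on tensors and uses the deep learning framework's own clipping utility.

```python
import math


def total_loss(policy_loss, value_loss, entropy, vf_coef=0.5, ent_coef=0.01):
    """Combine the PPO loss terms (hypothetical helper, scalar inputs)."""
    # The entropy term is subtracted so that higher entropy lowers the loss,
    # which encourages exploration.
    return policy_loss + vf_coef * value_loss - ent_coef * entropy


def clip_grad_norm(grads, max_norm=0.5):
    """Rescale a flat list of gradients so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]
```

Clipping by the global norm keeps the gradient's direction and only shrinks its length, so one unusually large batch cannot cause an oversized parameter update.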

Model Capabilities

Atari Game Control

Reinforcement Learning Decision Making

Real-time Game Interaction

Use Cases

Game AI

Breakout Game AI

Acts as an automatic player for the Breakout game.

Reported mean reward: 339

Reinforcement Learning Research

Algorithm Benchmarking

Can serve as a performance benchmark for the PPO algorithm on Atari games

🚀 PPO Agent playing BreakoutNoFrameskip-v4

This is a trained PPO agent that plays BreakoutNoFrameskip-v4, built with the stable-baselines3 library. It reaches a mean reward of 339.

Model Index

Property	Details
Model Name	PPO Agent
Task Type	reinforcement-learning
Dataset Type	BreakoutNoFrameskip-v4
Dataset Name	BreakoutNoFrameskip-v4
Metric Type	mean_reward
Metric Value	339

The training report can be found here: https://wandb.ai/simoninithomas/HFxSB3/reports/Atari-HFxSB3-Benchmark--VmlldzoxNjI3NTIy

🚀 Quick Start

Evaluation Results

Mean_reward: 339.0

Usage (with Stable-baselines3)

⚠️ Important Note

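You need to use gym==0.19 since it includes the Atari ROMs. The action space is Discrete(4) (NOOP, FIRE, RIGHT, LEFT), since the environment exposes only the actions that are possible in this game. The training script prints the size via env.action_space.n.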

Watch your agent interact:

# Import the libraries
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

from huggingface_sb3 import load_from_hub

# Download the trained checkpoint from the Hugging Face Hub
checkpoint = load_from_hub("ThomasSimonini/ppo-BreakoutNoFrameskip-v4", "ppo-BreakoutNoFrameskip-v4.zip")

# This agent was trained with Python 3.8 and Colab runs Python 3.7, so we
# override the pickled schedule objects to avoid pickle errors on load.
# The zeroed values only matter for training, not for inference.
custom_objects = {
    "learning_rate": 0.0,
    "lr_schedule": lambda _: 0.0,
    "clip_range": lambda _: 0.0,
}

model = PPO.load(checkpoint, custom_objects=custom_objects)

# Recreate the training-time preprocessing: Atari wrappers plus 4-frame stacking
env = make_atari_env('BreakoutNoFrameskip-v4', n_envs=1)
env = VecFrameStack(env, n_stack=4)

obs = env.reset()
while True:
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()
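The loop above runs until it is interrupted. For a bounded run that also reports a score, something like the following sketch could be used. The helper `mean_episode_reward` and its `max_steps` safety cap are hypothetical; it assumes a single-environment vectorized env whose `step` returns `(obs, rewards, dones, infos)`, as in the code above. Note that the Atari wrappers applied by `make_atari_env` clip rewards and end an episode on each lost life by default, so the number this returns is not directly comparable to the reported mean reward. For that, stable-baselines3 provides `evaluate_policy` in `stable_baselines3.common.evaluation`.

```python
def mean_episode_reward(model, env, n_episodes=10, max_steps=100_000):
    """Run until n_episodes finish (or max_steps elapse); return the mean return.

    Hypothetical helper: assumes a vectorized env with exactly one sub-environment.
    """
    obs = env.reset()
    episode_returns = []
    current_return = 0.0
    for _ in range(max_steps):
        action, _states = model.predict(obs)
        obs, rewards, dones, infos = env.step(action)
        current_return += float(rewards[0])
        if dones[0]:
            # Vectorized envs reset automatically, so just start a new tally.
            episode_returns.append(current_return)
            current_return = 0.0
            if len(episode_returns) == n_episodes:
                break
    if not episode_returns:
        return 0.0
    return sum(episode_returns) / len(episode_returns)
```

With the `model` and `env` objects from the code above, `mean_episode_reward(model, env, n_episodes=5)` would play five wrapper-level episodes and then stop.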

🔧 Technical Details

Training Code

import wandb

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack, VecVideoRecorder
from stable_baselines3.common.callbacks import CheckpointCallback

from wandb.integration.sb3 import WandbCallback

from huggingface_sb3 import push_to_hub

config = {
    "env_name": "BreakoutNoFrameskip-v4",
    "num_envs": 8,
    "total_timesteps": int(10e6),
    "seed": 661550378,    
}

run = wandb.init(
    project="HFxSB3",
    config = config,
    sync_tensorboard = True,  # Auto-upload sb3's tensorboard metrics
    monitor_gym = True, # Auto-upload the videos of agents playing the game
    save_code = True, # Save the code to W&B
    )

# make_atari_env creates the Atari environments and applies the standard wrappers.
# n_envs=8 runs 8 environments in parallel (multi-worker training).
env = make_atari_env(config["env_name"], n_envs=config["num_envs"], seed=config["seed"])

print("ENV ACTION SPACE: ", env.action_space.n)

# Frame-stacking with 4 frames
env = VecFrameStack(env, n_stack=4)
# Video recorder
env = VecVideoRecorder(env, "videos", record_video_trigger=lambda x: x % 100000 == 0, video_length=2000)

model = PPO(
    policy="CnnPolicy",
    env=env,
    batch_size=256,
    clip_range=0.1,
    ent_coef=0.01,
    gae_lambda=0.9,
    gamma=0.99,
    learning_rate=2.5e-4,
    max_grad_norm=0.5,
    n_epochs=4,
    n_steps=128,
    vf_coef=0.5,
    tensorboard_log="runs",
    verbose=1,
)

model.learn(
    total_timesteps=config["total_timesteps"],
    callback=[
        # Log gradients and save the model to W&B
        WandbCallback(
            gradient_save_freq=1000,
            model_save_path=f"models/{run.id}",
        ),
        # Save local checkpoints during training
        CheckpointCallback(
            save_freq=10000,
            save_path="./breakout",
            name_prefix=config["env_name"],
        ),
    ],
)

model.save("ppo-BreakoutNoFrameskip-v4.zip")
push_to_hub(repo_id="ThomasSimonini/ppo-BreakoutNoFrameskip-v4", 
    filename="ppo-BreakoutNoFrameskip-v4.zip",
    commit_message="Added Breakout trained agent")
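
The `gamma = 0.99` and `gae_lambda = 0.9` settings in the training code control how rewards are credited to earlier actions through generalized advantage estimation (GAE). The sketch below shows the standard backward recursion in plain Python. The function name `compute_gae`, the list-based inputs, and the convention that `dones[t]` marks an episode ending after step `t` are assumptions made for this example, not a copy of the library's rollout buffer.

```python
def compute_gae(rewards, values, last_value, dones, gamma=0.99, gae_lambda=0.9):
    """Generalized advantage estimation over one rollout (illustrative sketch).

    rewards[t], values[t]: reward received and value estimate at step t.
    last_value: value estimate for the state after the final step.
    dones[t]: True if the episode ended after step t (assumed convention).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        non_terminal = 0.0 if dones[t] else 1.0
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value * non_terminal - values[t]
        # Exponentially weighted sum of future TD errors, cut at episode ends.
        gae = delta + gamma * gae_lambda * non_terminal * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages
```

A lower `gae_lambda` weights nearby TD errors more heavily, which trades a little bias for lower variance in the advantage estimates.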