PPO SeaquestNoFrameskip-v4: Open-Source Model for Playing the Atari Game Seaquest

PPO SeaquestNoFrameskip-v4

Developed by ThomasSimonini

This is a PPO agent model trained using the stable-baselines3 library, specifically designed to play the Atari game SeaquestNoFrameskip-v4.

Video Processing #Atari Game AI #Deep Reinforcement Learning #Frame Stacking Training

Downloads 205

Release Time: 3/2/2022

Model Overview

The model is trained with the PPO algorithm and reaches a mean reward of 1820.00 +/- 20.0 on SeaquestNoFrameskip-v4. It uses a CNN policy to process stacked game frames and improves its strategy through reinforcement learning.

Model Features

High-Performance Game AI

Achieves a mean reward of 1820.00 +/- 20.0 on SeaquestNoFrameskip-v4

Stable Training Framework

Trained with the PPO implementation from the stable-baselines3 library

Frame Stacking Processing

Stacks the 4 most recent game frames (VecFrameStack with n_stack=4) so that the policy can infer motion, which a single frame cannot show
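To make the frame stacking idea concrete, here is a minimal sketch in plain Python. It is a simplified illustration, not the stable-baselines3 implementation: the FrameStacker class, its method names, and the string stand-ins for frames are all hypothetical, and real frames would be image arrays.

```python
from collections import deque


class FrameStacker:
    """Hypothetical helper: keep the most recent `n_stack` frames, oldest first."""

    def __init__(self, n_stack=4, blank=0):
        self.n_stack = n_stack
        self.blank = blank  # placeholder for frames that have not been observed yet
        self.frames = deque(maxlen=n_stack)

    def reset(self, first_frame):
        # Pad with blanks so the stack always holds exactly n_stack entries.
        self.frames.clear()
        self.frames.extend([self.blank] * (self.n_stack - 1))
        self.frames.append(first_frame)
        return list(self.frames)

    def step(self, new_frame):
        # deque(maxlen=...) drops the oldest frame automatically.
        self.frames.append(new_frame)
        return list(self.frames)


stacker = FrameStacker(n_stack=4)
first = stacker.reset("f0")  # strings stand in for image frames
for name in ["f1", "f2", "f3", "f4"]:
    latest = stacker.step(name)
```

After the loop, the stack holds the four newest frames and the frame from reset has been dropped, which is the sliding window the policy sees at each step.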

Model Capabilities

Atari Game Control

Reinforcement Learning Decision Making

Game Frame Understanding

Use Cases

Game AI

Seaquest Auto Player

The model can automatically play Seaquest and achieve high scores

Average reward of 1820 points

Reinforcement Learning Research

PPO Algorithm Benchmark

Can serve as a performance benchmark for the PPO algorithm on Atari games

🚀 PPO Agent playing SeaquestNoFrameskip-v4

This project presents a trained PPO agent that plays the SeaquestNoFrameskip-v4 game using the stable-baselines3 library. It offers a practical solution for reinforcement learning in the Atari game environment.

Model Index

Name: PPO Agent
Results:
- Task:
  - Type: reinforcement-learning
- Dataset:
  - Type: SeaquestNoFrameskip-v4
  - Name: SeaquestNoFrameskip-v4
- Metrics:
  - Type: mean_reward
  - Value: 1820.00 +/- 20.0

Training Report

You can find the training report here.

🚀 Quick Start

Evaluation Results

The mean reward of the trained model is 1820.00 +/- 20.0.
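A "mean +/- std" figure of this kind is typically the mean and standard deviation of the total reward over several evaluation episodes. The sketch below shows that calculation under stated assumptions: the episode returns are hypothetical values chosen only to reproduce the format, not the episodes actually used for this model, and the card does not say how many episodes were evaluated.

```python
from statistics import mean, pstdev


def summarize_returns(episode_returns):
    """Return mean, population standard deviation, and a formatted summary."""
    m = mean(episode_returns)
    s = pstdev(episode_returns)
    return m, s, f"{m:.2f} +/- {s:.1f}"


# Hypothetical episode returns, for illustration only.
hypothetical_returns = [1800, 1840, 1800, 1840]
mean_reward, std_reward, summary = summarize_returns(hypothetical_returns)
print(summary)
```

A small standard deviation relative to the mean, as reported here, indicates that the agent's score varies little from episode to episode.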

📦 Installation

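Use gym==0.19, since it includes the Atari ROMs.
The action space is reported as 6, since only the actions that are possible in this game are used. The training code prints env.action_space.n, so you can check the value in your own setup.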

💻 Usage Examples

Basic Usage

Watch your agent interact with the environment:

# Import the libraries
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# Load the model
checkpoint = load_from_hub("ThomasSimonini/ppo-SeaquestNoFrameskip-v4", "ppo-SeaquestNoFrameskip-v4.zip")

# This agent was trained with Python 3.8, while Colab runs Python 3.7.
# Overriding these objects avoids pickle errors when loading the model:
custom_objects = {
    "learning_rate": 0.0,
    "lr_schedule": lambda _: 0.0,
    "clip_range": lambda _: 0.0,
}

model = PPO.load(checkpoint, custom_objects=custom_objects)

env = make_atari_env('SeaquestNoFrameskip-v4', n_envs=1)
env = VecFrameStack(env, n_stack=4)

obs = env.reset()
while True:
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()
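The lambda entries in custom_objects above are schedules: stable-baselines3 calls a schedule with the remaining training progress (1.0 at the start, 0.0 at the end) and uses the returned value. The sketch below illustrates that convention with two hypothetical helpers, constant_schedule and linear_schedule. The linear variant is shown only for contrast and is not used by this model.

```python
def constant_schedule(value):
    """Hypothetical helper: a schedule that ignores progress, like `lambda _: 0.0`."""
    def schedule(progress_remaining):
        return value
    return schedule


def linear_schedule(initial_value):
    """Hypothetical helper: decay linearly from initial_value to 0."""
    def schedule(progress_remaining):
        return progress_remaining * initial_value
    return schedule


frozen = constant_schedule(0.0)     # behaves like the loading-time override
decaying = linear_schedule(2.5e-4)  # for contrast only

for progress in (1.0, 0.5, 0.0):
    print(progress, frozen(progress), decaying(progress))
```

Since the loaded model is only used for inference, replacing the schedules with constants does not affect the actions it predicts.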

Advanced Usage

Here is the training code:

import wandb
from wandb.integration.sb3 import WandbCallback

from huggingface_sb3 import push_to_hub
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack, VecVideoRecorder

config = {
    "env_name": "SeaquestNoFrameskip-v4",
    "num_envs": 8,
    "total_timesteps": int(10e6),
    "seed": 2862830927,    
}

run = wandb.init(
    project="HFxSB3",
    config = config,
    sync_tensorboard = True,  # Auto-upload sb3's tensorboard metrics
    monitor_gym = True, # Auto-upload the videos of agents playing the game
    save_code = True, # Save the code to W&B
    )

# make_atari_env creates the Atari environments and applies the standard wrappers.
# We also train with multiple workers (n_envs=8 => 8 parallel environments).
env = make_atari_env(config["env_name"], n_envs=config["num_envs"], seed=config["seed"])

print("ENV ACTION SPACE: ", env.action_space.n)

# Frame-stacking with 4 frames
env = VecFrameStack(env, n_stack=4)
# Video recorder
env = VecVideoRecorder(env, "videos", record_video_trigger=lambda x: x % 100000 == 0, video_length=2000)

model = PPO(
    policy="CnnPolicy",
    env=env,
    batch_size=256,
    clip_range=0.1,
    ent_coef=0.01,
    gae_lambda=0.9,
    gamma=0.99,
    learning_rate=2.5e-4,
    max_grad_norm=0.5,
    n_epochs=4,
    n_steps=128,
    vf_coef=0.5,
    tensorboard_log="runs",
    verbose=1,
)

model.learn(
    total_timesteps=config["total_timesteps"],
    callback=[
        WandbCallback(
            gradient_save_freq=1000,
            model_save_path=f"models/{run.id}",
        ),
        CheckpointCallback(
            save_freq=10000,
            save_path="./seaquest",
            name_prefix=config["env_name"],
        ),
    ],
)

model.save("ppo-SeaquestNoFrameskip-v4.zip")
push_to_hub(repo_id="ThomasSimonini/ppo-SeaquestNoFrameskip-v4", 
    filename="ppo-SeaquestNoFrameskip-v4.zip",
    commit_message="Added Seaquest trained agent")
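The hyperparameters in the training code determine how much data each PPO update sees. The arithmetic below is a sketch derived only from the values in that code. It assumes the usual stable-baselines3 behavior: each rollout collects n_steps transitions per environment, training continues until at least total_timesteps have been collected, and CheckpointCallback counts save_freq in calls to env.step() rather than in total timesteps.

```python
import math

# Values copied from the training code above.
num_envs = 8
n_steps = 128
batch_size = 256
n_epochs = 4
total_timesteps = int(10e6)
save_freq = 10000

# Transitions collected per rollout, across all parallel environments.
rollout_size = num_envs * n_steps
# Minibatches per pass over one rollout, and gradient steps per rollout.
minibatches_per_epoch = rollout_size // batch_size
gradient_steps_per_rollout = minibatches_per_epoch * n_epochs
# Rollouts needed to reach at least total_timesteps.
num_rollouts = math.ceil(total_timesteps / rollout_size)
# With 8 environments, each env.step() call advances 8 timesteps.
timesteps_between_checkpoints = save_freq * num_envs

print(rollout_size, minibatches_per_epoch, gradient_steps_per_rollout)
print(num_rollouts, timesteps_between_checkpoints)
```

Under these assumptions, each rollout of 1024 transitions is split into 4 minibatches and reused for 4 epochs, and a checkpoint is written roughly every 80,000 timesteps.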

📄 License

No license information provided in the original document.
