# Reinforcement learning training

Mimo 7B RL 0530

MiMo is a series of 7B-parameter models trained from scratch for reasoning tasks. With optimized pre-training and post-training strategies, it performs strongly on mathematical and code reasoning tasks.

Large Language Model

Qwenlong L1 32B GGUF

QwenLong-L1-32B is a large language model designed for long-context reasoning. Trained with reinforcement learning, it performs strongly on multiple long-context question-answering benchmarks and can handle complex reasoning tasks effectively.

Large Language Model

Seed Coder 8B Reasoning Bf16

Seed-Coder is a family of open-source code models at the 8B scale, including base, instruction, and reasoning versions. The reasoning version strengthens its reasoning capabilities through reinforcement learning and supports a 64K context length.

Large Language Model

A 32-billion-parameter dense language model focused on reasoning, built on Qwen 2.5-32B-Base, with performance comparable to larger MoE models on reasoning benchmarks.

Large Language Model

Deepseek R1 Bf16

DeepSeek-R1 is a first-generation reasoning model that performs strongly on mathematics, code, and reasoning tasks, with performance comparable to OpenAI-o1.

Large Language Model

Dqn BeamRiderNoFrameskip V4

This is a reinforcement learning model based on the DQN algorithm, specifically designed for the Atari game environment BeamRiderNoFrameskip-v4.

Video Processing

Dqn BreakoutNoFrameskip V4

This is a deep reinforcement learning model based on the DQN algorithm, specifically designed for the Atari game environment BreakoutNoFrameskip-v4.

Video Processing

Dqn SpaceInvadersNoFrameskip V4

This is a Deep Q-Network (DQN) based reinforcement learning agent, specifically trained to play the Atari game 'Space Invaders'.

Video Processing

This is a DQN reinforcement learning agent trained using the stable-baselines3 library, specifically designed to solve the Acrobot-v1 control problem.

Dqn PongNoFrameskip V4

This is a reinforcement learning model based on the DQN algorithm, specifically designed for the PongNoFrameskip-v4 environment.

Video Processing
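
The DQN entries above share one learning rule: the agent regresses its Q-value estimate toward a one-step temporal-difference target, `r + gamma * max_a' Q(s', a')`. The following is a minimal, library-free sketch of that target only, not the training code behind any listed model. The function names and the `gamma=0.99` default are assumptions chosen for illustration.

```python
from typing import Sequence


def dqn_td_target(
    reward: float,
    next_q_values: Sequence[float],
    done: bool,
    gamma: float = 0.99,  # assumed discount factor, not taken from the model cards
) -> float:
    """Return the one-step TD target r + gamma * max_a' Q(s', a').

    next_q_values holds the (target-network) Q-value of every action
    in the next state. At the end of an episode there is no next state
    to bootstrap from, so the target is just the reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_error(q_value: float, target: float) -> float:
    """Squared difference between the current estimate and the TD target."""
    return (target - q_value) ** 2


if __name__ == "__main__":
    # Hypothetical numbers: reward 1.0, two actions available in the next state.
    target = dqn_td_target(1.0, [0.5, 2.0], done=False, gamma=0.5)
    print(target, squared_td_error(1.5, target))
```

In practice, implementations such as stable-baselines3 compute this target over minibatches sampled from a replay buffer and often use a Huber loss instead of the plain squared error shown here.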

© 2025 AIbase