# Reinforcement learning training
Mimo 7B RL 0530
MIT
MiMo is a series of 7B-parameter models trained from scratch for reasoning tasks. With optimized pre-training and post-training strategies, it performs strongly on mathematical and code reasoning tasks.
Large Language Model
Transformers

XiaomiMiMo
319
17
Qwenlong L1 32B GGUF
Apache-2.0
QwenLong-L1-32B is a large language model designed for long-context reasoning. Trained with reinforcement learning, it performs strongly on multiple long-context question-answering benchmarks and handles complex reasoning tasks effectively.
Large Language Model
Transformers

Mungert
927
7
Seed Coder 8B Reasoning Bf16
MIT
Seed-Coder is a family of open-source 8B-scale code models, including base, instruction, and reasoning versions. The reasoning version improves reasoning ability through reinforcement learning and supports a 64K context length.
Large Language Model
Transformers

ByteDance-Seed
4,382
9
AM Thinking V1
Apache-2.0
A 32-billion-parameter dense language model focused on reasoning. It is built on Qwen 2.5-32B-Base and performs comparably to larger MoE models on reasoning benchmarks.
Large Language Model
Transformers

a-m-team
1,377
153
Deepseek R1 Bf16
MIT
DeepSeek-R1 is DeepSeek's first-generation reasoning model. It performs strongly on mathematics, code, and reasoning tasks, with performance comparable to OpenAI-o1.
Large Language Model
Transformers

opensourcerelease
1,486
16
Dqn BeamRiderNoFrameskip V4
This is a reinforcement learning agent based on the DQN algorithm, trained on the Atari environment BeamRiderNoFrameskip-v4.
Video Processing
sb3
169
0
Dqn BreakoutNoFrameskip V4
This is a deep reinforcement learning agent based on the DQN algorithm, trained on the Atari environment BreakoutNoFrameskip-v4.
Video Processing
sb3
20
2
Dqn SpaceInvadersNoFrameskip V4
This is a Deep Q-Network (DQN) reinforcement learning agent trained to play the Atari game 'Space Invaders'.
Video Processing
sb3
58
4
Dqn Acrobot V1
This is a DQN reinforcement learning agent trained with the stable-baselines3 library to solve the Acrobot-v1 control problem.
Physics Model
sb3
403
0
Dqn PongNoFrameskip V4
This is a reinforcement learning agent based on the DQN algorithm, trained on the PongNoFrameskip-v4 environment.
Video Processing
sb3
16
1