# Reinforcement Learning Optimization
## MMaDA 8B MixCoT
MMaDA is a novel class of multimodal diffusion foundation models excelling across text reasoning, multimodal understanding, and text-to-image generation.
Gen-Verse · MIT · Text-to-Image · Transformers · 601 downloads · 3 likes
## ReasonGen-R1
ReasonGen-R1 is an autoregressive image generation model that integrates chain-of-thought reasoning, improving the logic and quality of generated images through SFT and RL.
Franklin0 · Apache-2.0 · Text-to-Image · Transformers · 142 downloads · 1 like
## Thinkless 1.5B Warmup
Thinkless is a learnable framework that lets a large language model adaptively choose between short-form and long-chain reasoning based on task complexity and its own capability (see the sketch after this entry).
Vinnnf · Apache-2.0 · Large Language Model · Transformers · 966 downloads · 1 like
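As context for how such adaptive routing can surface at inference time, here is a minimal sketch: generate, then inspect which reasoning mode the model selected. The repo id and the control-token strings (`<think>`) are illustrative assumptions, not the model's confirmed interface.

```python
# Minimal sketch of control-token routing for adaptive reasoning length,
# in the spirit of Thinkless. Repo id and token names are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Vinnnf/Thinkless-1.5B-Warmup"  # assumed repo id

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

prompt = "What is 17 * 23?"
inputs = tok(prompt, return_tensors="pt").to(model.device)

# The framework's idea: the model first commits to a reasoning mode,
# then answers in that mode. Here we generate and inspect the choice.
out = model.generate(**inputs, max_new_tokens=512)
text = tok.decode(out[0], skip_special_tokens=False)
mode = "long-chain" if "<think>" in text else "short"  # assumed token name
print(mode)
print(text)
```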
## Qwen2.5 VL 3B UI R1 E
UI-R1-E-3B is an efficient GUI grounding model fine-tuned from Qwen2.5-VL-3B-Instruct, specializing in visual question answering and particularly strong at locating and identifying actionable elements in user-interface screenshots.
LZXzju · MIT · Image-to-Text · Safetensors · English · 75 downloads · 3 likes
## Llama 3.1 Nemotron Nano 8B V1 GGUF
Llama-3.1-Nemotron-Nano-8B-v1 is a reasoning model derived from Meta's Llama-3.1-8B-Instruct, post-trained to improve reasoning ability, human chat preference alignment, and task execution. This repository ships GGUF quantizations (see the loading sketch after this entry).
unsloth · Other · Large Language Model · Transformers · English · 22.18k downloads · 3 likes
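A minimal sketch of loading a GGUF quantization with llama-cpp-python follows; the repo id and filename glob are assumptions inferred from the listing, so check the repo's file list for the exact quantization you want.

```python
# Minimal sketch: run a GGUF quantization locally via llama-cpp-python.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/Llama-3.1-Nemotron-Nano-8B-v1-GGUF",  # assumed repo id
    filename="*Q4_K_M.gguf",  # glob matching one quantization file (assumed)
    n_ctx=8192,               # context window to allocate
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Briefly explain what GGUF is."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```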
## INFRL Qwen2.5 VL 72B Preview Q8 With Bf16 Output And Bf16 Embedding.gguf
An improved multimodal vision-language model based on Qwen2.5-VL-72B-Instruct, excelling in multiple visual reasoning benchmarks.
GeorgyGUF · Apache-2.0 · Text-to-Image · English · 64 downloads · 0 likes
## INFRL Qwen2.5 VL 72B Preview Bf16.gguf
A vision-language model optimized from Qwen2.5-VL-72B-Instruct, excelling in multiple visual reasoning benchmarks.
GeorgyGUF · Apache-2.0 · Text-to-Image · English · 40 downloads · 0 likes
## Llama 3.1 8B Instruct
A multilingual large language model from the Meta Llama 3.1 series with 8B parameters, optimized for multilingual dialogue use cases and supporting 8 languages.
RedHatAI · Large Language Model · Safetensors · Multilingual · 292 downloads · 1 like
## RM R1 DeepSeek Distilled Qwen 14B
RM-R1 is a training framework for reasoning reward models (ReasRM) that judges candidate answers by first generating scoring rubrics or reasoning traces, yielding explainable verdicts (see the sketch after this entry).
gaotang · MIT · Large Language Model · Transformers · English · 95 downloads · 1 like
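To make the "generative judge" idea concrete, here is a minimal sketch of prompting such a model to write a rubric and then pick a winner. The repo id and prompt format are illustrative assumptions, not RM-R1's confirmed chat template.

```python
# Minimal sketch of a reasoning reward model used as a generative judge:
# the model writes a grading rubric, reasons, then names the better answer.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "gaotang/RM-R1-DeepSeek-Distilled-Qwen-14B"  # assumed repo id
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

question = "Why is the sky blue?"
answer_a = "Rayleigh scattering favors shorter (blue) wavelengths."
answer_b = "Because the ocean reflects its color onto the sky."

prompt = (
    f"Question: {question}\n"
    f"Answer A: {answer_a}\nAnswer B: {answer_b}\n"
    "First write a short grading rubric, then reason step by step, "
    "and end with 'Winner: A' or 'Winner: B'."
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
# Print only the newly generated judgment, not the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```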
## II Medical 7B Preview
A medical reasoning model fine-tuned from Qwen/Qwen2.5-7B-Instruct, performing strongly on multiple medical QA benchmarks.
Intelligent-Internet · Large Language Model · Transformers · 112 downloads · 9 likes
## Skywork VL Reward 7B
Skywork-VL-Reward-7B is a 7B-parameter multimodal reward model built on the Qwen2.5-VL-7B-Instruct architecture, with a value head added on top for reward modeling (a generic value-head sketch follows this entry).
Skywork · MIT · Multimodal Fusion · Transformers · 30 downloads · 8 likes
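For readers unfamiliar with the "value head" pattern, here is a generic sketch: a scalar linear head scoring the last hidden state of a causal LM. It illustrates the concept only, using a small text-only stand-in backbone, and is not Skywork's actual implementation.

```python
# Generic sketch of a value-head reward model: a scalar head on top of a
# language model's last hidden state. Illustration only, not Skywork's code.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class RewardModel(nn.Module):
    def __init__(self, backbone_id: str):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(backbone_id)
        hidden = self.backbone.config.hidden_size
        self.value_head = nn.Linear(hidden, 1)  # one scalar reward per sequence

    def forward(self, input_ids, attention_mask):
        hidden_states = self.backbone(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Score the last non-padding token of each sequence.
        last_idx = attention_mask.sum(dim=1) - 1
        last_hidden = hidden_states[torch.arange(hidden_states.size(0)), last_idx]
        return self.value_head(last_hidden).squeeze(-1)

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")  # small stand-in backbone
rm = RewardModel("Qwen/Qwen2.5-0.5B")
batch = tok(["prompt + good answer", "prompt + bad answer"],
            return_tensors="pt", padding=True)
print(rm(batch["input_ids"], batch["attention_mask"]))  # two scalar scores
```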
## Deepcoder 1.5B Preview GGUF
A code-reasoning large language model fine-tuned from DeepSeek-R1-Distilled-Qwen-1.5B, using distributed reinforcement learning to scale to longer contexts.
Mungert · MIT · Large Language Model · English · 888 downloads · 2 likes
## Tinyllava Video R1
TinyLLaVA-Video-R1 is a small-scale video reasoning model built on the fully traceable TinyLLaVA-Video. Reinforcement learning markedly improves its reasoning and thinking abilities and elicits emergent 'aha moments'.
Zhang199 · Apache-2.0 · Video-to-Text · Transformers · 123 downloads · 2 likes
## Deepcoder 14B Preview Exl2
DeepCoder-14B-Preview is a code generation model built on DeepSeek-R1-Distill-Qwen-14B, focused on solving verifiable programming problems.
cgus · Large Language Model · English · 46 downloads · 2 likes
## Deepcoder 1.5B Preview Exl2 4.65bpw
A code-reasoning large language model fine-tuned from DeepSeek-R1-Distilled-Qwen-1.5B, using distributed reinforcement learning to improve long-context handling.
async0x42 · MIT · Large Language Model · Transformers · English · 14 downloads · 3 likes
## Quasar 3.0 Instract V2
Quasar-3.0-7B is the distilled version of the upcoming 400B Quasar 3.0 model, showcasing the early strength and potential of the Quasar architecture.
silx-ai · Large Language Model · Transformers · 314 downloads · 8 likes
## Quasar 3.0 Final
Quasar-3.0-Max is a 7B-parameter distilled model from SILX INC, showcasing the early potential of the Quasar architecture through its novel TTM training process and reinforcement learning techniques.
silx-ai · Large Language Model · Transformers · 118 downloads · 4 likes
## VARGPT V1.1
VARGPT-v1.1 is a visual autoregressive unified large model, enhanced through iterative instruction tuning and reinforcement learning, capable of both visual understanding and generation tasks.
VARGPT-family · Apache-2.0 · Text-to-Image · Transformers · English · 954 downloads · 6 likes
## VARGPT V1.1 Edit
VARGPT-v1.1 is a visual autoregressive unified large model enhanced through iterative instruction tuning and reinforcement learning, supporting visual understanding and generation tasks.
VARGPT-family · Apache-2.0 · Text-to-Image · Transformers · English · 169 downloads · 1 like
## Qwen2.5 VL 3B UI R1
UI-R1 is a vision-language model enhanced by reinforcement learning for GUI agent action prediction, built on Qwen2.5-VL-3B-Instruct.
LZXzju · MIT · Text-to-Image · English · 96 downloads · 6 likes
## R1 Aqa
R1-AQA is an audio question-answering model based on Qwen2-Audio-7B-Instruct, optimized with the Group Relative Policy Optimization (GRPO) algorithm and achieving state-of-the-art results on the MMAU benchmark (GRPO's group-relative advantage is sketched after this entry).
mispeech · Apache-2.0 · Audio-to-Text · Transformers · 791 downloads · 14 likes
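Since GRPO recurs throughout this list, here is a minimal sketch of its core idea: sample a group of answers per prompt, score them, and normalize each reward against the group's own statistics instead of a learned value critic. The reward values below are dummy numbers for illustration.

```python
# Minimal sketch of the group-relative advantage at the heart of GRPO.
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each sampled answer relative to its own group."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 1.0
    sigma = sigma or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# e.g. 4 sampled answers to one audio question, scored 1 if correct else 0
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))  # correct answers get positive advantage
```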
## Light R1 14B DS
Light-R1-14B-DS is a 14B-parameter math model trained with reinforcement learning, achieving SOTA results on the AIME24/25 and GPQA benchmarks.
qihoo360 · Apache-2.0 · Large Language Model · Transformers · 2,890 downloads · 33 likes
## Visualthinker R1 Zero
The first multimodal reasoning model to reproduce the 'aha moment' and increased response length on just a 2B model without supervised fine-tuning.
turningpoint-ai · MIT · Image-to-Text · Safetensors · English · 578 downloads · 6 likes
## DPO A5 Nlp
Trained with TRL, a library for training and fine-tuning transformer language models with reinforcement learning and preference-optimization methods such as DPO (see the sketch after this entry).
EraCoding · Large Language Model · Transformers · 26 downloads · 1 like
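For context, here is a minimal sketch of preference fine-tuning with TRL's DPOTrainer. The small backbone and the one-example in-memory dataset are placeholders, and argument names can shift between TRL releases.

```python
# Minimal DPO fine-tuning sketch with TRL. Backbone and dataset are
# placeholders; check your installed TRL version's argument names.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # small stand-in model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# DPO learns from (prompt, chosen, rejected) preference triples.
train_dataset = Dataset.from_dict({
    "prompt":   ["What is 2 + 2?"],
    "chosen":   ["2 + 2 = 4."],
    "rejected": ["2 + 2 = 5."],
})

args = DPOConfig(output_dir="dpo-sketch", beta=0.1,
                 per_device_train_batch_size=1)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
)
trainer.train()
```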
## Qwen2.5vl 3B VLM R1 REC 500steps
A vision-language model based on Qwen2.5-VL-3B-Instruct, enhanced with VLM-R1 reinforcement learning and focused on referring-expression comprehension tasks.
omlab · Text-to-Image · Safetensors · English · 976 downloads · 22 likes
## Text2graph R1 Qwen2.5 0.5b
A text-to-graph information extraction model based on Qwen-2.5-0.5B, jointly trained with reinforcement learning (GRPO) and supervised learning.
Ihor · Apache-2.0 · Knowledge Graph · English · 199 downloads · 20 likes
## Cycleresearcher 12B Original
CycleResearcher is an automated research system based on reinforcement learning and iterative feedback, trained specifically for machine learning research spanning fields such as computer vision and natural language processing.
WestlakeNLP · Other · Large Language Model · Transformers · Multilingual · 250 downloads · 1 like
## T5 Query Reformulation RL
A generative model purpose-built for search query rewriting, using a sequence-to-sequence architecture and a reinforcement learning framework to produce diverse, relevant rewrites (see the inference sketch after this entry).
prhegde · Apache-2.0 · Large Language Model · Transformers · Multilingual · 366 downloads · 6 likes
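A minimal inference sketch with transformers follows; the repo id is inferred from the listing and may differ, and sampling settings are illustrative.

```python
# Minimal seq2seq inference sketch for query rewriting via transformers.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_ID = "prhegde/t5-query-reformulation-RL"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
model.eval()

query = "how to whiten teeth at home"
inputs = tokenizer(query, return_tensors="pt")

# Sampling encourages diverse reformulations of the same query.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=32,
        do_sample=True,
        top_k=50,
        num_return_sequences=3,
    )
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```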
## Speechless Llama2 Luban Orca Platypus 13b
A merge of AIDC-ai-business/Luban-13B and Open-Orca/OpenOrca-Platypus2-13B, yielding a 13B-parameter large language model on the Llama 2 architecture.
uukuguy · Large Language Model · Transformers · English · 94 downloads · 4 likes
## Ppo LunarLanderContinuous V2
A reinforcement learning agent trained with the PPO algorithm for the LunarLanderContinuous-v2 environment, capable of landing the lunar module smoothly (see the sketch after this entry).
sb3 · Physics Model · 15 downloads · 0 likes
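Here is a minimal sketch of downloading and rolling out this agent with stable-baselines3. The hub filename follows sb3's usual naming convention but is an assumption, and newer Gymnasium releases rename the environment to LunarLander-v3 with continuous=True.

```python
# Minimal sketch: load the sb3 PPO checkpoint and run one episode.
# Requires: pip install stable-baselines3 huggingface_sb3 "gymnasium[box2d]"
import gymnasium as gym
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO

checkpoint = load_from_hub(
    repo_id="sb3/ppo-LunarLanderContinuous-v2",
    filename="ppo-LunarLanderContinuous-v2.zip",  # assumed filename
)
model = PPO.load(checkpoint)

env = gym.make("LunarLanderContinuous-v2")
obs, info = env.reset(seed=0)
done = False
total_reward = 0.0
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
print(f"episode return: {total_reward:.1f}")
```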
## Bart Rl
A Korean dialogue summarization model based on the BART architecture, trained by the 'Alaggung Dalaggung' team for the 2021 Hunminjeongeum Korean Speech & Natural Language AI Competition.
alaggung · Text Generation · Transformers · Korean · 18 downloads · 0 likes