🚀 Nemotron-Research-Reasoning-Qwen-1.5B
The leading generalist reasoning model for cutting-edge research and development, excelling in complex reasoning tasks such as math, coding, and logic puzzles.

🚀 Quick Start
This model is for research and development only.
✨ Features
Nemotron-Research-Reasoning-Qwen-1.5B is the world's leading 1.5B open-weight model for complex reasoning tasks such as mathematical problems, coding challenges, scientific questions, and logic puzzles. It is trained with the ProRL algorithm on a diverse and comprehensive set of datasets, and it outperforms DeepSeek-R1-Distill-Qwen-1.5B by a large margin on a broad range of tasks, including math, coding, and GPQA.
🔧 Technical Details
ProRL: Prolonged Reinforcement Learning
ProRL is designed to enable extended RL training periods that facilitate deeper exploration of reasoning strategies. It supports more than 2k training steps and scales the training data across diverse tasks, from traditional math and code to STEM problems, logic puzzles, and instruction following, which we hypothesize is crucial for generalization. Built on Group Relative Policy Optimization (GRPO), ProRL introduces three key techniques (a minimal code sketch follows the list):
- Mitigating Entropy Collapse
- Decoupled clip and dynamic sampling policy optimization (DAPO)
- KL regularization and reference policy reset
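The following is a minimal PyTorch sketch of how these pieces combine into a single policy loss. It is illustrative only: the function names, hyperparameter values (`eps_low`, `eps_high`, `kl_coef`), and the choice of the k3 KL estimator are our assumptions for exposition, not the released training code.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """GRPO-style advantages: rewards for a group of rollouts of the same
    prompt are normalized by the group mean and std (no value network).
    DAPO-style dynamic sampling would skip groups with zero reward variance."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def prorl_policy_loss(logp_new, logp_old, logp_ref, advantages,
                      eps_low=0.2, eps_high=0.4, kl_coef=1e-3):
    """KL-regularized, asymmetrically clipped surrogate loss in the spirit of
    ProRL (GRPO + DAPO decoupled clip + KL to a reference policy).
    All tensors hold per-token log-probabilities; values are illustrative."""
    ratio = torch.exp(logp_new - logp_old)
    # Decoupled clip: a wider upper bound (eps_high > eps_low) lets
    # low-probability tokens gain mass, mitigating entropy collapse.
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)
    surrogate = torch.min(ratio * advantages, clipped * advantages)
    # k3 estimator of KL(pi_theta || pi_ref); under prolonged training the
    # reference policy is periodically reset to a recent checkpoint.
    diff = logp_ref - logp_new
    kl = torch.exp(diff) - diff - 1.0
    return -(surrogate - kl_coef * kl).mean()
```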
Using ProRL, we developed the world's best 1.5B reasoning model, one that significantly outperforms its base model, DeepSeek-R1-Distill-Qwen-1.5B, and matches or even surpasses DeepSeek-R1-Distill-Qwen-7B across a diverse range of benchmarks. Notably, compared to DeepSeek-R1-Distill-Qwen-1.5B, we achieve average pass@1 improvements of 14.7% on math benchmarks, 13.9% on coding, 54.8% on logic puzzles, 25.1% on STEM reasoning, and 18.1% on instruction-following tasks.
💻 Usage Examples
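A minimal inference sketch with Hugging Face transformers is below. The repository id and generation settings are assumptions for illustration; check the model page for the exact id and the recommended sampling parameters.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repository id; verify against the model page.
model_id = "nvidia/Nemotron-Research-Reasoning-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user",
             "content": "What is the sum of the first 100 positive integers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling settings here are illustrative, not the authors' recommendation.
outputs = model.generate(inputs, max_new_tokens=2048,
                         temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:],
                       skip_special_tokens=True))
```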
📚 Documentation
Training Datasets
| Property | Details |
| --- | --- |
| Training Datasets | A diverse, comprehensive mix of math, code, STEM, logic-puzzle, and instruction-following tasks (see ProRL: Prolonged Reinforcement Learning above) |
Evaluation Results
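All results below are reported as pass@1. As a reference for the metric (the authors' exact evaluation harness is not specified here), this is a minimal sketch of the standard unbiased pass@k estimator; at k = 1 it reduces to the per-sample success rate c / n:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021) from n sampled
    completions of which c are correct. pass@1 == c / n."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: success is certain
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 samples, 9 correct -> pass@1 = 0.5625
print(pass_at_k(16, 9, 1))
```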
Table 1: Performance (pass@1) comparison for benchmarks across Math domain
| Model | AIME24 | AIME25 | AMC | Math | Minerva | Olympiad | Avg |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-1.5B | 28.54 | 22.71 | 62.58 | 82.90 | 26.38 | 43.58 | 44.45 |
| DeepScaleR-1.5B | 40.21 | 31.46 | 73.04 | 89.36 | 41.57 | 51.63 | 54.54 |
| DeepSeek-R1-Distill-Qwen-7B | 53.54 | 40.83 | 82.83 | 93.68 | 50.60 | 57.66 | 63.19 |
| Nemotron-Research-Reasoning-Qwen-1.5B | 48.13 | 33.33 | 79.29 | 91.89 | 47.98 | 60.22 | 60.14 |
Table 2: Performance (pass@1) comparison across benchmarks for Code. We abbreviate benchmark names for codecontests (cc), codeforces (cf), humanevalplus (human), and livecodebench (LCB).

| Model | apps | cc | cf | taco | human | LCB | Avg |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-1.5B | 20.95 | 16.79 | 14.13 | 8.03 | 61.77 | 16.80 | 23.08 |
| DeepCoder-1.5B | 30.37 | 23.76 | 21.70 | 13.76 | 73.40 | 22.76 | 30.96 |
| DeepSeek-R1-Distill-Qwen-7B | 42.08 | 32.76 | 33.08 | 19.08 | 83.32 | 38.04 | 41.39 |
| Nemotron-Research-Reasoning-Qwen-1.5B | 41.99 | 31.80 | 34.50 | 20.81 | 72.05 | 23.81 | 37.49 |
Table 3: Performance comparison on STEM reasoning (GPQA Diamond), instruction following (IFEval), and logic puzzles (Reasoning Gym) tasks. We also present results on OOD tasks: acre, boxnet, and game_of_life_halting (game).
| Model | GPQA | IFEval | Reasoning | acre | boxnet | game |
| --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-1.5B | 15.86 | 44.05 | 4.24 | 5.99 | 0.00 | 3.49 |
| DeepSeek-R1-Distill-Qwen-7B | 35.44 | 58.01 | 28.55 | 20.21 | 1.71 | 12.94 |
| Nemotron-Research-Reasoning-Qwen-1.5B | 41.78 | 66.02 | 59.06 | 58.57 | 7.91 | 52.29 |
📄 License
The model is released under the CC BY-NC 4.0 license.
Ethical Considerations
⚠️ Important Note
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
💡 Usage Tip
Please report security vulnerabilities or NVIDIA AI Concerns here.
Citation
If you find our model helpful, please cite the following paper:
@article{liu2025prorl,
  author        = {Mingjie Liu and Shizhe Diao and Ximing Lu and Jian Hu and Xin Dong and Yejin Choi and Jan Kautz and Yi Dong},
  title         = {ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models},
  journal       = {arXiv preprint arXiv:2505.24864},
  year          = {2025},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2505.24864},
}