Openreasoning Nemotron 7B

由nvidia開發

OpenReasoning-Nemotron-7B 是一個基於 Qwen2.5-7B-Instruct 衍生的大語言模型，專注於數學、代碼和科學解決方案的生成。

大型語言模型

Transformers

支持多種語言#64K長文本推理 #競賽數學求解 #代碼生成優化

下載量 222

發布時間 : 7/15/2025

模型概述

該模型是一個推理模型，經過後訓練以生成數學、代碼和科學問題的解決方案，支持高達64K的輸出令牌數。

模型特點

高性能推理能力

在數學、代碼和科學推理基準測試中表現出色，7B、14B和32B模型在其規模類別中創造了新的最先進記錄。

多智能體協作

支持通過生成式解決方案選擇（GenSelect）進行多智能體協作，顯著提升數學和編碼基準測試的性能。

長上下文支持

支持高達64K的輸出令牌數，適合處理複雜的推理任務。

模型能力

數學問題求解

代碼生成

科學問題解答

長文本生成

使用案例

教育

數學競賽問題解答

用於解答AIME、HMMT等數學競賽中的複雜問題。

在AIME24和AIME25基準測試中表現優異。

軟件開發

代碼生成

生成Python代碼解決編程問題。

在LiveCodeBench和SciCode基準測試中表現優異。

科學研究

科學問題解答

解答GPQA、MMLU-PRO等科學問題。

在HLE基準測試中表現優異。

🚀 OpenReasoning-Nemotron-7B 概述

OpenReasoning-Nemotron-7B 是一個大語言模型（LLM），它基於 Qwen2.5-7B-Instruct（即參考模型）衍生而來。這是一個推理模型，經過了針對數學、代碼和科學解決方案生成的後訓練。我們對該模型進行了評估，其輸出令牌數最高可達 64K。OpenReasoning 模型有以下幾種規模：1.5B、7B、14B 和 32B。

此模型可用於商業或非商業研究。

許可證/使用條款

適用條款：上述模型的使用受知識共享署名 4.0 國際許可協議（CC-BY-4.0）約束。附加信息：Apache 2.0 許可證

✨ 主要特性

推理基準測試成績

評估結果（pass@1）

我們的模型在一系列具有挑戰性的推理基準測試中表現出色。7B、14B 和 32B 模型在其規模類別中始終創造新的最先進記錄。

模型	人工分析指數*	GPQA	MMLU-PRO	HLE	即時代碼基準*	SciCode	AIME24	AIME25	HMMT FEB 25
1.5B	31.0	31.6	47.5	5.5	28.6	2.2	55.5	45.6	31.5
7B	54.7	61.1	71.9	8.3	63.3	16.2	84.7	78.2	63.5
14B	60.9	71.6	77.5	10.1	67.8	23.5	87.8	82.0	71.2
32B	64.3	73.1	80.0	11.9	70.2	28.5	89.2	84.0	73.8

* 這是我們對人工分析智能指數的估計，並非官方分數。

* LiveCodeBench 版本 6，日期範圍 2408 - 2505。

多智能體協作

OpenReasoning-Nemotron 模型可以通過啟動多個並行生成並通過生成式解決方案選擇（GenSelect）將它們組合在一起，以“重”模式使用。為了添加此“技能”，我們遵循原始的 GenSelect 訓練流程，但不基於選擇摘要進行訓練，而是使用 DeepSeek R1 0528 671B 的完整推理軌跡。我們僅訓練模型為數學問題選擇最佳解決方案，但令人驚訝的是，發現此能力可直接推廣到代碼和科學問題！在這種“重”GenSelect 推理模式下，OpenReasoning-Nemotron-32B 模型在數學和編碼基準測試中超越了 O3（高）。

GenSelect 評估結果

模型	Pass@1 (Avg@64)	Majority@64	GenSelect
1.5B
AIME24	55.5	76.7	76.7
AIME25	45.6	70.0	70.0
HMMT Feb 25	31.5	46.7	53.3
7B
AIME24	84.7	93.3	93.3
AIME25	78.2	86.7	93.3
HMMT Feb 25	63.5	83.3	90.0
LCB v6 2408 - 2505	63.4	n/a	67.7
14B
AIME24	87.8	93.3	93.3
AIME25	82.0	90.0	90.0
HMMT Feb 25	71.2	86.7	93.3
LCB v6 2408 - 2505	67.9	n/a	69.1
32B
AIME24	89.2	93.3	93.3
AIME25	84.0	90.0	93.3
HMMT Feb 25	73.8	86.7	96.7
LCB v6 2408 - 2505	70.2	n/a	75.3
HLE	11.8	13.4	15.5

📦 安裝指南

文檔未提及安裝步驟，故跳過此章節。

💻 使用示例

基礎用法

import transformers
import torch
model_id = "nvidia/OpenReasoning-Nemotron-7B"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
# Code generation prompt
prompt = """You are a helpful and harmless assistant. You should think step-by-step before responding to the instruction below.
Please use python programming language only.
You must use ```python for just the final solution code block with the following format:
```python
# Your code here

{user} """

Math generation prompt

prompt = """Solve the following math problem. Make sure to put the answer (and only answer) inside \boxed{}.

{user}

"""

Science generation prompt

You can refer to prompts here -

https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/prompt/config/generic/hle.yaml (HLE)

https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/prompt/config/eval/aai/mcq-4choices-boxed.yaml (for GPQA)

https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/prompt/config/eval/aai/mcq-10choices-boxed.yaml (MMLU-Pro)

messages = [ { "role": "user", "content": prompt.format(user="Write a program to calculate the sum of the first $N$ fibonacci numbers")}, ] outputs = pipeline( messages, max_new_tokens=64000, ) print(outputs[0]["generated_text"][-1]['content'])


## 📚 詳細文檔

### 引用
如果您發現這些數據有用，請引用：

@article{ahmad2025opencodereasoning, title={OpenCodeReasoning: Advancing Data Distillation for Competitive Coding}, author={Wasi Uddin Ahmad, Sean Narenthiran, Somshubra Majumdar, Aleksander Ficek, Siddhartha Jain, Jocelyn Huang, Vahid Noroozi, Boris Ginsburg}, year={2025}, eprint={2504.01943}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2504.01943}, }

@misc{ahmad2025opencodereasoningiisimpletesttime, title={OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique}, author={Wasi Uddin Ahmad and Somshubra Majumdar and Aleksander Ficek and Sean Narenthiran and Mehrzad Samadi and Jocelyn Huang and Siddhartha Jain and Vahid Noroozi and Boris Ginsburg}, year={2025}, eprint={2507.09075}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2507.09075}, }

@misc{moshkov2025aimo2winningsolutionbuilding, title={AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset}, author={Ivan Moshkov and Darragh Hanley and Ivan Sorokin and Shubham Toshniwal and Christof Henkel and Benedikt Schifferer and Wei Du and Igor Gitman}, year={2025}, eprint={2504.16891}, archivePrefix={arXiv}, primaryClass={cs.AI}, url={https://arxiv.org/abs/2504.16891}, }


### 附加信息
| 屬性 | 詳情 |
|------|------|
| 部署地區 | 全球 |
| 使用場景 | 此模型適用於處理競賽數學、代碼和科學問題的開發者和研究人員。它僅通過監督微調進行訓練，以在基準測試中取得高分。 |
| 發佈日期 | Huggingface [2025 年 7 月 16 日]，通過 [鏈接](https://huggingface.co/nvidia/OpenReasoning-Nemotron-7B/) |
| 參考資料 | [2504.01943] OpenCodeReasoning: Advancing Data Distillation for Competitive Coding <br> [2504.01943] OpenCodeReasoning: Advancing Data Distillation for Competitive Coding <br> [2504.16891] AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset |
| 模型架構類型 | 密集解碼器 Transformer 模型 |
| 網絡架構 | Qwen-7B-Instruct |
| 輸入類型 | 文本 |
| 輸入格式 | 字符串 |
| 輸入參數 | 一維（1D） |
| 輸入其他屬性 | 訓練輸出令牌數最高可達 64,000 |
| 輸出類型 | 文本 |
| 輸出格式 | 字符串 |
| 輸出參數 | 一維（1D） |
| 輸出其他屬性 | 訓練輸出令牌數最高可達 64,000 |
| 軟件集成 - 運行時引擎 | NeMo 2.3.0 |
| 軟件集成 - 推薦硬件微架構兼容性 | NVIDIA Ampere <br> NVIDIA Hopper |
| 軟件集成 - 首選/支持的操作系統 | Linux |
| 模型版本 | 1.0（2025 年 7 月 16 日） <br> OpenReasoning-Nemotron-32B <br> OpenReasoning-Nemotron-14B <br> OpenReasoning-Nemotron-7B <br> OpenReasoning-Nemotron-1.5B |

### 訓練和評估數據集
#### 訓練數據集
OpenReasoning-Nemotron-7B 的訓練語料庫由來自 [OpenCodeReasoning](https://huggingface.co/datasets/nvidia/OpenCodeReasoning) 數據集、[OpenCodeReasoning-II](https://arxiv.org/abs/2507.09075)、[OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning) 的問題以及 [Llama-Nemotron-Post-Training-Dataset](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset) 中的合成科學問題組成。所有響應均使用 DeepSeek-R1-0528 生成。我們還未做修改地包含了 Llama-Nemotron-Post-Training-Dataset 中的指令跟隨和工具調用數據。

數據收集方法：混合：自動化、人工、合成
標註方法：混合：自動化、人工、合成
屬性：來自 OpenCodeReasoning 問題（https://huggingface.co/datasets/nvidia/OpenCodeReasoning）、[OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning) 和 [Llama-Nemotron-Post-Training-Dataset](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset) 中合成科學問題的 500 萬個 DeepSeek-R1-0528 生成的響應。我們還未做修改地包含了 Llama-Nemotron-Post-Training-Dataset 中的指令跟隨和工具調用數據。

#### 評估數據集
我們使用以下基準對模型進行全面評估。

##### 數學
- AIME 2024/2025
- HMMT
- BRUNO 2025

##### 代碼
- LiveCodeBench
- SciCode

##### 科學
- GPQA
- MMLU-PRO
- HLE

數據收集方法：混合：自動化、人工、合成
標註方法：混合：自動化、人工、合成

### 推理
| 屬性 | 詳情 |
|------|------|
| 加速引擎 | vLLM, Tensor(RT)-LLM |
| 測試硬件 | NVIDIA H100 - 80GB |

### 倫理考量
NVIDIA 認為可信 AI 是一項共同責任，我們已經制定了政策和實踐，以支持廣泛的 AI 應用開發。當按照我們的服務條款下載或使用時，開發者應與內部模型團隊合作，確保該模型滿足相關行業和使用場景的要求，並解決不可預見的產品濫用問題。

有關此模型倫理考量的更多詳細信息，請參閱模型卡片++可解釋性、偏差、安全與保障以及隱私子卡片。

請 [在此](https://www.nvidia.com/en-us/support/submit-security-vulnerability/) 報告模型質量、風險、安全漏洞或 NVIDIA AI 相關問題。

## 📄 許可證
此模型的使用受 [知識共享署名 4.0 國際許可協議（CC-BY-4.0）](https://creativecommons.org/licenses/by/4.0/legalcode.en) 約束。附加信息：[Apache 2.0 許可證](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct/blob/main/LICENSE)