AceReason-Nemotron-7B: An Open Model for Math and Code Reasoning

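AceReason Nemotron 7B

Developed by NVIDIA

A math and code reasoning model trained with reinforcement learning on top of DeepSeek-R1-Distilled-Qwen-7B, with strong results on math and code reasoning tasks.

Large language model

Transformers

License: Other #math-reasoning #code-generation #RL-training

Downloads: 4,278

Released: 5/22/2025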

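Model Overview

AceReason-Nemotron-7B is a math and code reasoning model trained entirely through reinforcement learning (RL), starting from DeepSeek-R1-Distilled-Qwen-7B. It improves substantially on its base model in both math and code reasoning tasks.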

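Model Features

Reinforcement learning training

Trained entirely through reinforcement learning (RL), which markedly improves math and code reasoning.

Math reasoning

69.0% on AIME 2024 (+14.5%) and 53.6% on AIME 2025 (+17.4%).

Code reasoning

51.8% on LiveCodeBench v5 (+8%) and 44.1% on LiveCodeBench v6 (+7%).

Training recipe

RL training on math-only prompts first, followed by RL training on code-only prompts.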

Model Capabilities

Math reasoning

Code generation

Complex problem solving

Step-by-step reasoning

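Use Cases

Math competitions

Solving AIME problems

Solves complex problems from the AIME math competition

Reaches 69.0% accuracy on AIME 2024

Coding competitions

Solving LiveCodeBench problems

Solves programming problems from LiveCodeBench

Reaches 51.8% accuracy on LiveCodeBench v5

Education

Math learning support

Helps students understand complex mathematical concepts and solution methods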

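🚀 AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

AceReason-Nemotron is a math and code reasoning model trained with reinforcement learning on top of DeepSeek-R1-Distilled-Qwen-7B. Systematic RL training improves its math reasoning and also yields strong results on code reasoning benchmarks.

We are pleased to introduce AceReason-Nemotron-7B, a math and code reasoning model trained entirely through reinforcement learning (RL), starting from DeepSeek-R1-Distilled-Qwen-7B. It achieves 69.0% (+14.5%) on AIME 2024, 53.6% (+17.4%) on AIME 2025, 51.8% (+8%) on LiveCodeBench v5, and 44.1% (+7%) on LiveCodeBench v6. We systematically study the RL training process through extensive ablations and propose a simple yet effective approach: first run RL training on math-only prompts, then run RL training on code-only prompts. Notably, we find that math-only RL significantly improves the performance of a strong distilled model not only on math benchmarks but also on code reasoning tasks. In addition, extended code-only RL further improves code benchmark performance while causing minimal degradation in math results. We find that RL not only elicits the foundational reasoning capabilities acquired during pre-training and supervised fine-tuning (e.g., distillation), but also pushes the limits of the model's reasoning ability, enabling it to solve problems that were previously unsolvable.

We share our training recipe and training logs in our technical report.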

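✨ Key Features

Reinforcement learning training: trained entirely through RL, building stronger reasoning ability on top of the base model.
Strong results across domains: clear gains on both math and code reasoning tasks.
Systematic study: an effective training recipe derived from extensive ablation experiments.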

💻 Usage Examples

Basic usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'nvidia/AceReason-Nemotron-7B'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "Jen enters a lottery by picking $4$ distinct numbers from $S=\\{1,2,3,\\cdots,9,10\\}.$ $4$ numbers are randomly chosen from $S.$ She wins a prize if at least two of her numbers were $2$ of the randomly chosen numbers, and wins the grand prize if all four of her numbers were the randomly chosen numbers. The probability of her winning the grand prize given that she won a prize is $\\tfrac{m}{n}$ where $m$ and $n$ are relatively prime positive integers. Find $m+n$."
messages = [{"role": "user", "content": prompt}]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
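# Move inputs to the device the model was placed on (device_map="auto" may not pick "cuda")
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,  # temperature/top_p only take effect when sampling is enabled
    temperature=0.6,
    top_p=0.95
)
# Drop the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)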
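
The decoded `response` typically holds the model's reasoning trace followed by its final answer. The following is a minimal sketch for separating the two. It assumes the reasoning ends with a literal `</think>` tag that survives decoding, which is an assumption about the tokenizer's behavior rather than something stated in this card; `split_reasoning` is a hypothetical helper name.

```python
def split_reasoning(response: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split a decoded response into (reasoning, final_answer).

    Assumes the reasoning trace is terminated by `end_tag`. If the tag is
    missing (for example, generation was cut off by the token limit),
    the whole text is treated as reasoning and the final answer is empty.
    """
    head, sep, tail = response.partition(end_tag)
    if not sep:
        return head.strip(), ""
    # Drop an opening <think> tag if the decoded text still contains it.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()
```

Checking for an empty final answer is a cheap way to detect generations that ran out of tokens before finishing their reasoning.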

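📚 Detailed Documentation

Results

We evaluate our model against competitive reasoning models of comparable size within the Qwen2.5 and Llama3.1 model families on AIME 2024, AIME 2025, LiveCodeBench v5 (2024/08/01 - 2025/02/01), and LiveCodeBench v6 (2025/02/01 - 2025/05/01). More evaluation results can be found in our technical report.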

Model	AIME 2024 (avg@64)	AIME 2025 (avg@64)	LCB v5 (avg@8)	LCB v6 (avg@8)
QwQ-32B	79.5	65.8	63.4	-
DeepSeek-R1-671B	79.8	70.0	65.9	-
Llama-Nemotron-Ultra-253B	80.8	72.5	66.3	-
o3-mini (medium)	79.6	76.7	67.4	-
Light-R1-7B	59.1	44.3	40.6	36.4
Light-R1-14B	74	60.2	57.9	51.5
DeepCoder-14B (32K Inference)	71	56.1	57.9	50.4
OpenMath-Nemotron-7B	74.8	61.2	-	-
OpenCodeReasoning-Nemotron-7B	-	-	51.3	46.1
Llama-Nemotron-Nano-8B-v1	61.3	47.1	46.6	46.2
DeepSeek-R1-Distilled-Qwen-7B	55.5	39.0	37.6	34.1
DeepSeek-R1-Distilled-Qwen-14B	69.7	50.2	53.1	47.9
DeepSeek-R1-Distilled-Qwen-32B	72.6	54.9	57.2	-
AceReason-Nemotron-7B 🤖	69.0	53.6	51.8	44.1
AceReason-Nemotron-14B 🤖	78.6	67.4	61.1	54.9
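
The `avg@k` columns can be read as accuracy averaged over k sampled generations per problem, then averaged over all problems. The sketch below illustrates that reading, assuming each sample is simply graded correct or incorrect. `avg_at_k` and the toy grades are hypothetical and are not the evaluation scripts used to produce the table.

```python
from statistics import mean


def avg_at_k(grades: list[list[bool]]) -> float:
    """Return avg@k as a percentage.

    `grades[i][j]` is True when the j-th sampled answer to problem i is
    correct. Every problem must have the same number of samples (k).
    """
    if not grades:
        raise ValueError("need at least one problem")
    k = len(grades[0])
    if k == 0 or any(len(g) != k for g in grades):
        raise ValueError("every problem needs the same non-zero number of samples")
    per_problem = [sum(g) / k for g in grades]  # fraction correct per problem
    return 100.0 * mean(per_problem)


# Toy example: 2 problems, k = 4 samples each (3/4 and 1/4 correct).
toy = [[True, True, False, True], [False, False, True, False]]
print(f"avg@4 = {avg_at_k(toy):.1f}")  # prints "avg@4 = 50.0"
```

Averaging over many samples reduces the run-to-run variance that a single sampled generation per problem would show at temperature 0.6.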

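Usage Recommendations

Do not include a system prompt; put all instructions directly in the user prompt.
For math questions, we suggest the following instruction: Please reason step by step, and put your final answer within \boxed{}.
For code questions, we suggest the following instruction: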

question = "" # code question
starter_code = "" # starter code function header

code_instruction_nostartercode = """Write Python code to solve the problem. Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
code_instruction_hasstartercode = """Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
if starter_code != "":
    question += "\n\n" + "Solve the problem starting with the provided function header.\n\nFunction header:\n" + "```\n" + starter_code + "\n```"
    question += "\n\n" + code_instruction_hasstartercode
else:
    question += "\n\n" + code_instruction_nostartercode

final_prompt = "<｜User｜>" + question + "<｜Assistant｜><think>\n"

Our inference engine for evaluation is vLLM==0.7.3, with top-p=0.95, temperature=0.6, and max_tokens=32768.
We use the AceMath scorer for math evaluation and the official LiveCodeBench script for code evaluation.
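
Because the recommended instructions ask for the final answer inside `\boxed{}` (math) or inside a fenced `python` block (code), model output can be post-processed with simple string handling. The helpers below are a minimal sketch and do not reproduce the AceMath scorer or the LiveCodeBench scripts. `extract_boxed` and `extract_python_block` are hypothetical names, and taking the last match is an assumption about which occurrence holds the final answer.

```python
import re
from typing import Optional

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `text`, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    # Walk forward, tracking brace depth so nested braces (e.g. \frac{1}{2}) work.
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # unbalanced braces, e.g. truncated output


def extract_python_block(text: str) -> Optional[str]:
    """Return the body of the last fenced python block in `text`, or None."""
    pattern = re.escape(FENCE) + r"python\n(.*?)" + re.escape(FENCE)
    blocks = re.findall(pattern, text, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else None
```

Both helpers return `None` when the expected format is absent, so a caller can count such outputs as incorrect instead of raising an error.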

Contact

Yang Chen (yachen@nvidia.com)
Zhuolin Yang (zhuoliny@nvidia.com)
Zihan Liu (zihanl@nvidia.com)
Chankyu Lee (chankyul@nvidia.com)
Wei Ping (wping@nvidia.com)

📄 License

Your use of this model is governed by the NVIDIA Open Model License.

Citation

@article{acereason2025,
  title={AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning},
  author={Chen, Yang and Yang, Zhuolin and Liu, Zihan and Lee, Chankyu and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint},
  year={2025}
}