AceReason-Nemotron-7B Open-Source Model - Solve Hard Math and Code Reasoning Problems for Free!


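AceReason-Nemotron-7B

Developed by nvidia

A math and code reasoning model trained with reinforcement learning, based on DeepSeek-R1-Distilled-Qwen-7B, with strong performance on math and code reasoning tasks

Large Language Model

Transformers

License: other #MathReasoningBoost #CodeGenerationOptimization #RLTrainingBreakthrough

Downloads: 4,278

Release date: 5/22/2025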

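Model Overview

AceReason-Nemotron-7B is a math and code reasoning model trained entirely through reinforcement learning (RL), with DeepSeek-R1-Distilled-Qwen-7B as its base model. It achieves significant gains on math and code reasoning tasks.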

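Model Features

Reinforcement learning training

Trained entirely through RL, which significantly improves math and code reasoning

Math reasoning

69.0% on AIME 2024 (+14.5%) and 53.6% on AIME 2025 (+17.4%)

Code reasoning

51.8% on LiveCodeBench v5 (+8%) and 44.1% on LiveCodeBench v6 (+7%)

Training recipe

RL on math-only prompts first, then RL on code-only prompts, which proves highly effective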

Capabilities

Math reasoning

Code generation

Complex problem solving

Step-by-step reasoning

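Use Cases

Math competitions

Solving AIME problems

Solves complex problems from the AIME math competition

Reaches 69.0% accuracy on AIME 2024

Programming competitions

Solving LiveCodeBench problems

Solves programming problems from LiveCodeBench

Reaches 51.8% accuracy on LiveCodeBench v5

Education

Math learning assistance

Helps students understand complex math concepts and problem-solving methods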

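🚀 AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

AceReason-Nemotron is an RL-trained math and code reasoning model built on DeepSeek-R1-Distilled-Qwen-7B, and it achieves strong results on several math and code reasoning benchmarks. Systematic RL training improves not only its math reasoning but also its code reasoning.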

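We're thrilled to introduce AceReason-Nemotron-7B, a math and code reasoning model trained entirely through reinforcement learning (RL), starting from DeepSeek-R1-Distilled-Qwen-7B. It delivers impressive results: 69.0% on the 2024 American Invitational Mathematics Examination (AIME 2024, +14.5%), 53.6% on AIME 2025 (+17.4%), 51.8% on LiveCodeBench v5 (+8%), and 44.1% on LiveCodeBench v6 (+7%).

We systematically study the RL training process through extensive ablation experiments and propose a simple yet effective approach: first run RL on math-only prompts, then run RL on code-only prompts. Notably, we find that math-only RL not only significantly improves the performance of strong distilled models on math benchmarks, but also on code reasoning tasks. In addition, extended code-only RL further improves code benchmark performance with minimal impact on math results. We find that RL not only elicits the foundational reasoning capabilities acquired during pre-training and supervised fine-tuning (e.g., distillation), but also pushes the limits of the model's reasoning ability, enabling it to solve problems it previously could not.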
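The two-stage ordering described above can be pictured as a data schedule. This is a hypothetical sketch only: the `(domain, prompt)` record format, the `"math"`/`"code"` labels, and the `build_two_stage_curriculum` helper are assumptions for illustration; the actual RL algorithm, rewards, and hyperparameters are covered in the technical report, not here.

```python
from typing import Iterable, List, Tuple

# Hypothetical prompt record: (domain, prompt_text). Assumed format, for illustration only.
Prompt = Tuple[str, str]


def build_two_stage_curriculum(pool: Iterable[Prompt]) -> List[Tuple[str, List[str]]]:
    """Split a mixed prompt pool into the two RL stages, returned in training order."""
    pool = list(pool)
    math_only = [text for domain, text in pool if domain == "math"]
    code_only = [text for domain, text in pool if domain == "code"]
    # Stage 1: RL on math-only prompts; stage 2: RL on code-only prompts.
    return [("math_rl", math_only), ("code_rl", code_only)]


if __name__ == "__main__":
    pool = [("code", "LIS"), ("math", "AIME #4"), ("math", "AMC #12"), ("code", "two-sum")]
    for stage, prompts in build_two_stage_curriculum(pool):
        # Placeholder: a real RL trainer would consume `prompts` at this point.
        print(stage, prompts)
```

Keeping the stages as an ordered list makes the key design point explicit: the code-only stage starts from the checkpoint produced by the math-only stage rather than mixing both domains in one run.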

We share our training recipe and training logs in our technical report.

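✨ Key Features

RL training: trained entirely through reinforcement learning, steadily improving reasoning starting from the base model.
Strong across domains: achieves notable results on both math and code reasoning tasks.
Systematic study: an effective training recipe derived from extensive ablation experiments.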


💻 Usage Examples

Basic Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'nvidia/AceReason-Nemotron-7B'
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load weights in the checkpoint's dtype and place them on the available devices
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# An AIME 2024 problem, sent as a single user turn (no system prompt)
prompt = "Jen enters a lottery by picking $4$ distinct numbers from $S=\\{1,2,3,\\cdots,9,10\\}.$ $4$ numbers are randomly chosen from $S.$ She wins a prize if at least two of her numbers were $2$ of the randomly chosen numbers, and wins the grand prize if all four of her numbers were the randomly chosen numbers. The probability of her winning the grand prize given that she won a prize is $\\tfrac{m}{n}$ where $m$ and $n$ are relatively prime positive integers. Find $m+n$."
messages = [{"role": "user", "content": prompt}]

# Render the chat template as text and open the assistant turn
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# Move inputs to the model's device instead of hard-coding "cuda"
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Sampling settings match the evaluation setup (temperature 0.6, top-p 0.95, 32K new tokens)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,
    temperature=0.6,
    top_p=0.95
)
# Strip the prompt tokens so only newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
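The expected answer to this example prompt can be cross-checked independently of the model. The following brute-force sketch is not part of the model card: it enumerates all C(10,4) = 210 possible draws for one fixed pick and computes the conditional probability exactly.

```python
from fractions import Fraction
from itertools import combinations

S = range(1, 11)
jen = {1, 2, 3, 4}  # by symmetry, any fixed 4-number pick gives the same probability

prize = grand = 0
for draw in combinations(S, 4):
    matches = len(jen & set(draw))
    if matches >= 2:  # at least two of her numbers were drawn
        prize += 1
        if matches == 4:  # all four of her numbers were drawn
            grand += 1

p = Fraction(grand, prize)  # P(grand prize | prize), automatically in lowest terms
print(p, p.numerator + p.denominator)  # 1/115 116
```

Of the 210 draws, 115 give her a prize and exactly 1 gives the grand prize, so m/n = 1/115 and a correct model response should end with m + n = 116.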


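📚 Detailed Documentation

Results

We evaluate our model against competitive reasoning models of comparable size from the Qwen2.5 and Llama3.1 model families on AIME 2024, AIME 2025, LiveCodeBench v5 (2024/08/01 - 2025/02/01), and LiveCodeBench v6 (2025/02/01 - 2025/05/01). More evaluation results are available in our technical report.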

| Model | AIME 2024 (avg@64) | AIME 2025 (avg@64) | LiveCodeBench v5 (avg@8) | LiveCodeBench v6 (avg@8) |
|---|---|---|---|---|
| QwQ-32B | 79.5 | 65.8 | 63.4 | - |
| DeepSeek-R1-671B | 79.8 | 70.0 | 65.9 | - |
| Llama-Nemotron-Ultra-253B | 80.8 | 72.5 | 66.3 | - |
| o3-mini (medium) | 79.6 | 76.7 | 67.4 | - |
| Light-R1-7B | 59.1 | 44.3 | 40.6 | 36.4 |
| Light-R1-14B | 74 | 60.2 | 57.9 | 51.5 |
| DeepCoder-14B (32K Inference) | 71 | 56.1 | 57.9 | 50.4 |
| OpenMath-Nemotron-7B | 74.8 | 61.2 | - | - |
| OpenCodeReasoning-Nemotron-7B | - | - | 51.3 | 46.1 |
| Llama-Nemotron-Nano-8B-v1 | 61.3 | 47.1 | 46.6 | 46.2 |
| DeepSeek-R1-Distilled-Qwen-7B | 55.5 | 39.0 | 37.6 | 34.1 |
| DeepSeek-R1-Distilled-Qwen-14B | 69.7 | 50.2 | 53.1 | 47.9 |
| DeepSeek-R1-Distilled-Qwen-32B | 72.6 | 54.9 | 57.2 | - |
| AceReason-Nemotron-7B 🤖 | 69.0 | 53.6 | 51.8 | 44.1 |
| AceReason-Nemotron-14B 🤖 | 78.6 | 67.4 | 61.1 | 54.9 |

Scores are percentages; a dash marks a result that is not reported.
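The avg@64 and avg@8 column headers are read here as the usual "average accuracy over k sampled responses per problem": sample k responses for each problem, score each as correct or not, take the per-problem fraction correct, then average over problems. A minimal sketch under that interpretation (the `avg_at_k` helper and the toy data are hypothetical, not the official scoring scripts):

```python
from statistics import mean
from typing import List


def avg_at_k(correct: List[List[bool]]) -> float:
    """avg@k as a percentage: mean over problems of the fraction of k samples judged correct."""
    per_problem = [sum(samples) / len(samples) for samples in correct]
    return 100.0 * mean(per_problem)


# Toy data (hypothetical): 3 problems, k = 4 samples each.
toy = [
    [True, True, False, True],     # 0.75
    [False, False, False, False],  # 0.00
    [True, True, True, True],      # 1.00
]
print(f"avg@4 = {avg_at_k(toy):.2f}")  # avg@4 = 58.33
```

Averaging over many samples (64 for the 30-problem AIME sets) reduces the variance that a single sampled run at temperature 0.6 would have.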

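Usage Recommendations

Do not include a system prompt; instead, place all instructions directly in the user prompt.
For math problems, use the instruction: Please reason step by step, and put your final answer within \boxed{}.
For code problems, use the following instruction: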

```python
# Build the user prompt for a code problem (no system prompt)
question = ""  # code question
starter_code = ""  # starter code function header (empty if the problem has none)

code_instruction_nostartercode = """Write Python code to solve the problem. Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
code_instruction_hasstartercode = """Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
if starter_code != "":
    # The problem provides a function header: ask the model to start from it
    question += "\n\n" + "Solve the problem starting with the provided function header.\n\nFunction header:\n" + "```\n" + starter_code + "\n```"
    question += "\n\n" + code_instruction_hasstartercode
else:
    question += "\n\n" + code_instruction_nostartercode

# Raw chat format; the assistant turn opens with <think>
final_prompt = "<｜User｜>" + question + "<｜Assistant｜><think>\n"
```

Our inference engine for evaluation is vLLM==0.7.3, with top-p=0.95, temperature=0.6, and max_tokens=32768.
We use the AceMath scorer for math evaluation and the official LiveCodeBench script for code evaluation.
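Putting these recommendations together, a math evaluation loop with vLLM might look like the sketch below. Assumptions, all hypothetical: the helper names, reusing the raw `<｜User｜>`/`<｜Assistant｜><think>` format from the code-prompt example for math problems, and the `"\n\n"` separator before the instruction. `extract_boxed_answer` is a simple stand-in for answer extraction, not the AceMath scorer.

```python
from typing import List, Optional

# Recommended math instruction from the usage recommendations.
MATH_INSTRUCTION = "Please reason step by step, and put your final answer within \\boxed{}."


def build_math_prompt(problem: str) -> str:
    """Single user turn, no system prompt.

    Assumption: reuses the raw chat format from the code-prompt example;
    the "\\n\\n" separator before the instruction is also an assumption.
    """
    return "<｜User｜>" + problem + "\n\n" + MATH_INSTRUCTION + "<｜Assistant｜><think>\n"


def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, allowing nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    begin = start + len("\\boxed{")
    depth = 1
    for j in range(begin, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:j]
    return None  # unbalanced braces, e.g. generation cut off at max_tokens


def generate_answers(problems: List[str], model: str = "nvidia/AceReason-Nemotron-7B") -> List[Optional[str]]:
    # Imported lazily so the helpers above can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model)
    # Sampling settings reported for evaluation.
    params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=32768)
    outputs = llm.generate([build_math_prompt(p) for p in problems], params)
    return [extract_boxed_answer(out.outputs[0].text) for out in outputs]
```

Calling `generate_answers([prompt])` with the Jen lottery problem from the basic usage example should return `"116"` when the model solves it. Taking the last `\boxed{}` rather than the first avoids picking up intermediate boxed expressions from the reasoning trace.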

Contact

Yang Chen (yachen@nvidia.com)
Zhuolin Yang (zhuoliny@nvidia.com)
Zihan Liu (zihanl@nvidia.com)
Chankyu Lee (chankyul@nvidia.com)
Wei Ping (wping@nvidia.com)


📄 License

Your use of this model is governed by the NVIDIA Open Model License.

Citation

```bibtex
@article{acereason2025,
  title={AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning},
  author={Chen, Yang and Yang, Zhuolin and Liu, Zihan and Lee, Chankyu and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint},
  year={2025}
}
```