AceReason-Nemotron-7B-GGUF Open-Source Model - Free Deployment for Efficient Math and Code Reasoning


Acereason Nemotron 7B GGUF

By QuantFactory

AceReason-Nemotron-7B is a math and code reasoning model trained with reinforcement learning, starting from DeepSeek-R1-Distilled-Qwen-7B, and it performs strongly across multiple benchmarks.

Large language model

Transformers

#EnhancedMathReasoning #CodeGenerationOptimization #MultiBenchmarkGains

Downloads: 326

Release date: 6/13/2025

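Model overview

The model focuses on math and code reasoning tasks, with performance improved through reinforcement learning training. It is suited to solving complex math problems and programming challenges.

Model features

Reinforcement learning training

Trained entirely with reinforcement learning, which substantially improves math and code reasoning.

Strong benchmark performance

Achieves notable gains on benchmarks including AIME 2024, AIME 2025, and LiveCodeBench v5 and v6.

Effective training recipe

RL is first performed on math prompts and then on code prompts, which improves overall performance.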

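Model capabilities

Math problem solving

Code generation

Complex reasoning

Use cases

Education

Solving math competition problems

Solves challenging math competition problems, such as AIME problems.

Achieves 69.0% accuracy on AIME 2024 (avg@64).

Programming

Code generation and optimization

Generates and optimizes Python code to solve programming problems.

Achieves 51.8% accuracy on LiveCodeBench v5 (avg@8).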

🚀 QuantFactory/AceReason-Nemotron-7B-GGUF

This is a quantized version of nvidia/AceReason-Nemotron-7B, created using llama.cpp.
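Since the files in this repository are GGUF conversions intended for llama.cpp, one way to run them from Python is through the llama-cpp-python bindings. The following is a minimal sketch, not an official recipe. It assumes llama-cpp-python and huggingface_hub are installed, uses a hypothetical quantization file pattern (`*Q4_K_M.gguf`; check the repository's actual file list), and assumes the chat template is embedded in the GGUF metadata. The prompt follows this card's usage recommendations: no system prompt, and the math instruction goes in the user turn.

```python
# Minimal sketch: run a GGUF build of AceReason-Nemotron-7B with llama-cpp-python.
# Assumptions (not stated in this card): llama-cpp-python + huggingface_hub are
# installed, and a file matching FILENAME_PATTERN exists in the repository.

REPO_ID = "QuantFactory/AceReason-Nemotron-7B-GGUF"
FILENAME_PATTERN = "*Q4_K_M.gguf"  # hypothetical quantization choice (glob pattern)

MATH_INSTRUCTION = "Please reason step by step, and put your final answer within \\boxed{}."


def build_messages(problem: str) -> list:
    """Single user turn, no system prompt, with the recommended math instruction appended."""
    return [{"role": "user", "content": problem + "\n\n" + MATH_INSTRUCTION}]


def solve_with_gguf(problem: str, n_ctx: int = 32768) -> str:
    # Imported lazily so build_messages() can be used without llama-cpp-python.
    from llama_cpp import Llama

    # from_pretrained downloads the matching GGUF file and forwards n_ctx to Llama().
    llm = Llama.from_pretrained(repo_id=REPO_ID, filename=FILENAME_PATTERN, n_ctx=n_ctx)
    result = llm.create_chat_completion(
        messages=build_messages(problem),
        temperature=0.6,  # same sampling settings as the card's evaluation setup
        top_p=0.95,
    )
    return result["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(solve_with_gguf("What is the sum of the first 10 positive integers?"))
```

Lower-bit quantizations reduce memory use at some cost in accuracy, so the benchmark numbers in this card, which were measured on the original checkpoint, may not carry over exactly.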

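🚀 Quick start

This project provides AceReason-Nemotron-7B, a math and code reasoning model trained with reinforcement learning (RL) starting from DeepSeek-R1-Distilled-Qwen-7B, which achieves strong results on multiple benchmarks. The basic steps and related information for using the model are given below.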

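✨ Key features

Reinforcement learning training: AceReason-Nemotron-7B is trained entirely through reinforcement learning, starting from the DeepSeek-R1-Distilled-Qwen-7B model, and shows strong reasoning ability.
Strong benchmark performance: it achieves notable results on several benchmarks, scoring 69.0% on AIME 2024 (+14.5%), 53.6% on AIME 2025 (+17.4%), 51.8% on LiveCodeBench v5 (+8%), and 44.1% on LiveCodeBench v6 (+7%).
Effective training recipe: we propose a simple yet effective approach: first perform RL on math-only prompts, then perform RL on code-only prompts. Notably, we find that math-only RL not only significantly improves the performance of strong distilled models on math benchmarks but also improves code reasoning tasks. In addition, extended code-only RL further improves code benchmark performance while causing minimal degradation in math results.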
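The two-stage recipe fixes the order in which prompt domains are used for RL. The sketch below shows only that ordering. It assumes a hypothetical prompt pool where each item carries a `domain` tag, plus a placeholder `rl_update` callback standing in for the actual RL algorithm, which this card does not describe. It is not the authors' training code.

```python
from typing import Callable, Dict, Iterable, List

Prompt = Dict[str, str]  # hypothetical format: {"domain": "math" | "code", "text": ...}


def staged_rl_schedule(prompts: Iterable[Prompt], batch_size: int) -> List[List[Prompt]]:
    """Return batches in training order: every math-only batch first, then every code-only batch."""
    pool = list(prompts)
    math_only = [p for p in pool if p["domain"] == "math"]
    code_only = [p for p in pool if p["domain"] == "code"]
    batches: List[List[Prompt]] = []
    # Stage 1: math-only RL. Stage 2: code-only RL. Batches never mix domains.
    for stage in (math_only, code_only):
        for i in range(0, len(stage), batch_size):
            batches.append(stage[i:i + batch_size])
    return batches


def run_training(prompts: Iterable[Prompt], batch_size: int,
                 rl_update: Callable[[List[Prompt]], None]) -> None:
    for batch in staged_rl_schedule(prompts, batch_size):
        rl_update(batch)  # placeholder for the real policy update step
```

For example, `run_training(pool, 64, my_update)` would call a hypothetical `my_update` on all math batches before any code batch, matching the math-then-code order described above.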


💻 Usage examples

Basic usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'nvidia/AceReason-Nemotron-7B'
tokenizer = AutoTokenizer.from_pretrained(model_name)
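# Loads the original (unquantized) checkpoint; the GGUF files in this repo are meant for llama.cpp.
# torch_dtype="auto" keeps the checkpoint's dtype; device_map="auto" places weights on available devices.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")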

prompt = "Jen enters a lottery by picking $4$ distinct numbers from $S=\\{1,2,3,\\cdots,9,10\\}.$ $4$ numbers are randomly chosen from $S.$ She wins a prize if at least two of her numbers were $2$ of the randomly chosen numbers, and wins the grand prize if all four of her numbers were the randomly chosen numbers. The probability of her winning the grand prize given that she won a prize is $\\tfrac{m}{n}$ where $m$ and $n$ are relatively prime positive integers. Find $m+n$."
# No system prompt: all instructions go directly in the user turn.
messages = [{"role": "user", "content": prompt}]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
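# Move inputs to the model's device instead of hard-coding "cuda".
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,  # temperature and top_p only take effect when sampling is enabled
    temperature=0.6,
    top_p=0.95
)
# Keep only the newly generated tokens (drop the echoed prompt).
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)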
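The decoded `response` usually contains the model's reasoning followed by its final answer. When the prompt uses the recommended math instruction (final answer inside \boxed{}), the answer can be pulled out with a small brace-matching helper. This is a sketch under two assumptions: any reasoning trace ends with a literal `</think>` tag when present, and the last \boxed{...} is the final answer. `extract_boxed_answer` is a hypothetical helper name, not part of any library.

```python
from typing import Optional


def extract_boxed_answer(response: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} after any </think> tag, or None."""
    # Keep only the text after the reasoning trace, if a closing tag is present.
    answer_part = response.rsplit("</think>", 1)[-1]
    start = answer_part.rfind("\\boxed{")
    if start == -1:
        return None
    begin = start + len("\\boxed{")
    depth = 1
    # Walk forward, tracking nested braces such as \boxed{\frac{1}{2}}.
    for j in range(begin, len(answer_part)):
        ch = answer_part[j]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return answer_part[begin:j]
    return None  # unbalanced braces
```

Calling `extract_boxed_answer(response)` gives a string that can be compared with a reference answer. For the lottery problem above, the correct value of m+n is 116.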

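📚 Detailed documentation

Evaluation results

We evaluate our model against competitive reasoning models of comparable size in the Qwen2.5 and Llama3.1 model families on AIME 2024, AIME 2025, LiveCodeBench (LCB) v5 (2024/08/01 - 2025/02/01), and LiveCodeBench v6 (2025/02/01 - 2025/05/01). More evaluation results can be found in our technical report.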
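The avg@64 and avg@8 column headers in the table are read here in their common sense: each problem is sampled k times, per-problem accuracy is the fraction of correct samples, and the score is the mean over problems. The card itself does not define the metric, so treat this as an assumption. A minimal sketch of that computation:

```python
from typing import Sequence


def avg_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Mean accuracy in percent; correct[i][j] says whether sample j of problem i was right."""
    if not correct:
        raise ValueError("need at least one problem")
    k = len(correct[0])
    if k == 0 or any(len(row) != k for row in correct):
        raise ValueError("every problem must have the same number k > 0 of samples")
    per_problem = [sum(row) / k for row in correct]  # fraction correct per problem
    return 100.0 * sum(per_problem) / len(per_problem)
```

For instance, two problems with 3/4 and 1/4 correct samples give avg@4 = 50.0. Averaging over many samples reduces the variance that a single sampled run at temperature 0.6 would have.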

Model	AIME 2024 (avg@64)	AIME 2025 (avg@64)	LCB v5 (avg@8)	LCB v6 (avg@8)
QwQ-32B	79.5	65.8	63.4	-
DeepSeek-R1-671B	79.8	70.0	65.9	-
Llama-Nemotron-Ultra-253B	80.8	72.5	66.3	-
o3-mini (medium)	79.6	76.7	67.4	-
Light-R1-7B	59.1	44.3	40.6	36.4
Light-R1-14B	74.0	60.2	57.9	51.5
DeepCoder-14B (32K inference)	71.0	56.1	57.9	50.4
OpenMath-Nemotron-7B	74.8	61.2	-	-
OpenCodeReasoning-Nemotron-7B	-	-	51.3	46.1
Llama-Nemotron-Nano-8B-v1	61.3	47.1	46.6	46.2
DeepSeek-R1-Distilled-Qwen-7B	55.5	39.0	37.6	34.1
DeepSeek-R1-Distilled-Qwen-14B	69.7	50.2	53.1	47.9
DeepSeek-R1-Distilled-Qwen-32B	72.6	54.9	57.2	-
AceReason-Nemotron-7B 🤖	69.0	53.6	51.8	44.1
AceReason-Nemotron-14B 🤖	78.6	67.4	61.1	54.9
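The table shows the effect of RL training directly when AceReason-Nemotron-7B is compared with its starting checkpoint, DeepSeek-R1-Distilled-Qwen-7B. The short sketch below computes absolute percentage-point differences from the two rows. The scores are copied from the table, and the helper name `point_gains` is illustrative.

```python
# Scores copied from the table (AIME: avg@64; LCB: avg@8).
BENCHMARKS = ["AIME 2024", "AIME 2025", "LCB v5", "LCB v6"]
BASE_7B = [55.5, 39.0, 37.6, 34.1]       # DeepSeek-R1-Distilled-Qwen-7B
ACEREASON_7B = [69.0, 53.6, 51.8, 44.1]  # AceReason-Nemotron-7B


def point_gains(base, tuned):
    """Absolute differences in percentage points, rounded to one decimal."""
    return [round(t - b, 1) for b, t in zip(base, tuned)]


for name, gain in zip(BENCHMARKS, point_gains(BASE_7B, ACEREASON_7B)):
    print(f"{name}: +{gain} points")
```

Rounding to one decimal matches the table's precision and hides floating-point noise from the subtraction.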

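Usage recommendations

Do not include a system prompt; instead, place all instructions directly in the user prompt.
For math problems, we recommend the instruction: "Please reason step by step, and put your final answer within \boxed{}."
For code problems, we recommend the following instructions: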

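question = ""  # the coding problem statement
starter_code = ""  # starter code (function header), if any

code_instruction_nostartercode = """Write Python code to solve the problem. Please place the solution code in the following format:
```python
# Your solution code here
```"""
code_instruction_hasstartercode = """Please place the solution code in the following format:
```python
# Your solution code here
```"""
if starter_code != "":
    # Show the function header the solution must start from.
    question += "\n\n" + "Solve the problem starting with the provided function header.\n\nFunction header:\n" + "```\n" + starter_code + "\n```"
    question += "\n\n" + code_instruction_hasstartercode
else:
    question += "\n\n" + code_instruction_nostartercode

# Raw prompt in the model's chat format (fullwidth "｜" bars), opening the <think> block.
final_prompt = "<｜User｜>" + question + "<｜Assistant｜><think>\n"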
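Because the code instructions ask the model to wrap its solution in a python-tagged fence, the solution can be pulled out of the response with a regular expression. This is a sketch under two assumptions: the final answer follows any `</think>` tag, and the last python block is the final solution. `extract_solution_code` is a hypothetical helper. The fence string is built as three backticks so that it does not clash with surrounding markup.

```python
import re
from typing import Optional

FENCE = "`" * 3  # three backticks
# Non-greedy match of the body of a fence opened with the python tag.
_PY_BLOCK = re.compile(re.escape(FENCE) + r"python\s*\n(.*?)" + re.escape(FENCE), re.DOTALL)


def extract_solution_code(response: str) -> Optional[str]:
    """Return the last python-fenced block after any </think> tag, or None."""
    answer_part = response.rsplit("</think>", 1)[-1]
    blocks = _PY_BLOCK.findall(answer_part)
    return blocks[-1].strip() if blocks else None
```

Taking the last block rather than the first favors the model's final version when it revises its code inside the answer.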

The inference engine we used for evaluation is vLLM==0.7.3, with top_p=0.95, temperature=0.6, and max_tokens=32768.
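The settings above map onto vLLM's offline `LLM` / `SamplingParams` interface. The following is a minimal sketch, not the authors' evaluation script. It assumes a GPU machine with vLLM installed and prompts already formatted like `final_prompt` above. The `n` argument draws several samples per prompt, which is what an avg@k-style score needs.

```python
# Sampling settings stated in this card's evaluation setup.
SAMPLING_KWARGS = {"temperature": 0.6, "top_p": 0.95, "max_tokens": 32768}


def generate_with_vllm(prompts, model_name="nvidia/AceReason-Nemotron-7B", n=1):
    """Return, for each raw prompt string, a list of n sampled completions."""
    # Imported lazily: vLLM needs a GPU-enabled installation.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name)
    params = SamplingParams(n=n, **SAMPLING_KWARGS)
    outputs = llm.generate(prompts, params)
    # Each RequestOutput holds n CompletionOutput objects in .outputs.
    return [[completion.text for completion in out.outputs] for out in outputs]
```

For example, `generate_with_vllm([final_prompt], n=8)` would produce eight samples for one code problem. The exact prompt formatting used in the official evaluation may differ.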

Evaluation toolkit

Please see the evaluation code, scripts, and cached prediction files.

Contact

Yang Chen (yachen@nvidia.com)
Zhuolin Yang (zhuoliny@nvidia.com)
Zihan Liu (zihanl@nvidia.com)
Chankyu Lee (chankyul@nvidia.com)
Wei Ping (wping@nvidia.com)


📄 License

Your use of this model is governed by the NVIDIA Open Model License.

Citation

@article{chen2025acereason,
  title={AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning},
  author={Chen, Yang and Yang, Zhuolin and Liu, Zihan and Lee, Chankyu and Xu, Peng and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint arXiv:2505.16400},
  year={2025}
}