AceReason-Nemotron-7B-GGUF Open-Source Model: Free Deployment for Efficient Math and Code Reasoning


Acereason Nemotron 7B GGUF

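Quantized by QuantFactory

AceReason-Nemotron-7B is a math and code reasoning model trained with reinforcement learning, starting from DeepSeek-R1-Distilled-Qwen-7B. It performs strongly on several benchmarks.

Large language model

Transformers

#EnhancedMathReasoning #OptimizedCodeGeneration #MultiBenchmarkGains

Downloads: 326

Released: June 13, 2025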

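Model overview

The model targets math and code reasoning tasks. Reinforcement learning training improves its performance on complex math problems and programming challenges.

Model features

Reinforcement learning training

Trained entirely through reinforcement learning, which markedly improves math and code reasoning.

Strong benchmark performance

Achieves substantial gains on AIME 2024, AIME 2025, and LiveCodeBench v5 and v6.

Effective training recipe

Reinforcement learning runs first on math prompts and then on code prompts.

Model capabilities

Math problem solving

Code generation

Complex reasoning

Use cases

Education

Math competition problems

Solves complex competition problems such as those from AIME.

Reaches 69.0% accuracy on AIME 2024.

Programming

Code generation and optimization

Generates and optimizes Python code to solve programming problems.

Reaches 51.8% accuracy on LiveCodeBench v5.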

🚀 QuantFactory/AceReason-Nemotron-7B-GGUF

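This is a quantized version of nvidia/AceReason-Nemotron-7B, created with llama.cpp.

🚀 Quick start

AceReason-Nemotron-7B is a math and code reasoning model trained with reinforcement learning (RL), starting from DeepSeek-R1-Distilled-Qwen-7B. It achieves strong results on several benchmarks. The sections below cover basic usage and related information.

✨ Key features

Reinforcement learning training: AceReason-Nemotron-7B is trained entirely through reinforcement learning, starting from the DeepSeek-R1-Distilled-Qwen-7B model, and shows strong reasoning ability.
Strong benchmark performance: 69.0% on AIME 2024 (+14.5%), 53.6% on AIME 2025 (+17.4%), 51.8% on LiveCodeBench v5 (+8%), and 44.1% on LiveCodeBench v6 (+7%).
Effective training recipe: The authors propose a simple yet effective approach: run RL training on math-only prompts first, then on code-only prompts. They find that math-only RL significantly improves a strong distilled model on math benchmarks and also improves its code reasoning, while extended code-only RL further improves code benchmark results with minimal impact on math results.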


💻 Usage examples

Basic usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'nvidia/AceReason-Nemotron-7B'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "Jen enters a lottery by picking $4$ distinct numbers from $S=\\{1,2,3,\\cdots,9,10\\}.$ $4$ numbers are randomly chosen from $S.$ She wins a prize if at least two of her numbers were $2$ of the randomly chosen numbers, and wins the grand prize if all four of her numbers were the randomly chosen numbers. The probability of her winning the grand prize given that she won a prize is $\\tfrac{m}{n}$ where $m$ and $n$ are relatively prime positive integers. Find $m+n$."
messages = [{"role": "user", "content": prompt}]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# Move inputs to the device the model was placed on by device_map="auto".
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    temperature=0.6,
    top_p=0.95
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
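
The decoded `response` usually holds the model's reasoning trace followed by its final answer. A minimal sketch for separating the two, assuming the reasoning ends with a closing `</think>` tag. The `<think>` tag in the prompt format suggests this, but this card does not state it. `split_reasoning` is a hypothetical helper name, not part of any library.

```python
def split_reasoning(response: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split a decoded response into (reasoning, answer).

    Assumption: the reasoning trace ends with `close_tag`. If the tag is
    missing (for example, generation stopped at max_new_tokens), the whole
    text is returned as the answer and the reasoning is empty.
    """
    reasoning, sep, answer = response.rpartition(close_tag)
    if not sep:
        return "", response.strip()
    return reasoning.strip(), answer.strip()
```

Calling `split_reasoning(response)` on the output of the basic usage code would return the trace and the answer as two strings, which makes it easier to display or score only the final answer.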

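📚 Detailed documentation

Evaluation results

We evaluate our model against competitive reasoning models of comparable size from the Qwen2.5 and Llama3.1 model families on AIME 2024, AIME 2025, LiveCodeBench v5 (2024/08/01 - 2025/02/01), and LiveCodeBench v6 (2025/02/01 - 2025/05/01). More evaluation results can be found in our technical report.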

Model	AIME 2024 (avg@64)	AIME 2025 (avg@64)	LCB v5 (avg@8)	LCB v6 (avg@8)
QwQ-32B	79.5	65.8	63.4	-
DeepSeek-R1-671B	79.8	70.0	65.9	-
Llama-Nemotron-Ultra-253B	80.8	72.5	66.3	-
o3-mini (medium)	79.6	76.7	67.4	-
Light-R1-7B	59.1	44.3	40.6	36.4
Light-R1-14B	74	60.2	57.9	51.5
DeepCoder-14B (32K inference)	71	56.1	57.9	50.4
OpenMath-Nemotron-7B	74.8	61.2	-	-
OpenCodeReasoning-Nemotron-7B	-	-	51.3	46.1
Llama-Nemotron-Nano-8B-v1	61.3	47.1	46.6	46.2
DeepSeek-R1-Distilled-Qwen-7B	55.5	39.0	37.6	34.1
DeepSeek-R1-Distilled-Qwen-14B	69.7	50.2	53.1	47.9
DeepSeek-R1-Distilled-Qwen-32B	72.6	54.9	57.2	-
AceReason-Nemotron-7B 🤖	69.0	53.6	51.8	44.1
AceReason-Nemotron-14B 🤖	78.6	67.4	61.1	54.9
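
To read the table as a comparison against the RL starting point, the sketch below subtracts the DeepSeek-R1-Distilled-Qwen-7B row from the AceReason-Nemotron-7B row. The scores are copied from the table above. The dictionary layout and the names `baseline`, `acereason_7b`, and `gains` are illustrative only.

```python
# Scores copied from the evaluation table (percent accuracy).
baseline = {"AIME 2024": 55.5, "AIME 2025": 39.0, "LCB v5": 37.6, "LCB v6": 34.1}
acereason_7b = {"AIME 2024": 69.0, "AIME 2025": 53.6, "LCB v5": 51.8, "LCB v6": 44.1}

# Absolute difference in percentage points, rounded to one decimal place.
gains = {name: round(acereason_7b[name] - baseline[name], 1) for name in baseline}

for name, delta in gains.items():
    print(f"{name}: +{delta} points")
```

These differences come from the table rows alone. The gains quoted in the feature summary are reported separately by the source and are not derived here.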

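Usage recommendations

Do not include a system prompt; put all instructions directly in the user prompt.
For math questions, we recommend the following instruction: Please reason step by step, and put your final answer within \boxed{}.
For code questions, we recommend the following instruction: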

question = "" # code question
starter_code = "" # starter code function header

code_instruction_nostartercode = """Write Python code to solve the problem. Please place the solution code in the following format:
```python
# Your solution code here
```"""
code_instruction_hasstartercode = """Please place the solution code in the following format:
```python
# Your solution code here
```"""
if starter_code != "":
    question += "\n\n" + "Solve the problem starting with the provided function header.\n\nFunction header:\n" + "```\n" + starter_code + "\n```"
    question += "\n\n" + code_instruction_hasstartercode
else:
    question += "\n\n" + code_instruction_nostartercode

final_prompt = "<｜User｜>" + question + "<｜Assistant｜><think>\n"
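
Math questions can follow the same pattern: the recommended instruction goes into the single user turn, and no system message is added. A minimal sketch under stated assumptions: `build_math_messages` and `extract_boxed` are hypothetical helper names, the blank-line separator between problem and instruction is a choice made here, and the parser simply takes the last `\boxed{...}` in the text.

```python
from typing import Optional

# Recommended math instruction, appended to the user prompt.
MATH_INSTRUCTION = "Please reason step by step, and put your final answer within \\boxed{}."


def build_math_messages(problem: str) -> list:
    """Return a chat message list with one user turn and no system prompt."""
    # Assumption: problem and instruction are joined by a blank line.
    return [{"role": "user", "content": problem + "\n\n" + MATH_INSTRUCTION}]


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1  # track nested braces such as \frac{1}{2}
    for pos in range(begin, len(text)):
        if text[pos] == "{":
            depth += 1
        elif text[pos] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:pos]
    return None  # braces never closed
```

The message list can be passed to `tokenizer.apply_chat_template` as in the basic usage code, and `extract_boxed` can then be applied to the decoded response.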

Our inference engine for evaluation is vLLM==0.7.3, with top_p = 0.95, temperature = 0.6, and max_tokens = 32768.
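
The evaluation table reports scores averaged over 64 samples (avg@64) or 8 samples (avg@8). Under the common reading of that notation, which is an assumption here since this card does not define it, each problem is sampled k times with the settings above and per-sample correctness is averaged over samples and then over problems. A small sketch with made-up correctness flags; `avg_at_k` is a hypothetical name.

```python
from statistics import mean


def avg_at_k(correct: list) -> float:
    """Mean accuracy in percent over problems, each scored on k samples.

    correct[i][j] is True when sample j for problem i was judged correct.
    """
    per_problem = [mean(1.0 if flag else 0.0 for flag in samples) for samples in correct]
    return 100.0 * mean(per_problem)


# Toy data: 2 problems with k = 4 samples each (illustrative, not real results).
flags = [
    [True, True, False, True],    # 3 of 4 correct
    [False, False, True, False],  # 1 of 4 correct
]
print(avg_at_k(flags))  # 50.0
```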

Evaluation toolkit

Please see the evaluation code, scripts, and cached prediction files.

Contact

Yang Chen (yachen@nvidia.com)
Zhuolin Yang (zhuoliny@nvidia.com)
Zihan Liu (zihanl@nvidia.com)
Chankyu Lee (chankyul@nvidia.com)
Wei Ping (wping@nvidia.com)


📄 License

Your use of this model is governed by the NVIDIA Open Model License.

Citation

@article{chen2025acereason,
  title={AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning},
  author={Chen, Yang and Yang, Zhuolin and Liu, Zihan and Lee, Chankyu and Xu, Peng and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint arXiv:2505.16400},
  year={2025}
}