AceReason-Nemotron-14B-GGUF open-source model: reinforcement learning for math and code reasoning

Acereason Nemotron 14B GGUF

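Developed by unsloth

A math and code reasoning model trained with reinforcement learning that performs well on several benchmarks

Large language model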

Transformers

Language: English

License: Other

Tags: #RL-reasoning #strong-at-math-and-coding #RL-performance-gains

Downloads: 1,417

Released: 5/23/2025

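Model overview

AceReason-Nemotron-14B is a math and code reasoning model trained entirely through reinforcement learning (RL). It starts from DeepSeek-R1-Distilled-Qwen-14B and improves substantially on math and code reasoning tasks.

Model features

Reinforcement learning training

Trained entirely through reinforcement learning, which substantially improves math and code reasoning

Two-stage training method

RL training on math-only prompts first, followed by RL training on code-only prompts

Cross-domain gains

Math-only RL improves not only math ability but also code reasoning performance

Unsloth optimization

Quantized with Unsloth Dynamic 2.0, which Unsloth describes as more accurate than other quantization methods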

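Model capabilities

Math reasoning

Code reasoning

Complex problem solving

Code generation

Use cases

Math competitions

Solving AIME problems

Solves problems from the American Invitational Mathematics Examination (AIME)

78.6% on AIME 2024, a gain of 8.9 percentage points over the base model

Programming competitions

LiveCodeBench

Solves competitive programming problems

61.1% on LiveCodeBench v5, a gain of 8 percentage points over the base model

Codeforces

Solves Codeforces programming problems

Codeforces score improved by 543 points

Education

Math learning support

Helps students understand and solve complex math problems

Programming learning support

Supports learning algorithms and programming techniques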

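🚀 AceReason-Nemotron: Advancing math and code reasoning through reinforcement learning

AceReason-Nemotron-14B is a math and code reasoning model trained entirely through reinforcement learning (RL), starting from DeepSeek-R1-Distilled-Qwen-14B. It posts strong results on several benchmarks, including 78.6% on AIME 2024 (+8.9%) and 67.4% on AIME 2025 (+17.4%). Through extensive experiments, the research team systematically studied the RL training process and proposes a simple, effective recipe: RL training on math-only prompts first, then RL training on code-only prompts. They find that math-only RL significantly improves the strong distilled model on math benchmarks and also on code reasoning tasks, while extended code-only RL further improves code benchmark performance with minimal effect on math results.

According to Unsloth, Unsloth Dynamic 2.0 achieves higher accuracy than other leading quantization methods.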

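✨ Key features

Reinforcement learning training: Trained entirely with reinforcement learning, starting from DeepSeek-R1-Distilled-Qwen-14B. RL elicits the foundational reasoning abilities the model acquired during pre-training and supervised fine-tuning and pushes its reasoning beyond that starting point.
Strong benchmark results: Significant gains on several math and code benchmarks, including AIME 2024, AIME 2025, LiveCodeBench v5, and LiveCodeBench v6.
Effective training recipe: RL training on math-only prompts followed by RL training on code-only prompts, which improves performance on both math and code reasoning tasks.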

📦 Installation

The source documentation does not describe installation steps.

💻 Usage examples

Basic usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'nvidia/AceReason-Nemotron-14B'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "Jen enters a lottery by picking $4$ distinct numbers from $S=\\{1,2,3,\\cdots,9,10\\}.$ $4$ numbers are randomly chosen from $S.$ She wins a prize if at least two of her numbers were $2$ of the randomly chosen numbers, and wins the grand prize if all four of her numbers were the randomly chosen numbers. The probability of her winning the grand prize given that she won a prize is $\\tfrac{m}{n}$ where $m$ and $n$ are relatively prime positive integers. Find $m+n$."
messages = [{"role": "user", "content": prompt}]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
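# Move the inputs to the device chosen for the model by device_map="auto"
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,  # sample explicitly so temperature and top_p are applied
    temperature=0.6,
    top_p=0.95
)
# Drop the prompt tokens so that only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)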
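Because the recommended math instruction asks the model to put its final answer inside \boxed{}, the decoded response can be post-processed to pull that answer out. The helper below is a minimal sketch and not part of the model card: the name `extract_boxed` is hypothetical, and it assumes the last \boxed{...} in the text holds the final answer.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent.

    Hypothetical helper: assumes the final answer is in the last boxed expression.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None

    depth = 1  # we are already inside the opening brace of \boxed{
    collected = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: everything collected is the answer
                return "".join(collected)
        collected.append(ch)
    return None  # braces never balanced, so treat the answer as missing
```

Tracking brace depth, rather than using a simple regular expression, keeps nested LaTeX such as \boxed{\frac{1}{2}} intact.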

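Advanced usage

The source documentation does not include advanced usage examples.

📚 Documentation

Results

The research team evaluated the model against competing reasoning models of comparable size from the Qwen2.5 and Llama3.1 families on AIME 2024, AIME 2025, LiveCodeBench v5 (2024/08/01 - 2025/02/01), and LiveCodeBench v6 (2025/02/01 - 2025/05/01). More evaluation results are available in the technical report.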
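The results table reports scores as avg@64 and avg@8. A common reading of avg@k, assumed here, is the accuracy of a single sample averaged over k independently sampled generations per problem. The sketch below illustrates that reading; the function name `avg_at_k` and the input layout are hypothetical.

```python
from typing import List


def avg_at_k(correct: List[List[bool]]) -> float:
    """Average single-sample accuracy over k samples per problem, in percent.

    correct[i][j] is True when the j-th sampled answer to problem i is right.
    Assumption: avg@k is the mean of per-problem accuracies over k samples.
    """
    per_problem = [sum(samples) / len(samples) for samples in correct]
    return 100.0 * sum(per_problem) / len(per_problem)


# Two problems, k = 4 samples each: accuracies 0.75 and 0.25
score = avg_at_k([
    [True, True, False, True],
    [False, False, True, False],
])
print(score)  # 50.0
```

Averaging over many samples reduces the variance that a single sampled run at temperature 0.6 would show on small benchmarks such as AIME.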

Model	AIME 2024 (avg@64)	AIME 2025 (avg@64)	LCB v5 (avg@8)	LCB v6 (avg@8)
QwQ-32B	79.5	65.8	63.4	-
DeepSeek-R1-671B	79.8	70.0	65.9	-
Llama-Nemotron-Ultra-253B	80.8	72.5	66.3	-
o3-mini (medium)	79.6	76.7	67.4	-
Light-R1-14B	74	60.2	57.9	51.5
DeepCoder-14B (32K Inference)	71	56.1	57.9	50.4
OpenMath-Nemotron-14B	76.3	63.0	-	-
OpenCodeReasoning-Nemotron-14B	-	-	59.4	54.1
Llama-Nemotron-Super-49B-v1	67.5	60.0	45.5	-
DeepSeek-R1-Distilled-Qwen-14B	69.7	50.2	53.1	47.9
DeepSeek-R1-Distilled-Qwen-32B	72.6	54.9	57.2	-
AceReason-Nemotron-14B 🤖	78.6	67.4	61.1	54.9
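The gains over the base model can be recomputed from the two relevant table rows. The sketch below copies the scores for DeepSeek-R1-Distilled-Qwen-14B and AceReason-Nemotron-14B from the table; the function name `gains` is hypothetical.

```python
from typing import Dict, Sequence

BENCHMARKS = ("AIME 2024", "AIME 2025", "LCB v5", "LCB v6")
# Scores copied from the results table
BASE_SCORES = (69.7, 50.2, 53.1, 47.9)  # DeepSeek-R1-Distilled-Qwen-14B
ACE_SCORES = (78.6, 67.4, 61.1, 54.9)   # AceReason-Nemotron-14B


def gains(model: Sequence[float], baseline: Sequence[float]) -> Dict[str, float]:
    """Percentage-point difference per benchmark, rounded to one decimal."""
    return {
        name: round(m - b, 1)
        for name, m, b in zip(BENCHMARKS, model, baseline)
    }


print(gains(ACE_SCORES, BASE_SCORES))
# {'AIME 2024': 8.9, 'AIME 2025': 17.2, 'LCB v5': 8.0, 'LCB v6': 7.0}
```

The AIME 2024 and LiveCodeBench v5 differences match the gains quoted elsewhere on this page. The AIME 2025 difference computed from these rows is 17.2, slightly below the quoted +17.4, which may rest on a different baseline measurement.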

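Usage recommendations

Do not include a system prompt; put all instructions directly in the user prompt.
For math problems, the following instruction is recommended: Please reason step by step, and put your final answer within \boxed{}.
For code problems, the following instruction is recommended: Write Python code to solve the problem. Please place the solution code in the following format:

# Your solution code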
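The recommendations above can be sketched as a small prompt builder that returns a single user message and no system message. This is an illustration under stated assumptions: `build_messages` is a hypothetical helper, appending the instruction after the question is one possible placement, and wrapping the code placeholder in a fenced python block is an assumption, since this page shows only the placeholder comment.

```python
from typing import Dict, List

MATH_INSTRUCTION = (
    "Please reason step by step, and put your final answer within \\boxed{}."
)

# Assumption: the solution format is a fenced python block around the placeholder.
FENCE = "`" * 3
CODE_INSTRUCTION = (
    "Write Python code to solve the problem. "
    "Please place the solution code in the following format:\n"
    f"{FENCE}python\n# Your solution code\n{FENCE}"
)


def build_messages(question: str, task: str) -> List[Dict[str, str]]:
    """Build a chat with one user turn and no system prompt.

    task must be "math" or "code"; the matching instruction is appended
    to the question (the placement is an assumption).
    """
    if task == "math":
        instruction = MATH_INSTRUCTION
    elif task == "code":
        instruction = CODE_INSTRUCTION
    else:
        raise ValueError(f"unknown task: {task!r}")
    return [{"role": "user", "content": f"{question}\n\n{instruction}"}]
```

The returned list can be passed as `messages` to `tokenizer.apply_chat_template` in the basic usage example.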

Contact

Yang Chen (yachen@nvidia.com)
Zhuolin Yang (zhuoliny@nvidia.com)
Zihan Liu (zihanl@nvidia.com)
Chankyu Lee (chankyul@nvidia.com)
Wei Ping (wping@nvidia.com)

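🔧 Technical details

The research team shares its training recipe and training logs in the technical report.

📄 License

Use of this model is governed by the NVIDIA Open Model License.

Citation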

@article{acereason2025,
  title={AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning},
  author={Chen, Yang and Yang, Zhuolin and Liu, Zihan and Lee, Chankyu and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint},
  year={2025}
}