AceReason-Nemotron-14B open model: strong math and code reasoning, free to deploy

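AceReason-Nemotron-14B

Developed by NVIDIA

AceReason-Nemotron-14B is a math and code reasoning model trained with reinforcement learning on top of DeepSeek-R1-Distilled-Qwen-14B. It performs strongly on both math and code reasoning tasks.

Large language model

Transformers

License: Other #RL-optimized reasoning #Math and code #High-accuracy problem solving

Downloads: 7,863

Released: 5/20/2025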

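Model overview

AceReason-Nemotron-14B is a math and code reasoning model trained entirely through reinforcement learning (RL), starting from DeepSeek-R1-Distilled-Qwen-14B. Its authors systematically studied the RL training process through extensive ablations and propose a simple, effective recipe: RL training on math-only prompts first, followed by RL training on code-only prompts.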

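Model features

Reinforcement learning training

Trained entirely through reinforcement learning (RL), which markedly improves math and code reasoning.

Staged training recipe

RL training runs on math-only prompts first, then on code-only prompts.

Strong benchmark results

Scores 78.6% on AIME 2024, 67.4% on AIME 2025, and 61.1% / 54.9% on LiveCodeBench v5 / v6.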

Model capabilities

Math reasoning

Code generation

Text generation

Reinforcement learning

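Use cases

Math reasoning

Solving math competition problems

Solves hard competition problems such as those in AIME 2024 and AIME 2025.

Reaches 78.6% on AIME 2024 (+8.9%) and 67.4% on AIME 2025 (+17.4%).

Code generation

Solving coding competition problems

Generates Python code that solves competitive programming problems.

Reaches 61.1% on LiveCodeBench v5 (+8%) and 54.9% on LiveCodeBench v6 (+7%).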

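🚀 AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

AceReason-Nemotron-14B is a math and code reasoning model trained entirely through reinforcement learning (RL), starting from DeepSeek-R1-Distilled-Qwen-14B. It achieves 78.6% on AIME 2024 (+8.9%), 67.4% on AIME 2025 (+17.4%), 61.1% on LiveCodeBench v5 (+8%), 54.9% on LiveCodeBench v6 (+7%), and 2024 on Codeforces (+543).

We systematically study the RL training process through extensive ablations and propose a simple yet effective approach: RL training on math-only prompts first, then RL training on code-only prompts. Notably, math-only RL not only significantly improves the performance of strong distilled models on math benchmarks, but also improves their performance on code reasoning tasks. Extended code-only RL then further improves code benchmark results while causing minimal degradation in math results. We find that RL not only elicits the foundational reasoning capabilities acquired during pre-training and supervised fine-tuning (for example, distillation), but also pushes the limits of the model's reasoning ability, enabling it to solve problems that were previously unsolvable.

We share our training recipe and training logs in the technical report.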
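The gains quoted in the summary read as absolute differences in percentage points. A minimal sketch of that arithmetic, assuming they are measured against DeepSeek-R1-Distilled-Qwen-14B and using the scores listed for both models in the results table of this card (the dictionary and function names are illustrative, not part of any released tooling):

```python
# Scores in percent, copied from the results table in this card.
BASELINE = {"AIME 2024": 69.7, "AIME 2025": 50.2, "LCB v5": 53.1, "LCB v6": 47.9}
ACEREASON = {"AIME 2024": 78.6, "AIME 2025": 67.4, "LCB v5": 61.1, "LCB v6": 54.9}


def gains(model: dict, baseline: dict) -> dict:
    """Absolute improvement in percentage points, rounded to one decimal."""
    return {name: round(model[name] - baseline[name], 1) for name in model}


for name, delta in gains(ACEREASON, BASELINE).items():
    print(f"{name}: +{delta} points")
```

With the table values, the AIME 2025 difference comes out at 17.2 points, slightly below the +17.4 quoted in the summary; the other three differences match the quoted gains.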

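✨ Key features

RL training: trained entirely through reinforcement learning, which strengthens the model's reasoning ability.
Strong in both domains: achieves strong results on math and code reasoning tasks alike.
Effective training recipe: proposes math-first, code-second RL training.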

💻 Usage examples

Basic usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'nvidia/AceReason-Nemotron-14B'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "Jen enters a lottery by picking $4$ distinct numbers from $S=\\{1,2,3,\\cdots,9,10\\}.$ $4$ numbers are randomly chosen from $S.$ She wins a prize if at least two of her numbers were $2$ of the randomly chosen numbers, and wins the grand prize if all four of her numbers were the randomly chosen numbers. The probability of her winning the grand prize given that she won a prize is $\\tfrac{m}{n}$ where $m$ and $n$ are relatively prime positive integers. Find $m+n$."
messages = [{"role": "user", "content": prompt}]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
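# Move the inputs to the device the model was placed on by device_map="auto".
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Sample with the recommended settings; do_sample=True ensures temperature and top_p are applied.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,
    temperature=0.6,
    top_p=0.95
)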
# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
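When trying the example, it helps to know the answer the model should reach. The following sketch is independent of the model: it enumerates every possible draw for the lottery problem in the prompt and computes the conditional probability exactly. Jen's pick is fixed to {1, 2, 3, 4}, which by symmetry does not affect the result; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

numbers = range(1, 11)   # S = {1, ..., 10}
jen = {1, 2, 3, 4}       # by symmetry, any fixed pick gives the same result

prize = 0   # draws sharing at least two numbers with Jen's pick
grand = 0   # draws matching all four numbers
for draw in combinations(numbers, 4):
    matches = len(jen.intersection(draw))
    if matches >= 2:
        prize += 1
    if matches == 4:
        grand += 1

probability = Fraction(grand, prize)   # P(grand prize | prize)
answer = probability.numerator + probability.denominator
print(probability, answer)  # 1/115 116
```

A correct response to the example prompt should therefore arrive at 116.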

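📚 Documentation

Results

We evaluate our model against competitive reasoning models of comparable size in the Qwen2.5 and Llama3.1 model families on AIME 2024, AIME 2025, LiveCodeBench v5 (2024/08/01 - 2025/02/01), and LiveCodeBench v6 (2025/02/01 - 2025/05/01). More evaluation results can be found in the technical report.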

Model	AIME 2024 (avg@64)	AIME 2025 (avg@64)	LCB v5 (avg@8)	LCB v6 (avg@8)
QwQ-32B	79.5	65.8	63.4	-
DeepSeek-R1-671B	79.8	70.0	65.9	-
Llama-Nemotron-Ultra-253B	80.8	72.5	66.3	-
o3-mini (medium)	79.6	76.7	67.4	-
Light-R1-14B	74	60.2	57.9	51.5
DeepCoder-14B (32K Inference)	71	56.1	57.9	50.4
OpenMath-Nemotron-14B	76.3	63.0	-	-
OpenCodeReasoning-Nemotron-14B	-	-	59.4	54.1
Llama-Nemotron-Super-49B-v1	67.5	60.0	45.5	-
DeepSeek-R1-Distilled-Qwen-14B	69.7	50.2	53.1	47.9
DeepSeek-R1-Distilled-Qwen-32B	72.6	54.9	57.2	-
AceReason-Nemotron-14B 🤖	78.6	67.4	61.1	54.9
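The avg@64 and avg@8 column headers denote accuracy averaged over 64 or 8 sampled generations per problem. A minimal sketch of that metric, assuming the usual definition (mean correctness over the k samples of each problem, then the mean over problems); the function name and toy data are illustrative and not taken from the evaluation scripts:

```python
def avg_at_k(results: list[list[bool]]) -> float:
    """results[i][j] is True when sample j of problem i is correct.

    Returns the accuracy in percent, averaged over the k samples of each
    problem and then over all problems.
    """
    if not results:
        raise ValueError("no problems given")
    k = len(results[0])
    if k == 0 or any(len(samples) != k for samples in results):
        raise ValueError("every problem needs the same non-zero number of samples")
    per_problem = [sum(samples) / k for samples in results]
    return 100.0 * sum(per_problem) / len(per_problem)


# Toy example with k = 4 samples for each of 2 problems.
toy = [
    [True, True, False, True],    # 3 of 4 correct
    [False, False, True, False],  # 1 of 4 correct
]
print(avg_at_k(toy))  # 50.0
```

Averaging over many samples reduces the variance that sampling at temperature 0.6 would otherwise introduce on small benchmarks such as AIME.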

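Usage recommendations

Do not include a system prompt; instead, place all instructions directly in the user prompt.
For math questions, we recommend the following instruction: Please reason step by step, and put your final answer within \boxed{}.
For code questions, we recommend the following prompt: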

question = "" # code question
starter_code = "" # starter code function header

code_instruction_nostartercode = """Write Python code to solve the problem. Please place the solution code in the following format:
```python
# Your solution code here
```"""
code_instruction_hasstartercode = """Please place the solution code in the following format:
```python
# Your solution code here
```"""
if starter_code != "":
    question += "\n\n" + "Solve the problem starting with the provided function header.\n\nFunction header:\n" + "```\n" + starter_code + "\n```"
    question += "\n\n" + code_instruction_hasstartercode
else:
    question += "\n\n" + code_instruction_nostartercode

final_prompt = "<｜User｜>" + question + "<｜Assistant｜><think>\n"
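Because the prompt above asks for the solution inside a fenced python block and ends with an opened <think> tag, a response usually needs light post-processing before the code can be run. A minimal sketch, assuming the model closes its reasoning with </think> and then emits the fenced answer; extract_solution is a hypothetical helper, not part of the LiveCodeBench scripts:

```python
import re
from typing import Optional

FENCE = "`" * 3  # three backticks, built this way to keep the example readable
CODE_BLOCK = re.compile(FENCE + r"python[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_solution(response: str) -> Optional[str]:
    """Return the last fenced python block after the reasoning, or None."""
    # Keep only the text after the final </think>, if the tag is present.
    answer = response.rsplit("</think>", 1)[-1]
    blocks = CODE_BLOCK.findall(answer)
    return blocks[-1].strip() if blocks else None


sample = (
    "Let me think...\n</think>\n\nHere is the code:\n"
    + FENCE + "python\nprint(sum(map(int, input().split())))\n" + FENCE
)
print(extract_solution(sample))
```

Taking the last block rather than the first is a design choice: if the answer contains several fenced snippets, the final one is the most likely to be the complete solution.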

The inference engine we use for evaluation is vLLM==0.7.3, with top-p=0.95, temperature=0.6, and max_tokens=32768.
We use the AceMath scorer for math evaluation and the official LiveCodeBench script for code evaluation.
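For math questions, the recommendations above amount to a single user message that carries both the problem and the \boxed{} instruction, after which the final answer can be read from the last \boxed{...} in the response. A minimal sketch: build_math_messages and last_boxed are hypothetical helpers, the English instruction wording and the blank line joining it to the problem are assumptions, and this simple extractor is not the AceMath scorer.

```python
from typing import Optional

# Raw string so that the backslash in \boxed is kept literally.
MATH_INSTRUCTION = r"Please reason step by step, and put your final answer within \boxed{}."
BOXED_OPEN = r"\boxed{"


def build_math_messages(problem: str) -> list[dict]:
    """Single user turn, no system prompt, instruction appended to the problem."""
    return [{"role": "user", "content": problem + "\n\n" + MATH_INSTRUCTION}]


def last_boxed(response: str) -> Optional[str]:
    """Return the contents of the last boxed expression, handling nested braces."""
    start = response.rfind(BOXED_OPEN)
    if start == -1:
        return None
    begin = start + len(BOXED_OPEN)
    depth = 1
    for pos in range(begin, len(response)):
        if response[pos] == "{":
            depth += 1
        elif response[pos] == "}":
            depth -= 1
            if depth == 0:
                return response[begin:pos]
    return None  # unbalanced braces


messages = build_math_messages("What is 1 + 1?")
print(messages[0]["content"])
print(last_boxed(r"... so the answer is \boxed{\frac{1}{2}}."))  # \frac{1}{2}
```

The resulting messages list can be passed to tokenizer.apply_chat_template in the same way as in the basic usage example.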

Contact

Yang Chen (yachen@nvidia.com)
Zhuolin Yang (zhuoliny@nvidia.com)
Zihan Liu (zihanl@nvidia.com)
Chankyu Lee (chankyul@nvidia.com)
Wei Ping (wping@nvidia.com)

📄 License

Your use of this model is governed by the NVIDIA Open Model License.

📖 Citation

@article{acereason2025,
  title={AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning},
  author={Chen, Yang and Yang, Zhuolin and Liu, Zihan and Lee, Chankyu and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint},
  year={2025}
}