AceReason-Nemotron-14B open-source model: strong at solving math problems and reasoning about code, and free to deploy


AceReason-Nemotron-14B

Developed by NVIDIA

AceReason-Nemotron-14B is a math and code reasoning model trained with reinforcement learning. It starts from DeepSeek-R1-Distilled-Qwen-14B and performs strongly on math and code reasoning tasks.

Large Language Model

Transformers

Open Source License: Other #RL reasoning optimization #Math and code dual-domain #High-accuracy problem solving

Downloads 7,863

Release Time : 5/20/2025

Model Overview

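AceReason-Nemotron-14B is a math and code reasoning model trained entirely through reinforcement learning (RL), starting from DeepSeek-R1-Distilled-Qwen-14B. It performs strongly on math and code reasoning tasks. The authors study the RL training process systematically through extensive ablation experiments and propose a simple, effective recipe: RL training on math-only prompts first, followed by RL training on code-only prompts.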

Model Features

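Reinforcement Learning Training

Trained entirely through reinforcement learning (RL), which substantially improves its math and code reasoning.

Staged Training Recipe

RL training runs first on math-only prompts and then on code-only prompts to optimize model performance.

High-Performance Reasoning

Delivers strong results on benchmarks such as AIME 2024, AIME 2025, and LiveCodeBench.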

Model Capabilities

Math reasoning

Code generation

Text generation

Reinforcement learning

Use Cases

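Math Reasoning

Competition math problem solving

Solves complex competition math problems, such as those from AIME 2024 and AIME 2025.

Achieves 78.6% on AIME 2024 (+8.9%) and 67.4% on AIME 2025 (+17.4%).

Code Generation

Competitive programming problem solving

Generates Python code that solves competitive programming problems.

Achieves 61.1% on LiveCodeBench v5 (+8%) and 54.9% on LiveCodeBench v6 (+7%).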

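🚀 AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

AceReason-Nemotron-14B is a math and code reasoning model trained entirely through reinforcement learning (RL), starting from DeepSeek-R1-Distilled-Qwen-14B. It achieves 78.6% on AIME 2024 (+8.9%), 67.4% on AIME 2025 (+17.4%), 61.1% on LiveCodeBench v5 (+8%), 54.9% on LiveCodeBench v6 (+7%), and a rating of 2024 on Codeforces (+543). We systematically study the RL training process through extensive ablations and propose a simple yet effective approach: RL training on math-only prompts first, then RL training on code-only prompts. Notably, we find that math-only RL not only significantly improves the performance of strong distilled models on math benchmarks, but also helps on code reasoning tasks. In addition, extended code-only RL further improves code benchmark performance while causing minimal degradation in math results. We find that RL not only elicits the foundational reasoning capabilities acquired during pre-training and supervised fine-tuning (for example, distillation), but also pushes the limits of the model's reasoning ability, enabling it to solve problems that were previously unsolvable.

We share our training recipe and training logs in the technical report.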

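📊 Results

We evaluate our model against competitive reasoning models of comparable size within the Qwen2.5 and Llama3.1 model families on AIME 2024, AIME 2025, LiveCodeBench v5 (2024/08/01 - 2025/02/01), and LiveCodeBench v6 (2025/02/01 - 2025/05/01). More detailed evaluation results can be found in the technical report.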

Model	AIME 2024 (avg@64)	AIME 2025 (avg@64)	LCB v5 (avg@8)	LCB v6 (avg@8)
QwQ-32B	79.5	65.8	63.4	-
DeepSeek-R1-671B	79.8	70.0	65.9	-
Llama-Nemotron-Ultra-253B	80.8	72.5	66.3	-
o3-mini (medium)	79.6	76.7	67.4	-
Light-R1-14B	74	60.2	57.9	51.5
DeepCoder-14B (32K Inference)	71	56.1	57.9	50.4
OpenMath-Nemotron-14B	76.3	63.0	-	-
OpenCodeReasoning-Nemotron-14B	-	-	59.4	54.1
Llama-Nemotron-Super-49B-v1	67.5	60.0	45.5	-
DeepSeek-R1-Distilled-Qwen-14B	69.7	50.2	53.1	47.9
DeepSeek-R1-Distilled-Qwen-32B	72.6	54.9	57.2	-
AceReason-Nemotron-14B 🤗	78.6	67.4	61.1	54.9
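The gains quoted in the overview appear to be measured against the base model, DeepSeek-R1-Distilled-Qwen-14B. As a quick check, the sketch below recomputes the differences from the two corresponding table rows. The variable names are illustrative only and are not part of any released tooling.

```python
# Scores copied from the results table above.
BENCHMARKS = ["AIME 2024", "AIME 2025", "LCB v5", "LCB v6"]
base = {"AIME 2024": 69.7, "AIME 2025": 50.2, "LCB v5": 53.1, "LCB v6": 47.9}
acereason = {"AIME 2024": 78.6, "AIME 2025": 67.4, "LCB v5": 61.1, "LCB v6": 54.9}

# Absolute gain in points, rounded to one decimal to hide float noise.
deltas = {name: round(acereason[name] - base[name], 1) for name in BENCHMARKS}

for name in BENCHMARKS:
    print(f"{name}: {base[name]} -> {acereason[name]} ({deltas[name]:+.1f})")
```

With the table values this gives +8.9, +17.2, +8.0, and +7.0 points. The AIME 2025 difference computed from the table (17.2) is slightly lower than the +17.4 quoted in the text; the other three agree.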

💻 Usage Examples

Basic Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'nvidia/AceReason-Nemotron-14B'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "Jen enters a lottery by picking $4$ distinct numbers from $S=\\{1,2,3,\\cdots,9,10\\}.$ $4$ numbers are randomly chosen from $S.$ She wins a prize if at least two of her numbers were $2$ of the randomly chosen numbers, and wins the grand prize if all four of her numbers were the randomly chosen numbers. The probability of her winning the grand prize given that she won a prize is $\\tfrac{m}{n}$ where $m$ and $n$ are relatively prime positive integers. Find $m+n$."
messages = [{"role": "user", "content": prompt}]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
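# Place inputs on the model's device instead of hard-coding CUDA.
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Sampling must be enabled for temperature and top_p to take effect.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,
    temperature=0.6,
    top_p=0.95
)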
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
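
The `response` string produced above usually holds a long reasoning trace followed by the final answer. The sketch below separates the two and pulls out a boxed result. It assumes DeepSeek-R1-style output in which the reasoning ends with a `</think>` tag and the answer is wrapped in `\boxed{}` when the prompt asks for it. The helper names `split_reasoning` and `extract_boxed` are hypothetical, and the sample string is hand-written, not real model output.

```python
from typing import Optional, Tuple


def split_reasoning(response: str) -> Tuple[str, str]:
    """Split a response into (reasoning, answer) at the last </think> tag.

    Assumption: the model closes its reasoning with </think>. If the tag is
    missing (for example, generation was cut off), everything is treated as
    reasoning and the answer is empty.
    """
    head, sep, tail = response.rpartition("</think>")
    if not sep:
        return response.strip(), ""
    return head.strip(), tail.strip()


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, handling nested braces."""
    opener = "\\boxed{"
    start = text.rfind(opener)
    if start == -1:
        return None
    depth = 1
    collected = []
    for ch in text[start + len(opener):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(collected)
        collected.append(ch)
    return None  # braces never balanced


# Hand-written sample in the assumed output format.
sample = "<think>\nCount the cases...\n</think>\nThe answer is \\boxed{116}."
reasoning, answer = split_reasoning(sample)
print(extract_boxed(answer))
```

Returning `None` for a missing or unbalanced box lets a caller distinguish "no answer found" from an empty answer.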

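💡 Usage Recommendations

Do not include a system prompt; put all instructions directly in the user prompt.
For math questions, we recommend the following instruction: Please reason step by step, and put your final answer within \boxed{}.
For code questions, we recommend the following instruction: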

question = "" # code question
starter_code = "" # starter code function header

code_instruction_nostartercode = """Write Python code to solve the problem. Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
code_instruction_hasstartercode = """Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
if starter_code != "":
    question += "\n\n" + "Solve the problem starting with the provided function header.\n\nFunction header:\n" + "```\n" + starter_code + "\n```"
    question += "\n\n" + code_instruction_hasstartercode
else:
    question += "\n\n" + code_instruction_nostartercode

final_prompt = "<｜User｜>" + question + "<｜Assistant｜><think>\n"
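
The prompt-building logic above can be wrapped in a reusable function and paired with a helper that recovers the solution from the reply. The sketch below reproduces the same instruction strings. The names `build_code_question` and `extract_python_block` are hypothetical, and the extractor assumes the model follows the requested fenced format.

```python
import re
from typing import Optional

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

FORMAT_HINT = (
    "Please place the solution code in the following format:\n"
    f"{FENCE}python\n# Your solution code here\n{FENCE}"
)


def build_code_question(question: str, starter_code: str = "") -> str:
    """Append the recommended code instruction to a problem statement."""
    if starter_code:
        return (
            f"{question}\n\n"
            "Solve the problem starting with the provided function header.\n\n"
            f"Function header:\n{FENCE}\n{starter_code}\n{FENCE}"
            f"\n\n{FORMAT_HINT}"
        )
    return f"{question}\n\nWrite Python code to solve the problem. {FORMAT_HINT}"


def extract_python_block(response: str) -> Optional[str]:
    """Return the last python-fenced block in a response, or None if absent."""
    pattern = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)
    blocks = pattern.findall(response)
    return blocks[-1].strip() if blocks else None


# Hand-written reply in the requested format, used only to exercise the parser.
reply = f"Here is my solution.\n{FENCE}python\nprint(1 + 1)\n{FENCE}"
print(extract_python_block(reply))
```

Taking the last fenced block is a design choice: reasoning models sometimes draft code before the final version, so the final block is the more likely submission.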

The inference engine used for evaluation is vLLM==0.7.3 with top-p=0.95, temperature=0.6, and max_tokens=32768.
We use the AceMath scorer for math evaluation and the official LiveCodeBench script for code evaluation.
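
The evaluation settings above map directly onto vLLM sampling parameters. The sketch below shows one way to draw several completions per prompt and aggregate a score. It assumes that avg@k in the results table means accuracy averaged over k sampled generations per problem, which this page does not define explicitly. The function names are hypothetical, scoring itself is left to the AceMath scorer or LiveCodeBench script, and `vllm` is imported lazily so the aggregation helper runs without it.

```python
from statistics import mean
from typing import Dict, List

# Sampling settings stated above for evaluation.
EVAL_SAMPLING = {"temperature": 0.6, "top_p": 0.95, "max_tokens": 32768}


def sample_with_vllm(formatted_prompts: List[str], n: int) -> List[List[str]]:
    """Draw n completions per prompt with vLLM (needs vllm and a GPU).

    Prompts are expected to already include the chat template markers.
    """
    from vllm import LLM, SamplingParams  # heavy dependency, imported lazily

    llm = LLM(model="nvidia/AceReason-Nemotron-14B")
    params = SamplingParams(n=n, **EVAL_SAMPLING)
    results = llm.generate(formatted_prompts, params)
    return [[completion.text for completion in result.outputs] for result in results]


def avg_at_k(correct: Dict[str, List[bool]]) -> float:
    """Mean per-problem accuracy over k samples, as a percentage.

    Assumption: avg@k averages correctness over the k samples of each
    problem, then averages across problems.
    """
    per_problem = [
        mean(1.0 if ok else 0.0 for ok in flags) for flags in correct.values()
    ]
    return 100.0 * mean(per_problem)


# Toy correctness flags (k=4) to illustrate the aggregation only.
toy = {"p1": [True, True, False, True], "p2": [False, False, True, True]}
print(avg_at_k(toy))
```

Averaging over many samples reduces the variance that temperature sampling introduces on small benchmarks such as AIME, which is presumably why the table uses 64 samples for math and 8 for code.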

📧 Contact

Yang Chen (yachen@nvidia.com), Zhuolin Yang (zhuoliny@nvidia.com), Zihan Liu (zihanl@nvidia.com), Chankyu Lee (chankyul@nvidia.com), Wei Ping (wping@nvidia.com)

📄 License

Use of this model is governed by the NVIDIA Open Model License.

📖 Citation

@article{acereason2025,
  title={AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning},
  author={Chen, Yang and Yang, Zhuolin and Liu, Zihan and Lee, Chankyu and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint},
  year={2025}
}