AceReason-Nemotron-7B Open-Source Model: Solve Hard Math and Code Reasoning Problems for Free!


Acereason Nemotron 7B

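Developed by NVIDIA

A math and code reasoning model trained with reinforcement learning on top of DeepSeek-R1-Distilled-Qwen-7B, delivering strong performance on math and code reasoning tasks

Large Language Model

Transformers

Open-source license: Other #MathReasoningEnhancement #CodeGenerationOptimization #RLTrainingBreakthrough

Downloads: 4,278

Release date: 5/22/2025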

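Model Overview

AceReason-Nemotron-7B is a math and code reasoning model trained entirely with reinforcement learning (RL), using DeepSeek-R1-Distilled-Qwen-7B as its base model. It achieves substantial gains on math and code reasoning tasks.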

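Model Features

Reinforcement Learning Training

Trained entirely with reinforcement learning (RL), which substantially improves math and code reasoning ability

Math Reasoning

Achieves 69.0% on AIME 2024 (+14.5%) and 53.6% on AIME 2025 (+17.4%)

Code Reasoning

Achieves 51.8% on LiveCodeBench v5 (+8%) and 44.1% on LiveCodeBench v6 (+7%)

Training Method Innovation

RL training first on math-only prompts and then on code-only prompts proves highly effective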

Model Capabilities

Math reasoning

Code generation

Complex problem solving

Step-by-step reasoning

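Use Cases

Math Competitions

Solving AIME competition problems

Solves complex problems from the AIME math competition

Achieves 69.0% accuracy on AIME 2024

Programming Competitions

Solving LiveCodeBench programming problems

Solves programming problems from LiveCodeBench

Achieves 51.8% accuracy on LiveCodeBench v5

Educational Support

Math learning assistance

Helps students understand complex mathematical concepts and solution methods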

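🚀 AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

AceReason-Nemotron-7B is a math and code reasoning model trained entirely through reinforcement learning (RL), starting from DeepSeek-R1-Distilled-Qwen-7B. It achieves strong results: 69.0% on AIME 2024 (+14.5%), 53.6% on AIME 2025 (+17.4%), 51.8% on LiveCodeBench v5 (+8%), and 44.1% on LiveCodeBench v6 (+7%). Through extensive ablation studies, we systematically investigate the RL training process and propose a simple yet effective approach: first train with RL on math-only prompts, then train with RL on code-only prompts.

Notably, we find that math-only RL not only significantly improves the performance of strong distilled models on math benchmarks, but also improves their performance on code reasoning tasks. In addition, extended code-only RL further improves code benchmark performance while causing only minimal degradation of math results. We find that RL not only elicits the foundational reasoning capabilities acquired during pretraining and supervised fine-tuning (e.g., distillation), but also pushes the limits of the model's reasoning ability, enabling it to solve problems that were previously unsolvable.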
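The ordering described above (a math-only RL stage followed by a code-only RL stage) can be sketched as a simple staged loop. This is a minimal illustration under stated assumptions, not the authors' training code: `rl_update` is a hypothetical placeholder for a real RL step (sampling rollouts, scoring them, updating the policy), and `run_staged_rl`, `batches`, and the batch size are illustrative names and values only.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature for one RL update on a batch of prompts.
# A real implementation would sample responses, compute rewards with a verifier,
# and apply a policy-gradient update; this sketch does none of that.
UpdateFn = Callable[[str, Sequence[str]], None]


def batches(prompts: Sequence[str], batch_size: int) -> List[Sequence[str]]:
    """Split a prompt list into consecutive batches."""
    return [prompts[i:i + batch_size] for i in range(0, len(prompts), batch_size)]


def run_staged_rl(
    stages: List[Tuple[str, Sequence[str]]],
    rl_update: UpdateFn,
    batch_size: int = 2,
) -> List[str]:
    """Run RL stages strictly in order; each stage only sees its own prompts.

    Returns the stage name of every update, in the order the updates were applied.
    """
    log: List[str] = []
    for stage_name, prompts in stages:
        for batch in batches(prompts, batch_size):
            rl_update(stage_name, batch)
            log.append(stage_name)
    return log


if __name__ == "__main__":
    math_prompts = ["math problem 1", "math problem 2", "math problem 3"]
    code_prompts = ["code problem 1", "code problem 2"]

    def dummy_update(stage: str, batch: Sequence[str]) -> None:
        # Placeholder for a real RL step.
        print(f"[{stage}] updating on {len(batch)} prompts")

    order = run_staged_rl([("math-only", math_prompts), ("code-only", code_prompts)], dummy_update)
    print(order)
```

The point of the structure is that no code prompt is seen until every math-only batch has been processed, which is the curriculum ordering the paragraph describes.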

We share our training recipe and training logs in the technical report.

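🚀 Quick Start

This section explains the basic usage of the AceReason-Nemotron model.

✨ Key Features

A math and code reasoning model trained with reinforcement learning.
Achieves high performance on benchmarks such as AIME and LiveCodeBench.
A training method that combines math-only RL followed by code-only RL.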

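📦 Installation

This model requires the transformers library. The example below also needs PyTorch, and the accelerate package for device_map="auto". You can install them with the following command:

pip install transformers torch accelerate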

💻 Usage Examples

Basic Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'nvidia/AceReason-Nemotron-7B'
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" places the weights on the available GPU(s); it requires accelerate
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "Jen enters a lottery by picking $4$ distinct numbers from $S=\\{1,2,3,\\cdots,9,10\\}.$ $4$ numbers are randomly chosen from $S.$ She wins a prize if at least two of her numbers were $2$ of the randomly chosen numbers, and wins the grand prize if all four of her numbers were the randomly chosen numbers. The probability of her winning the grand prize given that she won a prize is $\\tfrac{m}{n}$ where $m$ and $n$ are relatively prime positive integers. Find $m+n$."
# No system prompt: all instructions go in the user turn
messages = [{"role": "user", "content": prompt}]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# Move the inputs to the device holding the model instead of hard-coding "cuda"
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,  # sampling must be enabled for temperature/top_p to take effect
    temperature=0.6,
    top_p=0.95
)
# Drop the prompt tokens so only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
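The decoded `response` contains the model's reasoning followed by its answer. When the math instruction recommended below ("put your final answer within \boxed{}") is used, the final answer can be pulled out of the text. The sketch below is an illustrative helper, not part of the model's tooling: it assumes the reasoning, if present in the decoded text, ends with a `</think>` tag, and the names `strip_reasoning` and `last_boxed` are hypothetical.

```python
from typing import Optional


def strip_reasoning(response: str) -> str:
    """Return the text after the last '</think>' tag, or the whole response if there is none."""
    marker = "</think>"
    idx = response.rfind(marker)
    return response[idx + len(marker):].strip() if idx != -1 else response.strip()


def last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, handling nested braces.

    Returns None if no complete \\boxed{...} is found.
    """
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    # Unbalanced braces, e.g. the generation was cut off by max_new_tokens
    return None
```

For example, `last_boxed(strip_reasoning(response))` returns the final boxed answer as a string, or `None` if the model never produced one (for instance because it ran out of tokens mid-reasoning).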

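📚 Documentation

Results

We evaluated our model on AIME 2024, AIME 2025, LiveCodeBench v5 (2024/08/01 - 2025/02/01), and LiveCodeBench v6 (2025/02/01 - 2025/05/01), comparing it with competitive reasoning models of similar size in the Qwen2.5 and Llama3.1 model families. More detailed evaluation results are available in our technical report.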

Model	AIME 2024 (avg@64)	AIME 2025 (avg@64)	LiveCodeBench v5 (avg@8)	LiveCodeBench v6 (avg@8)
QwQ-32B	79.5	65.8	63.4	-
DeepSeek-R1-671B	79.8	70.0	65.9	-
Llama-Nemotron-Ultra-253B	80.8	72.5	66.3	-
o3-mini (medium)	79.6	76.7	67.4	-
Light-R1-7B	59.1	44.3	40.6	36.4
Light-R1-14B	74	60.2	57.9	51.5
DeepCoder-14B (32K Inference)	71	56.1	57.9	50.4
OpenMath-Nemotron-7B	74.8	61.2	-	-
OpenCodeReasoning-Nemotron-7B	-	-	51.3	46.1
Llama-Nemotron-Nano-8B-v1	61.3	47.1	46.6	46.2
DeepSeek-R1-Distilled-Qwen-7B	55.5	39.0	37.6	34.1
DeepSeek-R1-Distilled-Qwen-14B	69.7	50.2	53.1	47.9
DeepSeek-R1-Distilled-Qwen-32B	72.6	54.9	57.2	-
AceReason-Nemotron-7B 🤗	69.0	53.6	51.8	44.1
AceReason-Nemotron-14B 🤗	78.6	67.4	61.1	54.9
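The table headers report avg@64 for AIME and avg@8 for LiveCodeBench. Reading these as accuracy averaged over 64 (or 8) sampled responses per problem, which is an assumption here rather than a definition given in this card, the metric can be sketched as follows. The function name `avg_at_k` is illustrative, and judging each sample correct or incorrect (for example with the scorers mentioned below) is outside the sketch.

```python
from typing import Sequence


def avg_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Accuracy averaged over k sampled responses per problem, as a percentage.

    correct[p][s] is True if sample s for problem p was judged correct.
    Every problem is assumed to have the same number of samples k.
    """
    if not correct:
        raise ValueError("need at least one problem")
    k = len(correct[0])
    if k == 0 or any(len(row) != k for row in correct):
        raise ValueError("every problem needs the same non-zero number of samples")
    # Per-problem accuracy over its k samples, then the mean over problems
    per_problem = [sum(row) / k for row in correct]
    return 100.0 * sum(per_problem) / len(per_problem)
```

Averaging over many samples reduces the variance that sampling at temperature 0.6 would otherwise introduce into a single-run score on a 30-problem set like AIME.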

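Usage Recommendations

Do not include a system prompt; put all instructions directly in the user prompt.
For math questions, we recommend the following instruction: Please reason step by step, and put your final answer within \boxed{}.
For code questions, we recommend the following instructions: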

question = "" # the code question
starter_code = "" # function header of the starter code (leave empty if there is none)

code_instruction_nostartercode = """Write Python code to solve the problem. Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
code_instruction_hasstartercode = """Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
if starter_code != "":
    question += "\n\n" + "Solve the problem starting with the provided function header.\n\nFunction header:\n" + "```\n" + starter_code + "\n```"
    question += "\n\n" + code_instruction_hasstartercode
else:
    question += "\n\n" + code_instruction_nostartercode

final_prompt = "<｜User｜>" + question + "<｜Assistant｜><think>\n"
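A math prompt can be assembled in the same raw format. The sketch below is an assumption-laden illustration: the card gives the math instruction text but not where it goes, so appending it after the question with a blank line (mirroring the code instructions above) is a guess, and `build_math_prompt` is a hypothetical helper name.

```python
MATH_INSTRUCTION = "Please reason step by step, and put your final answer within \\boxed{}."


def build_math_prompt(question: str) -> str:
    """Build a raw math prompt in the same format as final_prompt above."""
    # Assumption: the instruction follows the question after a blank line.
    user_turn = question + "\n\n" + MATH_INSTRUCTION
    # No system prompt; the assistant turn is opened with "<think>\n".
    return "<｜User｜>" + user_turn + "<｜Assistant｜><think>\n"
```

Note that `<｜User｜>` and `<｜Assistant｜>` use the full-width vertical bar (U+FF5C), not the ASCII `|`.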

For evaluation, we use vLLM==0.7.3 as the inference engine with top-p=0.95, temperature=0.6, and max_tokens=32768.
Math answers are scored with the AceMath scorer, and code with the official LiveCodeBench script.
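The sampling settings above map directly onto vLLM's `SamplingParams`. The following is a minimal sketch, not the authors' evaluation harness: the helper name `generate_with_vllm` is hypothetical, and drawing `n` samples per prompt in a single call (e.g. `n=64` for AIME) is an assumption about how multi-sample scores could be produced. vLLM is imported inside the function so the module can be loaded without it installed.

```python
# Sampling settings stated in the evaluation description above.
EVAL_SAMPLING = {"temperature": 0.6, "top_p": 0.95, "max_tokens": 32768}


def generate_with_vllm(prompts, model_name="nvidia/AceReason-Nemotron-7B", n=1):
    """Sample n completions for each raw prompt string with vLLM.

    Returns one list of n completion strings per input prompt, in input order.
    """
    # Imported lazily; requires `pip install vllm==0.7.3` and a GPU.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name)
    params = SamplingParams(n=n, **EVAL_SAMPLING)
    outputs = llm.generate(prompts, params)
    return [[completion.text for completion in out.outputs] for out in outputs]
```

The prompts passed in would be raw strings such as `final_prompt` built above, since they already contain the chat markers.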

📄 License

Use of this model is governed by the NVIDIA Open Model License.

Citation

@article{acereason2025,
  title={AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning},
  author={Chen, Yang and Yang, Zhuolin and Liu, Zihan and Lee, Chankyu and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint},
  year={2025}
}