Apriel Nemotron 15b Thinker

Developed by ServiceNow-AI

An efficient 15-billion-parameter reasoning model from ServiceNow that uses only half the memory of state-of-the-art models in its class

Large language model

Transformers

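Open-source license: MIT #EfficientReasoning #EnterpriseTasks #LowResourceConsumption

Downloads: 1,252

Release date: 5/6/2025

Model overview

A model trained in three stages on top of Apriel-15b-base and optimized for efficient reasoning and enterprise tasks

Model features

Efficient memory usage

Roughly half the size of comparable 32B models, which substantially reduces memory requirements

Optimized inference efficiency

Consumes 40% fewer tokens than comparable models such as QWQ-32b, making it efficient in production

Optimized for enterprise tasks

Strong performance on tasks such as MBPP, BFCL, and Enterprise RAG

Academic competitiveness

Competitive performance on academic benchmarks such as AIME, AMC, and MATH

Model capabilities

Text generation

Complex reasoning

Enterprise task handling

Academic problem solving

Use cases

Enterprise applications

Enterprise RAG systems

Used for enterprise knowledge retrieval and generation tasks

Strong performance on the related benchmarks

Business process automation

Handles enterprise-level document and process automation tasks

Academic research

Mathematical problem solving

Solves competition-level math problems such as AMC and AIME

Good performance on benchmarks such as MATH-500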

license: mit
pipeline_tag: text-generation
library_name: transformers

Apriel-Nemotron-15b-Thinker

/ˈɑː.pri.əl/

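Overview

Apriel-Nemotron-15b-Thinker is a 15-billion-parameter reasoning model in ServiceNow's Apriel SLM series. It achieves performance competitive with similarly sized state-of-the-art models such as o1-mini, QWQ-32b, and EXAONE-Deep-32b while using only half their memory. It was built from the Apriel-15b-base checkpoint through a three-stage training pipeline (CPT, SFT, and GRPO).

Highlights

Half the size of state-of-the-art models such as QWQ-32b and EXAONE-32b, and therefore more memory-efficient.
Consumes 40% fewer tokens than QWQ-32b, making it highly efficient in production.
Performs on par with or better than those models on tasks such as MBPP, BFCL, Enterprise RAG, MT Bench, MixEval, IFEval, and Multi-Challenge, which makes it well suited to agentic and enterprise tasks.
Delivers competitive performance on academic benchmarks such as AIME-24, AIME-25, AMC-23, MATH-500, and GPQA, given its model size.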
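
The "half the memory" claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is only an estimate under stated assumptions: it counts model weights only (no KV cache, activations, or framework overhead), assumes 2 bytes per parameter (bf16/fp16), and uses the nominal parameter counts implied by the model names (15B and 32B). The function name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for the weights alone, in GB (10^9 bytes).

    Assumption: 2 bytes per parameter (bf16/fp16); KV cache and activations are ignored.
    """
    return num_params * bytes_per_param / 1e9


apriel_gb = weight_memory_gb(15e9)      # nominal 15B parameters
baseline_gb = weight_memory_gb(32e9)    # nominal 32B parameters (e.g. a 32B-class model)
ratio = apriel_gb / baseline_gb

print(f"15B weights: {apriel_gb:.0f} GB, 32B weights: {baseline_gb:.0f} GB, ratio: {ratio:.2f}")
```

Under these assumptions the weights come to about 30 GB versus 64 GB, a ratio a little under one half, which is consistent with the claim above.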

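Evaluation

Evaluations were conducted using lm-eval-harness and evalchemy.

Benchmarks indicating enterprise capability

[Figure: enterprise benchmark comparison chart; image not reproduced here]

Benchmarks indicating academic reasoning capability

[Figure: academic reasoning benchmark comparison chart; image not reproduced here]

Token efficiency comparison (lower is better)

[Figure: token efficiency comparison chart; image not reproduced here]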

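Training details

Mid-training / continual pre-training: In this stage, the model is trained on more than 100 billion tokens of carefully curated examples drawn from mathematical reasoning, coding challenges, scientific discourse, and logical puzzles. The goal is to strengthen the model's foundational reasoning capabilities. This stage is crucial for the model to function as a reasoner and yields significant gains on reasoning benchmarks.

Supervised fine-tuning (SFT): Next, the model is fine-tuned on 200,000 high-quality demonstrations covering mathematical and scientific problem solving, coding tasks, general instruction-following scenarios, API/function-calling use cases, and more.

Reinforcement learning: Although the SFT checkpoint performs strongly on core competencies such as mathematics and general knowledge, it is weaker at instruction following and coding tasks. To close these gaps, GRPO is applied with some minor modifications to the objective. This yields significant improvements on benchmarks such as IFEval, Multi Challenge, Enterprise RAG, MBPP, and BFCL while preserving scores on competition-level math exams such as AIME and AMC. GRPO also gives modest gains on GPQA and MixEval.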
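
For readers unfamiliar with GRPO, its core idea is to score each sampled completion relative to the other completions drawn for the same prompt. The sketch below shows only that group-relative advantage step in its commonly described form. It is not the modified objective used for this model, which the card does not describe; the function name, the `eps` term, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled completions within the group.

    Each advantage is (reward - group mean) / (group std + eps).
    Assumption: population standard deviation; eps avoids division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled for the same prompt
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group average get a positive advantage and those below get a negative one, so no separate value model is needed to produce a baseline.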

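Throughout training, intermediate snapshots from the SFT and GRPO stages are periodically merged, which improves generalization and mitigates catastrophic forgetting.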
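
The card does not say how the snapshots are merged. One common and simple option is uniform parameter averaging, sketched below purely as an assumption. Checkpoints are represented here as plain dictionaries mapping parameter names to flat lists of floats so the example runs without any deep learning framework; `average_checkpoints` is a hypothetical helper, not part of the released training stack.

```python
def average_checkpoints(state_dicts):
    """Uniformly average several checkpoints that share identical parameter names.

    Assumption: each checkpoint is a dict of parameter name -> flat list of floats.
    """
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    keys = state_dicts[0].keys()
    for sd in state_dicts:
        if sd.keys() != keys:
            raise ValueError("checkpoints must have the same parameter names")
    n = len(state_dicts)
    merged = {}
    for key in keys:
        # zip(*...) walks the same position across all checkpoints
        columns = zip(*(sd[key] for sd in state_dicts))
        merged[key] = [sum(values) / n for values in columns]
    return merged


# Two toy snapshots with a single parameter tensor each
snapshot_a = {"layer.weight": [1.0, 3.0]}
snapshot_b = {"layer.weight": [3.0, 5.0]}
merged = average_checkpoints([snapshot_a, snapshot_b])
print(merged)
```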

A detailed technical report is coming soon.

Usage

pip install transformers

Running the reasoning model

The following snippet demonstrates how to use the model with the generate function of the transformers library:

import re
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ServiceNow-AI/Apriel-Nemotron-15b-Thinker"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare the model input
prompt = "Positive real numbers $x$ and $y$ satisfy $y^3=x^2$ and $(y-x)^2=4y^2$. What is $x+y$?\nMark your solution with \\boxed"
messages = [
    {"role": "user", "content": prompt}
]

tools = []

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Run text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
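# Decode only the newly generated tokens: the prompt's system message already
# contains the [BEGIN FINAL RESPONSE] ... [END FINAL RESPONSE] markers
prompt_length = model_inputs.input_ids.shape[1]
output = tokenizer.decode(generated_ids[0][prompt_length:], skip_special_tokens=True)

# Parse the response; fall back to the full output if the markers are missing
matches = re.findall(r"\[BEGIN FINAL RESPONSE\](.*?)\[END FINAL RESPONSE\]", output, re.DOTALL)
response = matches[-1].strip() if matches else output.strip()
print("output:", output)
print("response:", response)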
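
When both the reasoning trace and the final answer are needed, the parsing step can be wrapped in a small helper. The sketch below is one possible approach, not part of the model's tooling: `split_reasoning_and_response` is a hypothetical name, and it assumes the decoded text follows the documented layout (a "Here are my reasoning steps:" prefix followed by the marker pair). It looks only at the text after the last reasoning prefix, so a decoded string that still includes the system prompt, which itself mentions the markers, does not produce a false match.

```python
import re
from typing import Optional, Tuple

FINAL_RESPONSE_PATTERN = re.compile(
    r"\[BEGIN FINAL RESPONSE\](.*?)\[END FINAL RESPONSE\]", re.DOTALL
)
REASONING_PREFIX = "Here are my reasoning steps:"


def split_reasoning_and_response(decoded: str) -> Tuple[str, Optional[str]]:
    """Return (reasoning, final_response).

    final_response is None when no complete marker pair follows the reasoning
    prefix, for example when generation stopped at the token limit.
    """
    # Keep only the text after the last reasoning prefix (whole text if absent)
    tail = decoded.rpartition(REASONING_PREFIX)[2]
    match = FINAL_RESPONSE_PATTERN.search(tail)
    if match is None:
        return tail.strip(), None
    return tail[: match.start()].strip(), match.group(1).strip()


# Toy decoded text that still contains the system prompt's own markers
sample = (
    "Provide the final solution in the following format: "
    "[BEGIN FINAL RESPONSE] ... [END FINAL RESPONSE].\n"
    "What is 6 * 6?\n"
    "Here are my reasoning steps:\nMultiply 6 by 6.\n"
    "[BEGIN FINAL RESPONSE]\n36\n[END FINAL RESPONSE]"
)
reasoning, final = split_reasoning_and_response(sample)
print("reasoning:", reasoning)
print("final:", final)
```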

Chat template

<|system|>
You are a thoughtful and systematic AI assistant built by ServiceNow Language Models (SLAM) lab. Before providing an answer, analyze the problem carefully and present your reasoning step by step. After explaining your thought process, provide the final solution in the following format: [BEGIN FINAL RESPONSE] ... [END FINAL RESPONSE].
<|end|>
<|user|>
# user message here
<|end|>
<|assistant|>
Here are my reasoning steps:
# thoughts here
[BEGIN FINAL RESPONSE]
# assistant response here
[END FINAL RESPONSE]
<|end|>

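The model first generates its thinking process and then produces the final response between [BEGIN FINAL RESPONSE] and [END FINAL RESPONSE]. The following snippet demonstrates how to apply the chat template: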

from transformers import AutoTokenizer
model_name = "ServiceNow-AI/Apriel-Nemotron-15b-Thinker"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare the model input
custom_system_prompt = "Answer like a pirate."
prompt = "You are an expert assistant in the implementation of customer experience management aspect of retail applications \n \nYou will be using Python as the programming language. \n \nYou will utilize a factory design pattern for the implementation and following the dependency inversion principle \n \nYou will modify the implementation based on user requirements. \n \nUpon user request, you will add, update, and remove the features & enhancements in the implementation provided by you. \n \nYou will ask whether the user wants to refactor the provided code or needs a sample implementation for reference. Upon user confirmation, I will proceed accordingly. \n \n**Guidelines:** \n 1. **User Requirements:** \n - You have to ask users about their requirements, clarify the user expectations, and suggest the best possible solution by providing examples of Python code snippets. \n - Ask users about which type of reports they need to assess the AI model's performance, accuracy, and reliability. \n - After providing the solution, you have to ask the user about the trial of the solution and modify the solution based on the user feedback. \n \n 2. **Libraries/Frameworks:** \n - You will be utilizing Python as a programming language. \n - You will be using Flask framework for REST APIS implementation \n \n 3. **Communication Gesture:** \n - Your conversation with the user should be interactive, supportive, courageous, and professional. \n - You have to break down the complex concepts into sub-concepts and try to explain them to the user. \n - You have to ask the user for the required parameters. If the user refuses to provide in 2 attempts, politely exit the conversation. \n - You have to provide your supported parameters to the user, if the user refuses to accept them then you have to put an apology note and exit the conversation. \n - You have to track the conversation about unasked questions by the user. If some/one of the questions remain then you have to remind the user about these questions and proceed to answer them based on the user's confirmation \n \n 4. **Implementation:** \n - Your code/implementations should be reliable, scaleable, modular, and reusable. \n - You will be providing unit tests for the implementation upon user request. \n - You will be following MVC architecture for the applications \n - Your implementations must be well-commented and readable \n \n \n- Today's date is 23rd August 2024. \n- The default sender email is sender-assistant@email.com.\nHi, I am conducting research on retail customer feedback systems and I need assistance with designing and implementing them. Could you kindly provide me with a list of general customer feedback system modules?"
messages = [
    {"role": "user", "content": custom_system_prompt + "\n\n" + prompt}
]
# Example tools
tools = [{"type": "function", "function": {"name": "getRetailFeedbackModules", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"page": {"type": "integer", "description": "The current page number.", "default": 1}, "page_size": {"type": "integer", "description": "The number of items per page.", "default": 3}}}}}, {"type": "function", "function": {"name": "verifyImplementation", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"coding_language": {"type": "string", "description": "The supported languages for verification of implementation.", "default": "python", "enum": ["python", "java", "php"]}, "code": {"type": "string", "description": "The code which needs verification"}, "design_pattern": {"type": "string", "description": "The design pattern to verify in the implementation", "enum": ["factory", "strategy", "singleton"]}, "verify_best_practices": {"type": "boolean", "description": "The verification of the coding style based on the language selected", "default": True}}}}}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools
)
model_inputs = tokenizer([text], return_tensors="pt")
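
To make the layout of the chat template shown earlier easier to read, the sketch below rebuilds a single-turn prompt by hand. It is an approximation for illustration only: the exact whitespace and the way tools are injected by the real template are not documented in this card, so `render_prompt` is a hypothetical helper and `tokenizer.apply_chat_template` should be used for actual inference.

```python
# System prompt text copied from the chat template section of this card
SYSTEM_PROMPT = (
    "You are a thoughtful and systematic AI assistant built by ServiceNow "
    "Language Models (SLAM) lab. Before providing an answer, analyze the "
    "problem carefully and present your reasoning step by step. After "
    "explaining your thought process, provide the final solution in the "
    "following format: [BEGIN FINAL RESPONSE] ... [END FINAL RESPONSE]."
)


def render_prompt(user_message: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Approximate the documented single-turn layout (assumed whitespace, no tools)."""
    return (
        f"<|system|>\n{system_prompt}\n<|end|>\n"
        f"<|user|>\n{user_message}\n<|end|>\n"
        # The assistant turn is left open so the model continues from the prefix
        "<|assistant|>\nHere are my reasoning steps:\n"
    )


rendered = render_prompt("Answer like a pirate.\n\nWhat is 2 + 2?")
print(rendered)
```

Note that the custom instruction ("Answer like a pirate.") sits inside the user turn, not in the system block, mirroring the example above.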

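Usage guidelines

Use the model's default chat template, which already includes a system prompt. We recommend adding all other instructions within the user message.
We recommend setting the temperature to 0.6.
For all our evaluations, we ensure the model starts its response with Here are my reasoning steps:\n. This is implemented in the default chat template.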
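
One way to apply the recommended temperature is to pass sampling arguments to `generate`. The sketch below only builds the keyword arguments. `do_sample`, `temperature`, and `max_new_tokens` are standard transformers generation parameters, and `do_sample=True` is needed for the temperature to take effect. Whether the model's shipped generation config already enables sampling is not stated in this card, so setting it explicitly is an assumption made for safety.

```python
# Sampling settings following the guideline above (temperature 0.6)
generation_kwargs = {
    "max_new_tokens": 65536,  # same budget as the usage example in this card
    "do_sample": True,        # temperature is ignored under greedy decoding
    "temperature": 0.6,
}

# With `model` and `model_inputs` prepared as in the usage example:
# generated_ids = model.generate(**model_inputs, **generation_kwargs)
print(generation_kwargs)
```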

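Intended use

The Apriel family of models is designed for a variety of general-purpose instruction tasks, including:

Code assistance and generation
Logical reasoning and multi-step tasks
Question answering and information retrieval
Function calling, complex instruction following, and agent use cases

The models are not intended for use in safety-critical applications without human oversight or in scenarios that require guaranteed factual accuracy.

Limitations

Factual accuracy: May produce incorrect, misleading, or outdated content. Outputs should be verified before use in critical contexts.
Bias: May reflect societal, cultural, or systemic biases present in the training data.
Ethics: Do not use the model to produce harmful, unlawful, or unethical content.
Language: Performance is strongest in English. Output quality may degrade in underrepresented languages.
Critical use: Not suitable for medical, legal, financial, or other high-risk applications without safeguards.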

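Security and responsible use

Security responsibilities:
Deployers and users are strongly encouraged to align their security practices with established frameworks and regulatory guidelines such as the EU AI Act and the NIST AI Risk Management Framework (RMF).

Guidelines for deployers:

Regularly conduct robustness assessments to identify and mitigate adversarial inputs.
Implement validation and filtering processes to prevent harmful or biased outputs.
Continuously perform data privacy checks to guard against unintended data leaks.
Document and communicate the model's limitations, intended usage, and known security risks to all end users.
Schedule periodic security reviews and updates to address emerging threats and vulnerabilities.

Guidelines for users:

Follow the security policies and usage guidelines provided by deployers.
Protect and manage sensitive information when interacting with the model.
Report anomalies, suspicious behavior, or unsafe outputs to deployers or developers.
Maintain human oversight and apply judgment to mitigate potential security or ethical risks during interactions.

Disclaimer:
Users accept responsibility for securely deploying, managing, and using this open-source LLM. The model is provided "as-is", without explicit or implied warranty regarding security or fitness for any specific application or environment.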

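Software

Training stack: Fast-LLM

License

MIT

Acknowledgments

We thank the researchers at Nvidia for sharing detailed insights and data from their work on building reasoning models! This greatly accelerated our research, and we recognize it in our model naming convention.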

Citation

@misc{Apriel-nemotron-15b-thinker,
    author = {Slam labs team},
    title = {Apriel Nemotron 15b Thinker},
    howpublished = {https://huggingface.co/ServiceNow-AI/Apriel-Nemotron-15b-Thinker},
    publisher = {SLAM - ServiceNow Language Models Lab},
    year = {2025}
}