Apriel-Nemotron-15b-Thinker: an open-source reasoning model with efficient memory use, suited to a wide range of scenarios

Apriel Nemotron 15b Thinker GGUF

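Developed by Mungert

Apriel-Nemotron-15b-Thinker is a reasoning model that performs strongly among models of similar size. It combines efficient memory use with solid reasoning ability and suits a range of enterprise and academic settings.

Large language model

Transformers

Open-source license: MIT #Fast inference #Enterprise-level tasks #Math competition level

Downloads: 1,097

Release date: 6/12/2025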

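Model overview

Apriel-Nemotron-15b-Thinker is an efficient reasoning model for enterprise and academic use, with strong reasoning ability and good memory efficiency.

Model features

Memory efficient

The model is half the size of comparable SOTA models, which keeps its memory footprint low.

Token efficient

It consumes 40% fewer tokens than comparable models, which makes it efficient in production.

Strong task performance

It performs on par with or better than comparable models on tasks such as MBPP, BFCL, Enterprise RAG, and MT Bench.

Competitive on academic benchmarks

It is competitive on academic benchmarks such as AIME-24, AIME-25, and AMC-23.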

Model capabilities

Text generation

Logical reasoning

Question answering

Code generation

Function calling

Following complex instructions

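Use cases

Enterprise applications

Code assistance and generation

Helps developers generate and optimize code.

Improves development efficiency and reduces coding errors.

Logical reasoning and multi-step tasks

Solves complex logical reasoning problems.

Provides accurate reasoning results.

Academic research

Math and science problem solving

Solves competition-level math and science problems.

Scores well on exams such as AIME and AMC.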

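🚀 Apriel-Nemotron-15b-Thinker GGUF model

The Apriel-Nemotron-15b-Thinker GGUF model is a high-performance text generation model. It was produced with a quantization method designed to preserve accuracy at low bit depths, and it handles a range of tasks well, including code generation and logical reasoning.

🚀 Quick start

To use this model, first install the transformers library by running the following command: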

pip install transformers

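✨ Main features

Improved quantization: standard IMatrix quantization loses accuracy at low bit depths, so this build uses a quantization method that manually raises the precision of important layers.
Broad task coverage: supports code generation, logical reasoning, question answering, function calling, and other tasks.
Memory efficiency: the model is about half the size of comparable SOTA models, so it needs correspondingly less memory.

📦 Installation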

pip install transformers

💻 Usage examples

Basic usage

import re
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ServiceNow-AI/Apriel-Nemotron-15b-Thinker"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare the model input
prompt = "Positive real numbers $x$ and $y$ satisfy $y^3=x^2$ and $(y-x)^2=4y^2$. What is $x+y$?\nMark your solution with \\boxed"
messages = [
    {"role": "user", "content": prompt}
]

tools = []

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Run text generation
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
output = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

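# Parse the response; fall back to the full output if the markers are missing
# (for example, when generation stops before the final response is emitted)
matches = re.findall(r"\[BEGIN FINAL RESPONSE\](.*?)\[END FINAL RESPONSE\]", output, re.DOTALL)
response = matches[0].strip() if matches else output.strip()
print("output:", output)
print("response:", response)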
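
The parsing step above keeps only the final answer. When the reasoning trace is also of interest, the two parts can be separated in one pass. The following is a minimal sketch, assuming the output uses the same [BEGIN FINAL RESPONSE] / [END FINAL RESPONSE] markers as the example; `split_output` is a hypothetical helper name, not part of transformers or the model's tooling.

```python
import re

# Same marker pattern as in the basic usage example.
FINAL_RE = re.compile(
    r"\[BEGIN FINAL RESPONSE\](.*?)\[END FINAL RESPONSE\]", re.DOTALL
)


def split_output(output: str):
    """Return (reasoning, final_response).

    `reasoning` is everything before the first final-response block.
    `final_response` is None when the markers are missing, for example
    when generation was cut off by the token limit.
    """
    match = FINAL_RE.search(output)
    if match is None:
        return output.strip(), None
    return output[: match.start()].strip(), match.group(1).strip()
```

If `output` was decoded from the full sequence, as in the example above, the reasoning part also contains the rendered prompt.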

Advanced usage

from transformers import AutoTokenizer
model_name = "ServiceNow-AI/Apriel-Nemotron-15b-Thinker"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare the model input
custom_system_prompt = "Answer like a pirate."
# The prompt is a single string; adjacent literals inside the parentheses are concatenated.
prompt = (
    "You are an expert assistant in the implementation of customer experience management aspect of retail applications \n \nYou will be using Python as the programming language. \n \nYou will utilize a factory design pattern for the implementation and following the dependency inversion principle \n \nYou will modify the implementation based on user requirements. \n \nUpon user request, you will add, update, and remove the features & enhancements in the implementation provided by you. \n \nYou will ask whether the user wants to refactor the provided code or needs a sample implementation for reference. Upon user confirmation, I will proceed accordingly. \n \n**Guidelines:** \n 1. **User Requirements:** \n - You have to ask users about their requirements, clarify the user expectations, and suggest the best possible solution by providing examples of Python code snippets. \n - Ask users about which type of reports they need to assess the AI model's performance, accuracy, and reliability. \n - After providing the solution, you have to ask the user about the trial of the solution and modify the solution based on the user feedback. \n \n 2. **Libraries/Frameworks:** \n - You will be utilizing Python as a programming language. \n - You will be using Flask framework for REST APIS implementation \n \n 3. **Communication Gesture:** \n - Your conversation with the user should be interactive, supportive, courageous, and professional. \n - You have to break down the complex concepts into sub-concepts and try to explain them to the user. \n - You have to ask the user for the required parameters. If the user refuses to provide in 2 attempts, politely exit the conversation. \n - You have to provide your supported parameters to the user, if the user refuses to accept them then you have to put an apology note and exit the conversation. \n - You have to track the conversation about unasked questions by the user. "
    "If some/one of the questions remain then you have to remind the user about these questions and proceed to answer them based on the user's confirmation \n \n 4. **Implementation:** \n - Your code/implementations should be reliable, scaleable, modular, and reusable. \n - You will be providing unit tests for the implementation upon user request. \n - You will be following MVC architecture for the applications \n - Your implementations must be well-commented and readable \n \n \n- Today's date is 23rd August 2024. \n- The default sender email is sender-assistant@email.com.\nHi, I am conducting research on retail customer feedback systems and I need assistance with designing and implementing them. Could you kindly provide me with a list of general customer feedback system modules?"
)
messages = [
    {"role": "user", "content": custom_system_prompt + "\n\n" + prompt}
]
# Example tools (Python literals, so the boolean default is True rather than JSON's true)
tools = [{"type": "function", "function": {"name": "getRetailFeedbackModules", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"page": {"type": "integer", "description": "The current page number.", "default": 1}, "page_size": {"type": "integer", "description": "The number of items per page.", "default": 3}}}}}, {"type": "function", "function": {"name": "verifyImplementation", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"coding_language": {"type": "string", "description": "The supported languages for verification of implementation.", "default": "python", "enum": ["python", "java", "php"]}, "code": {"type": "string", "description": "The code which needs verification"}, "design_pattern": {"type": "string", "description": "The design pattern to verify in the implementation", "enum": ["factory", "strategy", "singleton"]}, "verify_best_practices": {"type": "boolean", "description": "The verification of the coding style based on the language selected", "default": True}}}}}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools
)
model_inputs = tokenizer([text], return_tensors="pt")
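
When the model proposes a call to one of these tools, the caller still has to check the arguments before executing anything. The following is a minimal sketch of such a check, assuming the arguments have already been parsed into a Python dict; `normalize_arguments` and `JSON_TYPES` are hypothetical names introduced here, and the schema is copied from the `verifyImplementation` definition above with the descriptions omitted.

```python
# Parameter schema of the `verifyImplementation` tool (descriptions omitted).
VERIFY_SCHEMA = {
    "type": "object",
    "properties": {
        "coding_language": {
            "type": "string",
            "default": "python",
            "enum": ["python", "java", "php"],
        },
        "code": {"type": "string"},
        "design_pattern": {
            "type": "string",
            "enum": ["factory", "strategy", "singleton"],
        },
        "verify_best_practices": {"type": "boolean", "default": True},
    },
}

# Mapping from JSON Schema type names to Python types (only those used above).
JSON_TYPES = {"string": str, "integer": int, "boolean": bool}


def normalize_arguments(schema: dict, arguments: dict) -> dict:
    """Fill in schema defaults and reject unknown names, wrong types, and
    values outside an enum. Raises ValueError on the first problem found."""
    properties = schema.get("properties", {})
    unknown = set(arguments) - set(properties)
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")

    result = {}
    for name, spec in properties.items():
        if name in arguments:
            value = arguments[name]
        elif "default" in spec:
            value = spec["default"]
        else:
            continue  # optional parameter without a default
        expected = JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for integers.
        is_bool_as_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_int:
            raise ValueError(f"{name}: expected {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{name}: {value!r} is not one of {spec['enum']}")
        result[name] = value
    return result
```

This covers only the schema features used in the example tools; a full JSON Schema validator would be needed for anything more elaborate.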

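Usage guidelines

Use the model's default chat template, and add any other instructions to the user message.
Set the temperature to 0.6.
During evaluation, make sure the model's response starts with Here are my reasoning steps:\n. The default chat template already implements this.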
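
The guidelines above can be expressed as a small configuration sketch. It assumes the standard `do_sample`, `temperature`, and `max_new_tokens` arguments of `generate` in transformers; `sampling_kwargs` and `reasoning_prefix_present` are hypothetical helper names. Because the source does not say whether the chat template places the prefix at the end of the prompt or leaves the model to emit it, the check accepts either case.

```python
REASONING_PREFIX = "Here are my reasoning steps:\n"


def sampling_kwargs(max_new_tokens: int = 1024) -> dict:
    """Generation settings with the recommended temperature of 0.6.

    Sampling has to be enabled for the temperature to take effect.
    The default max_new_tokens is an arbitrary illustrative value.
    """
    return {
        "do_sample": True,
        "temperature": 0.6,
        "max_new_tokens": max_new_tokens,
    }


def reasoning_prefix_present(prompt_text: str, completion: str) -> bool:
    """True if the rendered prompt ends with the reasoning prefix or the
    newly generated text starts with it."""
    return prompt_text.endswith(REASONING_PREFIX) or completion.startswith(
        REASONING_PREFIX
    )
```

With the objects from the basic usage example, the settings would be passed as `model.generate(**model_inputs, **sampling_kwargs())`.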

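📚 Documentation

Model generation details

This model was generated with llama.cpp at commit 1f63e75f.

Improved quantization method

Standard IMatrix quantization loses accuracy at low bit depths, and the problem is especially pronounced for Mixture of Experts (MoE) models. To address this, the --tensor-type option of llama.cpp is used to manually raise the precision of important layers. For the implementation details, see "Layer bumping with llama.cpp".

This approach increases the model file size but substantially improves accuracy for a given quantization level.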
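
The size cost of bumping layers can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope estimate only: the parameter count, bits-per-weight values, and bumped fraction in the usage note are illustrative assumptions, not measurements of this model's GGUF files, and `quantized_size_gb` is a hypothetical helper name.

```python
def quantized_size_gb(
    n_params: float,
    base_bpw: float,
    bumped_fraction: float = 0.0,
    bumped_bpw: float = 0.0,
) -> float:
    """Approximate file size in GB (10^9 bytes) when `bumped_fraction` of
    the weights is stored at `bumped_bpw` bits per weight and the rest at
    `base_bpw`. Metadata and per-block overhead are ignored."""
    average_bpw = (1.0 - bumped_fraction) * base_bpw + bumped_fraction * bumped_bpw
    return n_params * average_bpw / 8 / 1e9
```

For example, `quantized_size_gb(15e9, 4.5)` compared with `quantized_size_gb(15e9, 4.5, 0.1, 6.5)` shows how much a hypothetical 15-billion-parameter model grows when 10% of its weights move from 4.5 to 6.5 bits.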

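Evaluation

Evaluation was carried out with lm-eval-harness and evalchemy.

Benchmarks demonstrating enterprise capabilities

[Figure omitted]

Academic reasoning benchmarks

[Figure omitted]

Token efficiency comparison (lower is better)

[Figure omitted]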

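Training details

Mid-training / continual pre-training

In this stage, the model is trained on more than 10 billion tokens drawn from mathematical reasoning, coding challenges, scientific discourse, and logic puzzles. The goal is to strengthen the model's foundational reasoning ability. This stage is critical for the model to function as a reasoner, and it yields large gains on reasoning benchmarks.

Supervised fine-tuning (SFT)

Next, the model is fine-tuned on 200,000 high-quality demonstrations covering mathematical and scientific problem solving, coding tasks, general instruction-following scenarios, and API/function-calling use cases.

Reinforcement learning

The SFT checkpoint performs strongly on core capabilities such as math and general knowledge but is weaker at instruction following and coding. To close these gaps, GRPO is applied with minor modifications to the objective. This leads to substantial improvements on benchmarks such as IFEval, Multi Challenge, Enterprise RAG, MBPP, and BFCL, while preserving scores on competition-level math exams such as AIME and AMC. GRPO also yields small gains on GPQA and MixEval.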
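
For readers unfamiliar with GRPO, its central step is to score each sampled response relative to the other responses drawn for the same prompt. The following is a minimal sketch of that group-relative advantage in its commonly described form. It does not reproduce the modified objective mentioned above, which the source does not describe, and the use of the population standard deviation and the `eps` term are assumptions made here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list, eps: float = 1e-6) -> list:
    """Normalize the rewards of one group of responses to the same prompt:
    advantage_i = (reward_i - group mean) / (group std + eps)."""
    group_mean = mean(rewards)
    group_std = pstdev(rewards)  # population standard deviation of the group
    return [(reward - group_mean) / (group_std + eps) for reward in rewards]
```

Responses that score above the group average get a positive advantage and those below get a negative one. When every response in a group receives the same reward, all advantages are zero and the group contributes no learning signal.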

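Throughout training, intermediate snapshots from both the SFT and GRPO stages are periodically merged, which improves generalization and reduces catastrophic forgetting.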
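
The source does not say how the snapshots are merged. One simple possibility is uniform parameter averaging, sketched below on plain Python lists so that it runs without any framework. Both the averaging rule and the helper name `average_checkpoints` are assumptions made for illustration, not the authors' method.

```python
def average_checkpoints(checkpoints: list) -> dict:
    """Uniformly average several checkpoints.

    Each checkpoint maps a parameter name to a flat list of floats, and all
    checkpoints must contain the same parameter names.
    """
    if not checkpoints:
        raise ValueError("at least one checkpoint is required")
    names = set(checkpoints[0])
    if any(set(ckpt) != names for ckpt in checkpoints):
        raise ValueError("checkpoints have different parameter names")

    merged = {}
    for name in checkpoints[0]:
        # zip(*...) walks the same position across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        merged[name] = [sum(values) / len(checkpoints) for values in columns]
    return merged
```

With real model weights, the same idea applies tensor by tensor to the state dictionaries of the snapshots.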

A detailed technical report is coming soon.

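Intended use

The Apriel family of models is designed for a variety of general-purpose instruction tasks, including:

Code assistance and generation
Logical reasoning and multi-step tasks
Question answering and information retrieval
Function calling, complex instruction following, and agent use cases

The models are not intended for safety-critical applications without human oversight, or for scenarios that require guaranteed factual accuracy.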

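Limitations

Factual accuracy: the model may generate incorrect, misleading, or outdated content. Verify outputs before using them in critical contexts.
Bias: the model may reflect social, cultural, or systemic biases present in the training data.
Ethics: do not use the model to generate harmful, illegal, or unethical content.
Language: performance is strongest in English. Output quality may degrade in underrepresented languages.
Critical use: the model is not suitable for medical, legal, financial, or other high-risk applications without safeguards.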

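Security and responsible use

Security responsibilities

Deployers and users are strongly encouraged to align their security practices with established frameworks and regulatory guidelines such as the EU AI Act and the NIST AI Risk Management Framework (RMF).

Guidelines for deployers

Conduct regular robustness assessments to identify and mitigate adversarial inputs.
Implement validation and filtering processes to prevent harmful or biased outputs.
Continuously perform data privacy checks to guard against unintended data leaks.
Document and communicate the model's limitations, intended use, and known security risks to all end users.
Schedule periodic security reviews and updates to address emerging threats and vulnerabilities.

Guidelines for users

Follow the established security policies and usage guidelines provided by deployers.
Protect and manage sensitive information when interacting with the model.
Report anomalies, suspicious behavior, or unsafe outputs to deployers or developers.
Maintain human oversight and apply judgment during interactions to mitigate potential security or ethical risks.

Disclaimer

Users are responsible for deploying, managing, and using this open-source LLM securely. The model is provided "as is", without explicit or implied warranty regarding security or fitness for any particular application or environment.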

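Software

Training stack: Fast-LLM

🔧 Technical details

The technical report is coming soon.

📄 License

MIT

Acknowledgments

Thanks to the researchers at Nvidia for sharing detailed insights and data on building reasoners! This greatly accelerated our research, and we acknowledge the contribution in the model's naming convention.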

Citation

@misc{Apriel-nemotron-15b-thinker,
    author = {Slam labs team},
    title = {Apriel Nemotron 15b Thinker},
    howpublished = {https://huggingface.co/ServiceNow-AI/Apriel-Nemotron-15b-Thinker},
    publisher = {SLAM - ServiceNow Language Models Lab},
    year = {2025}
}

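About testing

Purpose of the tests

I use this model to explore the limits of small open-source models for AI network monitoring, focusing on the following:

Function calling against live network services
How small a model can be while still handling the following tasks:
- Automated Nmap security scans
- Quantum-readiness checks
- Network monitoring tasks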

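Models available for testing

TestLLM (current experimental model)

Zero-configuration setup
Load time of 30 seconds or more (inference is slow, but there are no API costs). Because costs are low, there is no token limit.
Collaborators wanted! If you are interested in edge-device AI, let's work together.

TurboLLM (uses gpt-4.1-mini)

Performs very well, but OpenAI charges per token, so token usage is limited.
Create custom command processors that run .NET code on the Quantum Network Monitor Agent
Real-time network diagnostics and monitoring
Security audits
Penetration testing (Nmap/Metasploit)

HugLLM (latest open-source models)

Runs on the Hugging Face Inference API and performs quite well using the latest models hosted on Novita.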

Example commands you can test

"Give me info on my website's SSL certificate"
"Check if my server is using quantum-safe encryption for communication"
"Run a comprehensive security audit on my server"
"Create a cmd processor to .. (whatever you want)" Note: to run .NET code, you need to install a Quantum Network Monitor Agent. This is a very flexible and powerful feature. Use it with care!

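Final words

I pay out of my own pocket for the servers that create these model files, the servers that run the Quantum Network Monitor service, and the inference fees from Novita and OpenAI. All the code behind the model creation and the Quantum Network Monitor project is open source. Feel free to use whatever you find helpful.

If you appreciate this work, please consider buying me a coffee. Your support helps cover service costs and lets me raise token limits for all users.

I am also open to job opportunities and sponsorship.

Thank you!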