Apriel-Nemotron-15b-Thinker: an open-source reasoning model with efficient memory use for a range of scenarios

Apriel Nemotron 15b Thinker GGUF

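Developed by Mungert

Apriel-Nemotron-15b-Thinker is a strong reasoning model that performs well against models of similar size. It combines efficient memory use with solid reasoning ability and suits a range of enterprise and academic scenarios.

Large language model

Transformers

License: MIT #efficient-reasoning #enterprise-tasks #competition-level-math

Downloads: 1,097

Released: 6/12/2025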

Model overview

Apriel-Nemotron-15b-Thinker is an efficient reasoning model for enterprise and academic use, with strong reasoning ability and a small memory footprint.

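Model features

Memory efficient

At half the size of comparable SOTA models, it uses memory efficiently.

Token efficient

It consumes 40% fewer tokens than comparable models, which makes it efficient in production.

Strong task performance

Performs on par or better on tasks such as MBPP, BFCL, Enterprise RAG, and MT Bench.

Competitive on academic benchmarks

Competitive on academic benchmarks such as AIME-24, AIME-25, and AMC-23.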

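Model capabilities

Text generation

Logical reasoning

Question answering

Code generation

Function calling

Complex instruction following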

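Use cases

Enterprise applications

Code assistance and generation

Helps developers generate and optimize code.

Improves development efficiency and reduces coding errors.

Logical reasoning and multi-step tasks

Solves complex logical reasoning problems.

Provides accurate reasoning results.

Academic research

Math and science problem solving

Solves competition-level math and science problems.

Performs well on exams such as AIME and AMC.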

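🚀 Apriel-Nemotron-15b-Thinker GGUF model

The Apriel-Nemotron-15b-Thinker GGUF model is a strong reasoning model that performs well against models of similar size. It combines efficient memory use with solid reasoning ability and suits a range of enterprise and academic scenarios.

🚀 Quick start

Install dependencies

pip install transformers

Run the reasoning model

The following example calls the model through the generate function of the transformers library: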

import re
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ServiceNow-AI/Apriel-Nemotron-15b-Thinker"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare the model input
prompt = "Positive real numbers $x$ and $y$ satisfy $y^3=x^2$ and $(y-x)^2=4y^2$. What is $x+y$?\nMark your solution with \\boxed"
messages = [
    {"role": "user", "content": prompt}
]

tools = []

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate text
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
output = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

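# Parse the output. The decoded text includes the prompt, whose system message
# also mentions the markers, so take the last match and guard against no match.
matches = re.findall(r"\[BEGIN FINAL RESPONSE\](.*?)\[END FINAL RESPONSE\]", output, re.DOTALL)
response = matches[-1].strip() if matches else ""
print("output:", output)
print("response:", response)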
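Generation can stop at the token limit before the closing marker appears, and the decoded text also contains the prompt, so it helps to parse defensively. The sketch below is a hypothetical helper (`split_reasoning_and_response` is not part of transformers) that separates the reasoning from the final response, assuming the output follows the marker layout described in the system prompt.

```python
import re
from typing import Optional, Tuple

# Markers and prefix as they appear in the model's chat template.
_FINAL_RE = re.compile(
    r"\[BEGIN FINAL RESPONSE\](.*?)\[END FINAL RESPONSE\]", re.DOTALL
)
_REASONING_PREFIX = "Here are my reasoning steps:"


def split_reasoning_and_response(output: str) -> Tuple[str, Optional[str]]:
    """Return (reasoning, final_response); final_response is None if markers are missing."""
    # Keep only the text after the last reasoning prefix, which skips the prompt.
    # If the prefix is absent, the whole output is treated as the assistant turn.
    _, _, tail = output.rpartition(_REASONING_PREFIX)
    match = _FINAL_RE.search(tail)
    if match is None:
        return tail.strip(), None
    reasoning = tail[: match.start()].strip()
    return reasoning, match.group(1).strip()
```

A `None` response signals a truncated generation, which the caller can handle by raising `max_new_tokens` or retrying.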

Chat template

<|system|>
You are a thoughtful and systematic AI assistant built by ServiceNow Language Models (SLAM) lab. Before providing an answer, analyze the problem carefully and present your reasoning step by step. After explaining your thought process, provide the final solution in the following format: [BEGIN FINAL RESPONSE] ... [END FINAL RESPONSE].
<|end|>
<|user|>
# user message here
<|end|>
<|assistant|>
Here are my reasoning steps:
# thoughts here
[BEGIN FINAL RESPONSE]
# assistant response here
[END FINAL RESPONSE]
<|end|>

The following example applies the chat template:

from transformers import AutoTokenizer
model_name = "ServiceNow-AI/Apriel-Nemotron-15b-Thinker"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare the model input
custom_system_prompt = "Answer like a pirate."
prompt = "You are an expert assistant in the implementation of customer experience management aspect of retail applications \n \nYou will be using Python as the programming language. \n \nYou will utilize a factory design pattern for the implementation and following the dependency inversion principle \n \nYou will modify the implementation based on user requirements. \n \nUpon user request, you will add, update, and remove the features & enhancements in the implementation provided by you. \n \nYou will ask whether the user wants to refactor the provided code or needs a sample implementation for reference. Upon user confirmation, I will proceed accordingly. \n \n**Guidelines:** \n 1. **User Requirements:** \n - You have to ask users about their requirements, clarify the user expectations, and suggest the best possible solution by providing examples of Python code snippets. \n - Ask users about which type of reports they need to assess the AI model's performance, accuracy, and reliability. \n - After providing the solution, you have to ask the user about the trial of the solution and modify the solution based on the user feedback. \n \n 2. **Libraries/Frameworks:** \n - You will be utilizing Python as a programming language. \n - You will be using Flask framework for REST APIS implementation \n \n 3. **Communication Gesture:** \n - Your conversation with the user should be interactive, supportive, courageous, and professional. \n - You have to break down the complex concepts into sub-concepts and try to explain them to the user. \n - You have to ask the user for the required parameters. If the user refuses to provide in 2 attempts, politely exit the conversation. \n - You have to provide your supported parameters to the user, if the user refuses to accept them then you have to put an apology note and exit the conversation. \n - You have to track the conversation about unasked questions by the user. If some/one of the questions remain then you have to remind the user about these questions and proceed to answer them based on the user's confirmation \n \n 4. **Implementation:** \n - Your code/implementations should be reliable, scaleable, modular, and reusable. \n - You will be providing unit tests for the implementation upon user request. \n - You will be following MVC architecture for the applications \n - Your implementations must be well-commented and readable \n \n \n- Today's date is 23rd August 2024. \n- The default sender email is sender-assistant@email.com.\nHi, I am conducting research on retail customer feedback systems and I need assistance with designing and implementing them. Could you kindly provide me with a list of general customer feedback system modules?"
messages = [
    {"role": "user", "content": custom_system_prompt + "\n\n" + prompt}
]
# Example tools
tools = [{"type": "function", "function": {"name": "getRetailFeedbackModules", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"page": {"type": "integer", "description": "The current page number.", "default": 1}, "page_size": {"type": "integer", "description": "The number of items per page.", "default": 3}}}}}, {"type": "function", "function": {"name": "verifyImplementation", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"coding_language": {"type": "string", "description": "The supported languages for verification of implementation.", "default": "python", "enum": ["python", "java", "php"]}, "code": {"type": "string", "description": "The code which needs verification"}, "design_pattern": {"type": "string", "description": "The design pattern to verify in the implementation", "enum": ["factory", "strategy", "singleton"]}, "verify_best_practices": {"type": "boolean", "description": "The verification of the coding style based on the language selected", "default": True}}}}}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools
)
model_inputs = tokenizer([text], return_tensors="pt")
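Each tool definition in the example is a nested dictionary written on one long line, which is easy to get wrong. As a sketch, a small hypothetical helper (`make_tool` is introduced here for illustration and is not part of transformers) can build the same layout in a more readable way.

```python
from typing import Any, Dict


def make_tool(
    name: str, description: str, properties: Dict[str, Dict[str, Any]]
) -> Dict[str, Any]:
    """Build one tool entry in the function-schema layout used in the example."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {"type": "object", "properties": properties},
        },
    }


# Rebuild the first tool from the example with the helper.
tools = [
    make_tool(
        "getRetailFeedbackModules",
        "Returns the list of modules usually present in the retail industry",
        {
            "page": {"type": "integer", "description": "The current page number.", "default": 1},
            "page_size": {"type": "integer", "description": "The number of items per page.", "default": 3},
        },
    ),
]
```

The resulting list can be passed to `tokenizer.apply_chat_template(..., tools=tools)` exactly like the hand-written one.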

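Usage guidelines

Use the model's default chat template, which already includes a system prompt. We recommend adding all other instructions to the user message.
We recommend setting the temperature to 0.6.
In all evaluations, make sure the model starts its answer with Here are my reasoning steps:\n. The default chat template already does this.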
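As a minimal sketch of these guidelines, the helpers below collect the recommended sampling settings and check that a rendered prompt opens the assistant turn with the reasoning prefix. The function names and the `max_new_tokens` default are illustrative assumptions; `do_sample` and `temperature` are standard `generate` arguments in transformers.

```python
from typing import Any, Dict

REASONING_PREFIX = "Here are my reasoning steps:\n"


def build_generation_kwargs(max_new_tokens: int = 1024) -> Dict[str, Any]:
    """Sampling settings that follow the recommended temperature of 0.6."""
    return {
        "max_new_tokens": max_new_tokens,  # illustrative default, not from the source
        "do_sample": True,  # temperature only takes effect when sampling is enabled
        "temperature": 0.6,
    }


def prompt_opens_reasoning(rendered_prompt: str) -> bool:
    """Check that a rendered prompt ends with the reasoning prefix."""
    return rendered_prompt.endswith(REASONING_PREFIX)
```

With a loaded model this would be used as `model.generate(**model_inputs, **build_generation_kwargs())`, and `prompt_opens_reasoning(text)` can be asserted on the string returned by `apply_chat_template` when a custom template is in play.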

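✨ Key features

Memory efficient: at half the size of SOTA models such as QWQ-32b and EXAONE-32b, it uses memory efficiently.
Token efficient: it consumes 40% fewer tokens than QWQ-32b, which makes it efficient in production.
Strong task performance: performs on par or better on MBPP, BFCL, Enterprise RAG, MT Bench, MixEval, IFEval, and Multi-Challenge, making it a good fit for agentic and enterprise tasks.
Competitive on academic benchmarks: given its size, it is competitive on academic benchmarks such as AIME-24, AIME-25, AMC-23, MATH-500, and GPQA.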
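To make the memory claim concrete, here is a rough back-of-the-envelope sketch. It assumes the parameter counts implied by the model names (15B and 32B) and 16-bit weights, and it counts weights only, ignoring the KV cache and activations.

```python
def weight_memory_gb(num_params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params_billion * 1e9 * bytes_per_param / 1e9


# 2 bytes per parameter corresponds to 16-bit weights (bf16/fp16).
small = weight_memory_gb(15, 2)  # 15B-parameter model
large = weight_memory_gb(32, 2)  # 32B-parameter model
print(f"15B: {small:.0f} GB, 32B: {large:.0f} GB, ratio: {small / large:.2f}")
# prints "15B: 30 GB, 32B: 64 GB, ratio: 0.47"
```

Quantized GGUF files use fewer bytes per parameter, so the absolute numbers shrink while the ratio stays roughly the same.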

📦 Installation

pip install transformers

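📚 Documentation

Model generation details

This model was generated with llama.cpp at commit 1f63e75f.

Quantization beyond the IMatrix

I have been experimenting with a new quantization approach that selectively raises the precision of key layers beyond what the default IMatrix configuration provides. In my testing, standard IMatrix quantization underperforms at low bit depths, especially with Mixture of Experts (MoE) models. To address this, I use the --tensor-type option in llama.cpp to manually bump important layers to higher precision. You can see the implementation here: Layer bumping with llama.cpp. This increases the model file size, but it significantly improves precision for a given quantization level.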
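The source does not say which tensors are bumped or to which types, so the following is only a sketch of how such a command line could be assembled. The helper name, file names, and the chosen tensor overrides are hypothetical; `--imatrix` and `--tensor-type TENSOR=TYPE` are options of the `llama-quantize` tool, but check `llama-quantize --help` for the build you use.

```python
import shlex
from typing import Dict, List, Optional


def build_quantize_command(
    src_gguf: str,
    dst_gguf: str,
    base_type: str,
    tensor_overrides: Dict[str, str],
    imatrix_path: Optional[str] = None,
) -> List[str]:
    """Assemble a llama-quantize argv that bumps selected tensors above base_type."""
    cmd = ["llama-quantize"]
    if imatrix_path:
        cmd += ["--imatrix", imatrix_path]
    for tensor_name, ggml_type in tensor_overrides.items():
        # Each override has the form TENSOR=TYPE.
        cmd += ["--tensor-type", f"{tensor_name}={ggml_type}"]
    cmd += [src_gguf, dst_gguf, base_type]
    return cmd


# Hypothetical choice of tensors and types, for illustration only.
cmd = build_quantize_command(
    "model-f16.gguf",
    "model-q4_k_m.gguf",
    "Q4_K_M",
    {"attn_v": "q6_k", "ffn_down": "q6_k"},
    imatrix_path="imatrix.dat",
)
print(shlex.join(cmd))
```

Building the argument list in code keeps the overrides in one dictionary, which makes it easier to compare file size against quality across different bump choices.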

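Evaluation

Evaluation was conducted with [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness) and evalchemy.

Benchmarks reflecting enterprise capability: (figure not reproduced here)
Academic reasoning benchmarks: (figure not reproduced here)
Token efficiency comparison, lower is better: (figure not reproduced here)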

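Training details

Mid-training / continual pre-training: in this stage, the model is trained on more than 100 billion tokens of carefully curated examples drawn from mathematical reasoning, coding challenges, scientific discourse, and logic puzzles. The goal is to strengthen the model's foundational reasoning capabilities. This stage is critical to the model functioning as a reasoner and yields significant gains on reasoning benchmarks.
Supervised fine-tuning (SFT): next, the model is fine-tuned on 200,000 high-quality demonstrations covering math and science problem solving, coding tasks, general instruction following, API/function calling use cases, and more.
Reinforcement learning: although the SFT-tuned checkpoint performs well on core competencies such as math and general knowledge, it is weaker at instruction following and coding tasks. To address these gaps, GRPO is applied (with some minor modifications to the objective). The result is a significant improvement on benchmarks such as IFEval, Multi Challenge, Enterprise RAG, MBPP, and BFCL, while scores on competition-level math exams such as AIME and AMC are preserved. GRPO also gives modest gains on GPQA and MixEval. Throughout training, intermediate snapshots from the SFT and GRPO stages are periodically merged, which improves generalization and reduces catastrophic forgetting.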
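GRPO scores each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below shows the commonly described group-relative advantage (reward minus group mean, divided by group standard deviation). It is not the modified objective the source mentions, whose details are not given; the reward values are made up, and the use of the population standard deviation is an assumption, since implementations vary.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Rewards for four sampled completions of one prompt (made-up values).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that beat their group average get a positive advantage and are reinforced; the rest are pushed down, and a group with identical rewards contributes no signal.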

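Intended use

The Apriel family of models is designed for a variety of general-purpose instruction tasks, including:

Code assistance and generation
Logical reasoning and multi-step tasks
Question answering and information retrieval
Function calling, complex instruction following, and agentic use cases

They are not intended for use in safety-critical applications without human oversight, or in scenarios that require guaranteed factual accuracy.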

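Limitations

Factual accuracy: may produce incorrect, misleading, or outdated content. Verify outputs before using them in critical contexts.
Bias: may reflect societal, cultural, or systemic biases present in the training data.
Ethics: do not use the model to produce harmful, unlawful, or unethical content.
Language: performance is strongest in English. Output quality may degrade in underrepresented languages.
Critical use: not suitable for medical, legal, financial, or other high-risk applications without safeguards.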

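Security and responsible use

Security responsibilities

Deployers and users are strongly encouraged to align their security practices with established frameworks and regulatory guidelines such as the EU AI Act and the NIST AI Risk Management Framework (RMF).

Guidelines for deployers

Conduct regular robustness assessments to identify and mitigate adversarial inputs.
Implement validation and filtering processes to prevent harmful or biased outputs.
Perform ongoing data privacy checks to guard against unintended data leaks.
Document and communicate the model's limitations, intended usage, and known security risks to all end users.
Schedule periodic security reviews and updates to address emerging threats and vulnerabilities.

Guidelines for users

Follow the security policies and usage guidelines provided by deployers.
Protect and manage sensitive information when interacting with the model.
Report anomalies, suspicious behavior, or unsafe outputs to deployers or developers.
Maintain human oversight and apply judgment during interactions to mitigate potential security or ethical risks.

Disclaimer

Users accept responsibility for securely deploying, managing, and using this open-source LLM. The model is provided "as is", without explicit or implied warranty regarding security or fitness for any specific application or environment.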

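Software

Training stack: [Fast-LLM](https://github.com/ServiceNow/Fast-LLM)

License

MIT

Acknowledgments

Thanks to the researchers at NVIDIA for sharing detailed insights and data from their work in building reasoners! This greatly accelerated our research, and we recognize it through our model naming convention!

Citation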

@misc{Apriel-nemotron-15b-thinker,
    author = {Slam labs team},
    title = {Apriel Nemotron 15b Thinker},
    howpublished = {https://huggingface.co/ServiceNow-AI/Apriel-Nemotron-15b-Thinker},
    publisher = {SLAM - ServiceNow Language Models Lab},
    year = {2025}
}

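Testing the Quantum Network Monitor

If you find these models useful, please help me test my AI-powered Quantum Network Monitor assistant, which runs quantum-ready security checks: Quantum Network Monitor. The full open-source code for the Quantum Network Monitor service is available in my GitHub repositories (those with NetworkMonitor in the name): Quantum Network Monitor source code. You will also find the code I use to quantize the models, if you want to do it yourself: GGUFModelBuilder.

How to test

Choose an AI assistant type:

TurboLLM (GPT-4.1-mini)
HugLLM (Hugging Face open-source models)
TestLLM (experimental, CPU only)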

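What I am testing

I am pushing the limits of small open-source models for AI network monitoring, specifically:

Function calling against live network services
How small a model can go while still handling:
- Automated Nmap security scans
- Quantum-readiness checks
- Network monitoring tasks

TestLLM: the current experimental model (llama.cpp on 2 CPU threads in a Hugging Face Docker space)

Zero-configuration setup
Load time of about 30 seconds (inference is slow, but there are no API costs). No token limit, since the cost is low.
Help wanted: if you are interested in edge-device AI, let's collaborate!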

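Other assistants

TurboLLM, using gpt-4.1-mini:
- Performs very well, but OpenAI charges per token, so token usage is limited.
- Create custom cmd processors to run .net code on Quantum Network Monitor agents.
- Real-time network diagnostics and monitoring
- Security audits
- Penetration testing (Nmap/Metasploit)
HugLLM, the latest open-source models: runs on the Hugging Face Inference API. Performs quite well with the latest models hosted on Novita.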

Example test commands

"Give me info on my websites SSL certificate"
"Check if my server is using quantum safe encryption for communication"
"Run a comprehensive security audit on my server"
"Create a cmd processor to .. (whatever you want)" Note that you need to install a Quantum Network Monitor agent to run the .net code. This is a very flexible and powerful feature, so use it with caution!

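Final note

I pay out of my own pocket for the servers that create these model files, for running the Quantum Network Monitor service, and for inference from Novita and OpenAI. All the code behind the model creation and the Quantum Network Monitor project is open source. Feel free to use whatever you find helpful. If you appreciate the work, please consider buying me a coffee. Your support helps cover service costs and lets me raise token limits for everyone. I am also open to job opportunities or sponsorship. Thank you!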

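🔧 Technical details

Training used a three-stage pipeline (CPT, SFT, and GRPO). The mid-training / continual pre-training stage uses more than 100 billion carefully curated tokens to strengthen foundational reasoning. The supervised fine-tuning stage uses 200,000 high-quality demonstrations covering a variety of task scenarios. The reinforcement learning stage applies GRPO, with some minor modifications to the objective, to address the SFT model's weaknesses in instruction following and coding tasks. Throughout training, intermediate snapshots from the SFT and GRPO stages are periodically merged, which improves generalization and reduces catastrophic forgetting.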
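The source does not describe how snapshots are merged. One common approach is a weighted average of parameters, sketched below under that assumption. Plain Python lists stand in for tensors, and `merge_checkpoints` is a hypothetical helper, not part of the training stack.

```python
from typing import Dict, List, Optional, Sequence

StateDict = Dict[str, List[float]]


def merge_checkpoints(
    checkpoints: Sequence[StateDict], weights: Optional[Sequence[float]] = None
) -> StateDict:
    """Weighted average of parameters that share the same names and shapes."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if weights is None:
        # Default to a uniform average across snapshots.
        weights = [1.0 / len(checkpoints)] * len(checkpoints)
    if len(weights) != len(checkpoints):
        raise ValueError("one weight per checkpoint is required")
    merged: StateDict = {}
    for name in checkpoints[0]:
        # Group the i-th value of this parameter from every checkpoint.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        merged[name] = [sum(w * v for w, v in zip(weights, col)) for col in columns]
    return merged


sft = {"layer.weight": [1.0, 2.0]}
grpo = {"layer.weight": [3.0, 4.0]}
print(merge_checkpoints([sft, grpo]))  # {'layer.weight': [2.0, 3.0]}
```

Averaging keeps the merged model close to both snapshots, which is one intuition for why merging can limit forgetting of skills learned in an earlier stage.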