Apriel-Nemotron-15b-Thinker: Open-Source Efficient Reasoning Model - Half the Memory Footprint, Free to Use


Apriel Nemotron 15b Thinker

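Developed by ServiceNow-AI

A 15-billion-parameter efficient reasoning model from ServiceNow, with half the memory footprint of comparable state-of-the-art models

Large language model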

Transformers

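License: MIT #EfficientReasoning #EnterpriseTasks #LowResourceUsage

Downloads: 1,252

Release date: 5/6/2025

Model overview

Built on Apriel-15b-base through a three-stage training process and optimized for efficient reasoning and enterprise tasks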

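Model features

Efficient memory usage

Half the size of comparable 32B models, with significantly better memory efficiency

Optimized inference efficiency

Uses 40% fewer tokens than QwQ-32b, making it more efficient in production

Optimized for enterprise tasks

Performs strongly on MBPP, BFCL, enterprise RAG and similar tasks

Academic competitiveness

Competitive on academic benchmarks such as AIME, AMC and MATH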

Model capabilities

Text generation

Complex reasoning

Enterprise task processing

Academic question answering

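Use cases

Enterprise applications

Enterprise RAG systems

Enterprise knowledge retrieval and generation tasks

Performs strongly on the relevant benchmarks

Business process automation

Enterprise document processing and process automation tasks

Academic research

Mathematical problem solving

Solves competition-level math problems such as AMC and AIME

Performs well on benchmarks such as MATH-500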

🚀 Apriel-Nemotron-15b-Thinker

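Apriel-Nemotron-15b-Thinker is a 15-billion-parameter reasoning model in ServiceNow's Apriel SLM series. It achieves performance competitive with state-of-the-art models such as o1-mini, QwQ-32b and EXAONE-Deep-32b while using only half the memory of those alternatives.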

🚀 Quick start

Install dependencies

pip install transformers torch accelerate

Run the reasoning model

The following snippet shows how to use the model with the generate function of the transformers library:

import re
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ServiceNow-AI/Apriel-Nemotron-15b-Thinker"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare the model input
prompt = "Positive real numbers $x$ and $y$ satisfy $y^3=x^2$ and $(y-x)^2=4y^2$. What is $x+y$?\nMark your solution with \\boxed"
messages = [
    {"role": "user", "content": prompt}
]

tools = []

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

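# Generate text (temperature 0.6, as recommended in the usage guidelines)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536,
    do_sample=True,
    temperature=0.6
)
# Decode only the newly generated tokens: the prompt's system message itself
# contains the "[BEGIN FINAL RESPONSE] ... [END FINAL RESPONSE]" format description
new_tokens = generated_ids[0][model_inputs["input_ids"].shape[1]:]
output = tokenizer.decode(new_tokens, skip_special_tokens=True)

# Parse the final response; fall back to the full output if the markers are missing
matches = re.findall(r"\[BEGIN FINAL RESPONSE\](.*?)\[END FINAL RESPONSE\]", output, re.DOTALL)
response = matches[-1].strip() if matches else output.strip()
print("output:", output)
print("response:", response)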
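To keep the reasoning and the final answer separately, a small helper can split a decoded output at the markers. This is a minimal sketch assuming the output format described in the chat template section; `split_response` is a hypothetical name, not part of transformers or the model repository. It uses the last marker pair so that a format description echoed from the prompt is never mistaken for the answer.

```python
import re

BEGIN, END = "[BEGIN FINAL RESPONSE]", "[END FINAL RESPONSE]"
REASONING_PREFIX = "Here are my reasoning steps:"


def split_response(output: str) -> tuple:
    """Split decoded model output into (reasoning, final_response).

    Hypothetical helper. Assumes reasoning comes first and the answer sits
    between [BEGIN FINAL RESPONSE] and [END FINAL RESPONSE].
    """
    pattern = re.escape(BEGIN) + r"(.*?)" + re.escape(END)
    matches = list(re.finditer(pattern, output, re.DOTALL))
    if not matches:
        # Markers missing (e.g. generation hit max_new_tokens): all reasoning, no answer
        return output.strip(), ""
    last = matches[-1]  # last pair, in case the prompt text was decoded too
    reasoning = output[: last.start()].strip()
    # The template usually supplies this prefix, but strip it if it was decoded
    if reasoning.startswith(REASONING_PREFIX):
        reasoning = reasoning[len(REASONING_PREFIX):].strip()
    return reasoning, last.group(1).strip()


reasoning, answer = split_response(
    "Here are my reasoning steps:\ny = 9, x = 27.\n"
    "[BEGIN FINAL RESPONSE]\n\\boxed{36}\n[END FINAL RESPONSE]"
)
print(answer)  # prints \boxed{36}
```

With the snippet above, `split_response(output)` returns the model's reasoning and its final answer as two strings.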

Chat template

<|system|>
You are a thoughtful and systematic AI assistant built by ServiceNow Language Models (SLAM) lab. Before providing an answer, analyze the problem carefully and present your reasoning step by step. After explaining your thought process, provide the final solution in the following format: [BEGIN FINAL RESPONSE] ... [END FINAL RESPONSE].
<|end|>
<|user|>
# user message here
<|end|>
<|assistant|>
Here are my reasoning steps:
# thoughts here
[BEGIN FINAL RESPONSE]
# assistant response here
[END FINAL RESPONSE]
<|end|>

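The model first generates its reasoning and then produces the final response between [BEGIN FINAL RESPONSE] and [END FINAL RESPONSE]. The following snippet shows how to apply the chat template: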

from transformers import AutoTokenizer
model_name = "ServiceNow-AI/Apriel-Nemotron-15b-Thinker"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare the model input
custom_system_prompt = "Answer like a pirate."
prompt = "You are an expert assistant in the implementation of customer experience management aspect of retail applications \n \nYou will be using Python as the programming language. \n \nYou will utilize a factory design pattern for the implementation and following the dependency inversion principle \n \nYou will modify the implementation based on user requirements. \n \nUpon user request, you will add, update, and remove the features & enhancements in the implementation provided by you. \n \nYou will ask whether the user wants to refactor the provided code or needs a sample implementation for reference. Upon user confirmation, I will proceed accordingly. \n \n**Guidelines:** \n 1. **User Requirements:** \n - You have to ask users about their requirements, clarify the user expectations, and suggest the best possible solution by providing examples of Python code snippets. \n - Ask users about which type of reports they need to assess the AI model's performance, accuracy, and reliability. \n - After providing the solution, you have to ask the user about the trial of the solution and modify the solution based on the user feedback. \n \n 2. **Libraries/Frameworks:** \n - You will be utilizing Python as a programming language. \n - You will be using Flask framework for REST APIS implementation \n \n 3. **Communication Gesture:** \n - Your conversation with the user should be interactive, supportive, courageous, and professional. \n - You have to break down the complex concepts into sub-concepts and try to explain them to the user. \n - You have to ask the user for the required parameters. If the user refuses to provide in 2 attempts, politely exit the conversation. \n - You have to provide your supported parameters to the user, if the user refuses to accept them then you have to put an apology note and exit the conversation. \n - You have to track the conversation about unasked questions by the user. 
If some/one of the questions remain then you have to remind the user about these questions and proceed to answer them based on the user's confirmation \n \n 4. **Implementation:** \n - Your code/implementations should be reliable, scaleable, modular, and reusable. \n - You will be providing unit tests for the implementation upon user request. \n - You will be following MVC architecture for the applications \n - Your implementations must be well-commented and readable \n \n \n- Today's date is 23rd August 2024. \n- The default sender email is sender-assistant@email.com.\nHi, I am conducting research on retail customer feedback systems and I need assistance with designing and implementing them. Could you kindly provide me with a list of general customer feedback system modules?"
messages = [
    {"role": "user", "content": custom_system_prompt + "\n\n" + prompt}
]
# Example tools
tools = [{"type": "function", "function": {"name": "getRetailFeedbackModules", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"page": {"type": "integer", "description": "The current page number.", "default": 1}, "page_size": {"type": "integer", "description": "The number of items per page.", "default": 3}}}}}, {"type": "function", "function": {"name": "verifyImplementation", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"coding_language": {"type": "string", "description": "The supported languages for verification of implementation.", "default": "python", "enum": ["python", "java", "php"]}, "code": {"type": "string", "description": "The code which needs verification"}, "design_pattern": {"type": "string", "description": "The design pattern to verify in the implementation", "enum": ["factory", "strategy", "singleton"]}, "verify_best_practices": {"type": "boolean", "description": "The verification of the coding style based on the language selected", "default": true}}}}}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools
)
model_inputs = tokenizer([text], return_tensors="pt")
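To see what the template produces without downloading the tokenizer, the layout shown in the chat template section can be reproduced by hand for the no-tools case. This is a hypothetical sketch: `render_prompt` is not part of the model repository, the exact whitespace is an assumption, and tool definitions are not rendered, so `tokenizer.apply_chat_template` remains the authoritative way to build prompts.

```python
# Default system prompt, copied from the chat template shown above
DEFAULT_SYSTEM_PROMPT = (
    "You are a thoughtful and systematic AI assistant built by ServiceNow Language "
    "Models (SLAM) lab. Before providing an answer, analyze the problem carefully and "
    "present your reasoning step by step. After explaining your thought process, "
    "provide the final solution in the following format: "
    "[BEGIN FINAL RESPONSE] ... [END FINAL RESPONSE]."
)


def render_prompt(user_message: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    """Hypothetical re-creation of the chat layout (single user turn, no tools).

    Newline placement is assumed from the template listing and may differ
    from the official template.
    """
    return (
        f"<|system|>\n{system_prompt}\n<|end|>\n"
        f"<|user|>\n{user_message}\n<|end|>\n"
        # Generation prompt: the assistant turn opens with the reasoning prefix
        "<|assistant|>\nHere are my reasoning steps:\n"
    )


print(render_prompt("What is 2 + 2?"))
```

Comparing this string with the `text` returned by `apply_chat_template(..., tools=[])` is a quick way to check which parts of the layout the official template actually emits.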

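Usage guidelines

Use the model's default chat template, which already includes a system prompt. Add any other instructions to the user message.
Set the temperature to 0.6.
In all evaluations, make sure the model's output starts with "Here are my reasoning steps:\n". The default chat template already does this.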
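The three guidelines map onto a few lines of setup code. This is a minimal sketch: `build_messages` and `ends_with_reasoning_prefix` are hypothetical helpers, and the check assumes the rendered prompt ends exactly with the reasoning prefix, as the guidelines state. `do_sample` and `temperature` are standard `generate` arguments in transformers; temperature only takes effect when sampling is enabled.

```python
from typing import Optional

# Recommended sampling: temperature 0.6 (ignored unless do_sample=True)
GENERATION_KWARGS = {"do_sample": True, "temperature": 0.6}

# Prefix the default chat template places at the start of the assistant turn
REASONING_PREFIX = "Here are my reasoning steps:\n"


def build_messages(prompt: str, instructions: Optional[str] = None) -> list:
    """Hypothetical helper: keep the default system prompt and put any
    extra instructions at the top of the user message instead."""
    content = f"{instructions}\n\n{prompt}" if instructions else prompt
    return [{"role": "user", "content": content}]


def ends_with_reasoning_prefix(rendered_prompt: str) -> bool:
    """True if the rendered prompt hands off to the model right after the prefix."""
    return rendered_prompt.endswith(REASONING_PREFIX)


messages = build_messages("What is x + y?", "Answer like a pirate.")
print(messages)
```

In the quick-start code, `messages` would be passed to `apply_chat_template`, the resulting `text` checked with `ends_with_reasoning_prefix(text)`, and `**GENERATION_KWARGS` added to the `model.generate(...)` call.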

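✨ Key features

Memory efficient: half the size of state-of-the-art models such as QwQ-32b and EXAONE-Deep-32b.
Production efficient: consumes 40% fewer tokens than QwQ-32b, making it highly efficient in production.
Suited to enterprise tasks: performs on par with or better than these alternatives on MBPP, BFCL, Enterprise RAG, MT Bench, MixEval, IFEval and Multi-Challenge, making it well suited to agentic and enterprise tasks.
Strong academic results: competitive for its size on academic benchmarks such as AIME-24, AIME-25, AMC-23, MATH-500 and GPQA.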
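The memory and token claims can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes bfloat16 weights (2 bytes per parameter), nominal parameter counts of 15B and 32B, and counts weights only, ignoring the KV cache and activations. The 10,000-token answer length is a hypothetical figure used only to illustrate the 40% reduction.

```python
BYTES_PER_PARAM_BF16 = 2  # assumption: bfloat16 / float16 weights


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


apriel_gb = weight_memory_gb(15e9)  # 30.0 GB
qwq_gb = weight_memory_gb(32e9)     # 64.0 GB
ratio = apriel_gb / qwq_gb          # 0.46875, i.e. roughly half

# 40% fewer tokens, for a hypothetical 10,000-token QwQ-32b response
qwq_tokens = 10_000
apriel_tokens = qwq_tokens * 60 // 100  # integer arithmetic: 6000 tokens

print(f"weights: {apriel_gb:.1f} GB vs {qwq_gb:.1f} GB (ratio {ratio:.2f})")
print(f"tokens: {apriel_tokens} vs {qwq_tokens}")
```

Because reasoning models emit long chains of thought, the token saving compounds with the smaller memory footprint: fewer generated tokens also means a smaller KV cache per request.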

📚 Documentation

Evaluation

Evaluations were run with lm-eval-harness and evalchemy.

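Figure: Enterprise capability metrics
Figure: Academic reasoning benchmarks
Figure: Token efficiency comparison (lower is better)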

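Training details

Mid-training / continual pre-training: in this stage the model is trained on more than 100 billion tokens of carefully curated examples drawn from mathematical reasoning, coding challenges, scientific discourse and logic puzzles. The goal is to strengthen the model's foundational reasoning ability. This stage is essential to the model's function as a reasoner and yields substantial gains on reasoning benchmarks.
Supervised fine-tuning (SFT): next, the model is fine-tuned on 200,000 high-quality demonstrations covering math and science problem solving, coding tasks, general instruction-following scenarios, API/function-calling use cases and more.
Reinforcement learning: although the SFT checkpoint performs well on core capabilities such as math and general knowledge, it is weaker at instruction following and coding. To close these gaps, GRPO is applied with minor modifications to its objective. This brings significant improvements on benchmarks such as IFEval, Multi-Challenge, Enterprise RAG, MBPP and BFCL while preserving scores on competition-level math exams such as AIME and AMC. GRPO also gives modest gains on GPQA and MixEval. Throughout training, intermediate snapshots from the SFT and GRPO stages are periodically merged to improve generalization and reduce catastrophic forgetting.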
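Two ideas from the reinforcement learning stage can be illustrated in plain Python. The card does not specify its modified GRPO objective or its merging method, so this sketch shows only the standard GRPO ingredient, a group-relative advantage (each sampled completion's reward normalized by the mean and standard deviation of its group), plus uniform parameter averaging as an assumed stand-in for snapshot merging. It uses the population standard deviation (some implementations use the sample standard deviation), and plain lists stand in for tensors; both function names are hypothetical.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list, eps: float = 1e-6) -> list:
    """Standard GRPO-style advantage for completions sampled from one prompt.

    Each reward is centred on the group mean and scaled by the group's
    (population) standard deviation; eps avoids division by zero when all
    rewards are equal.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def average_checkpoints(state_dicts: list) -> dict:
    """Assumed merging scheme: uniform average of parameters with identical
    names and shapes (lists of floats stand in for tensors)."""
    n = len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        tensors = [sd[name] for sd in state_dicts]
        merged[name] = [sum(values) / n for values in zip(*tensors)]
    return merged


# Four completions for one prompt: two correct (reward 1), two wrong (reward 0)
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # approximately [1, -1, 1, -1]

sft = {"w": [1.0, 2.0]}
grpo = {"w": [3.0, 4.0]}
print(average_checkpoints([sft, grpo]))  # {'w': [2.0, 3.0]}
```

Because advantages are computed relative to the group, GRPO needs no separate value network: completions that beat their siblings are reinforced and the rest are pushed down.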

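🔧 Technical details

The model is built from the Apriel-15b-base checkpoint through a three-stage training pipeline: continual pre-training (CPT), supervised fine-tuning (SFT) and GRPO.

📄 License

This model is released under the MIT license.

👏 Acknowledgements

Thanks to the NVIDIA researchers for sharing detailed insights and data on building reasoners! This greatly accelerated our research, and we acknowledge it in the model's name.

📖 Citation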

@misc{Apriel-nemotron-15b-thinker,
    author = {Slam labs team},
    title = {Apriel Nemotron 15b Thinker},
    howpublished = {https://huggingface.co/ServiceNow-AI/Apriel-Nemotron-15b-Thinker},
    publisher = {SLAM - ServiceNow Language Models Lab},
    year = {2025}
}

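⚠️ Important notes

Intended use: the Apriel family of models is designed for a wide range of general instruction tasks, including code assistance and generation, logical reasoning and multi-step tasks, question answering and information retrieval, function calling, complex instruction following and agentic use cases. They are not intended for safety-critical applications without human oversight, or for scenarios that require guaranteed factual accuracy.
Limitations:
- Factual accuracy: may produce incorrect, misleading or outdated content. Verify outputs before using them in critical contexts.
- Bias: may reflect social, cultural or systemic biases present in the training data.
- Ethics: do not use the model to generate harmful, illegal or unethical content.
- Language: strongest in English. Output quality may degrade in underrepresented languages.
- Critical applications: not suitable for medical, legal, financial or other high-stakes applications without safeguards.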