Apriel-Nemotron-15b-Thinker: an open-source, efficient reasoning model with half the memory footprint, free to use

Apriel Nemotron 15b Thinker

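Developed by ServiceNow-AI

A 15-billion-parameter efficient reasoning model from ServiceNow, with about half the memory footprint of comparable state-of-the-art models

Large language model

Transformers

License: MIT #efficient-reasoning #enterprise-tasks #low-resource-usage

Downloads: 1,252

Release date: 5/6/2025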

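Model overview

A model trained in three stages on top of Apriel-15b-base, optimized for efficient reasoning and enterprise tasks

Model features

Memory efficiency

About half the size of comparable 32B models, which substantially reduces memory requirements

Inference efficiency

Consumes about 40% fewer tokens than comparable models such as QWQ-32b, making it more efficient in production

Enterprise task optimization

Performs strongly on MBPP, BFCL, Enterprise RAG, and similar tasks

Academic competitiveness

Competitive on academic benchmarks such as AIME, AMC, and MATH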

Model capabilities

Text generation

Complex reasoning

Enterprise task handling

Academic problem solving

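Use cases

Enterprise applications

Enterprise RAG systems

Knowledge retrieval and generation over enterprise content

Performs strongly on the related benchmarks

Business process automation

Handles enterprise documents and workflow automation tasks

Academic research

Mathematical problem solving

Solves competition-level problems such as those in AMC and AIME

Performs well on benchmarks such as MATH-500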

🚀 Apriel-Nemotron-15b-Thinker

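Apriel-Nemotron-15b-Thinker is a 15-billion-parameter reasoning model in ServiceNow's Apriel SLM series. It achieves competitive performance against state-of-the-art models such as o1-mini, QWQ-32b, and EXAONE-Deep-32b while requiring only about half the memory footprint of those alternatives.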

🚀 Quick Start

Install dependencies

pip install transformers torch accelerate

Run the reasoning model

The following snippet shows how to use the model with the generate function of the transformers library:

import re
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ServiceNow-AI/Apriel-Nemotron-15b-Thinker"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare the model input
prompt = "Positive real numbers $x$ and $y$ satisfy $y^3=x^2$ and $(y-x)^2=4y^2$. What is $x+y$?\nMark your solution with \\boxed"
messages = [
    {"role": "user", "content": prompt}
]

tools = []

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

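# Generate text
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
# Decode only the newly generated tokens: the system prompt in the chat template
# also contains the [BEGIN FINAL RESPONSE] ... [END FINAL RESPONSE] markers
prompt_length = model_inputs["input_ids"].shape[1]
output = tokenizer.decode(generated_ids[0][prompt_length:], skip_special_tokens=True)

# Parse the response; fall back to the full output if the markers are missing
matches = re.findall(r"\[BEGIN FINAL RESPONSE\](.*?)\[END FINAL RESPONSE\]", output, re.DOTALL)
response = matches[-1].strip() if matches else output.strip()
print("output:", output)
print("response:", response)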

Chat template

<|system|>
You are a thoughtful and systematic AI assistant built by ServiceNow Language Models (SLAM) lab. Before providing an answer, analyze the problem carefully and present your reasoning step by step. After explaining your thought process, provide the final solution in the following format: [BEGIN FINAL RESPONSE] ... [END FINAL RESPONSE].
<|end|>
<|user|>
# user message here
<|end|>
<|assistant|>
Here are my reasoning steps:
# thoughts here
[BEGIN FINAL RESPONSE]
# assistant response here
[END FINAL RESPONSE]
<|end|>

The model first generates its reasoning and then writes the final response between [BEGIN FINAL RESPONSE] and [END FINAL RESPONSE]. The following snippet shows how to apply the chat template:

from transformers import AutoTokenizer
model_name = "ServiceNow-AI/Apriel-Nemotron-15b-Thinker"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare the model input
custom_system_prompt = "Answer like a pirate."
prompt = "You are an expert assistant in the implementation of customer experience management aspect of retail applications \n \nYou will be using Python as the programming language. \n \nYou will utilize a factory design pattern for the implementation and following the dependency inversion principle \n \nYou will modify the implementation based on user requirements. \n \nUpon user request, you will add, update, and remove the features & enhancements in the implementation provided by you. \n \nYou will ask whether the user wants to refactor the provided code or needs a sample implementation for reference. Upon user confirmation, I will proceed accordingly. \n \n**Guidelines:** \n 1. **User Requirements:** \n - You have to ask users about their requirements, clarify the user expectations, and suggest the best possible solution by providing examples of Python code snippets. \n - Ask users about which type of reports they need to assess the AI model's performance, accuracy, and reliability. \n - After providing the solution, you have to ask the user about the trial of the solution and modify the solution based on the user feedback. \n \n 2. **Libraries/Frameworks:** \n - You will be utilizing Python as a programming language. \n - You will be using Flask framework for REST APIS implementation \n \n 3. **Communication Gesture:** \n - Your conversation with the user should be interactive, supportive, courageous, and professional. \n - You have to break down the complex concepts into sub-concepts and try to explain them to the user. \n - You have to ask the user for the required parameters. If the user refuses to provide in 2 attempts, politely exit the conversation. \n - You have to provide your supported parameters to the user, if the user refuses to accept them then you have to put an apology note and exit the conversation. \n - You have to track the conversation about unasked questions by the user. If some/one of the questions remain then you have to remind the user about these questions and proceed to answer them based on the user's confirmation \n \n 4. **Implementation:** \n - Your code/implementations should be reliable, scaleable, modular, and reusable. \n - You will be providing unit tests for the implementation upon user request. \n - You will be following MVC architecture for the applications \n - Your implementations must be well-commented and readable \n \n \n- Today's date is 23rd August 2024. \n- The default sender email is sender-assistant@email.com.\nHi, I am conducting research on retail customer feedback systems and I need assistance with designing and implementing them. Could you kindly provide me with a list of general customer feedback system modules?"
messages = [
    {"role": "user", "content": custom_system_prompt + "\n\n" + prompt}
]
# Example tools
tools = [{"type": "function", "function": {"name": "getRetailFeedbackModules", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"page": {"type": "integer", "description": "The current page number.", "default": 1}, "page_size": {"type": "integer", "description": "The number of items per page.", "default": 3}}}}}, {"type": "function", "function": {"name": "verifyImplementation", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"coding_language": {"type": "string", "description": "The supported languages for verification of implementation.", "default": "python", "enum": ["python", "java", "php"]}, "code": {"type": "string", "description": "The code which needs verification"}, "design_pattern": {"type": "string", "description": "The design pattern to verify in the implementation", "enum": ["factory", "strategy", "singleton"]}, "verify_best_practices": {"type": "boolean", "description": "The verification of the coding style based on the language selected", "default": True}}}}}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools
)
model_inputs = tokenizer([text], return_tensors="pt")
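
Because the template places the reasoning after "Here are my reasoning steps:" and the answer between the two markers, a decoded completion can be split into both parts with a small helper. The following is a minimal sketch rather than part of the model's tooling: the function name split_reasoning_and_answer is hypothetical, and it assumes the text passed in is plain decoded output. It takes the last marker pair so that a copy of the markers earlier in the text (for example, in the system prompt) is not mistaken for the answer.

```python
import re

REASONING_PREFIX = "Here are my reasoning steps:"
FINAL_RE = re.compile(
    r"\[BEGIN FINAL RESPONSE\](.*?)\[END FINAL RESPONSE\]", re.DOTALL
)


def split_reasoning_and_answer(completion):
    """Return (reasoning, answer); answer is None if the markers are missing.

    Hypothetical helper for illustration only.
    """
    matches = list(FINAL_RE.finditer(completion))
    if matches:
        last = matches[-1]  # the model's answer is the last marker pair
        reasoning = completion[: last.start()]
        answer = last.group(1).strip()
    else:
        # e.g. generation stopped before the closing marker was produced
        reasoning, answer = completion, None
    reasoning = reasoning.strip()
    if reasoning.startswith(REASONING_PREFIX):
        reasoning = reasoning[len(REASONING_PREFIX):].strip()
    return reasoning, answer
```

Returning None for a missing answer lets the caller decide whether to retry with a larger token budget instead of failing with an IndexError.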

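Usage guidelines

Use the model's default chat template, which already includes the system prompt. We recommend adding all other instructions to the user message.
We recommend setting the temperature to 0.6.
In all evaluations, make sure the model's response starts with "Here are my reasoning steps:\n". The default chat template already implements this.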
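
The guidelines above could be applied in code roughly as follows. This is a sketch under stated assumptions: build_generation_kwargs and prompt_ends_with_prefix are hypothetical helper names, the default of 4096 new tokens is an arbitrary illustration, and the prefix check assumes the chat template appends the reasoning prefix to the rendered prompt. In transformers, temperature only takes effect when do_sample=True, so both are set together.

```python
REASONING_PREFIX = "Here are my reasoning steps:"


def build_generation_kwargs(temperature=0.6, max_new_tokens=4096):
    """Keyword arguments for model.generate() with sampling enabled.

    Hypothetical helper; max_new_tokens=4096 is an illustrative default.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive when sampling")
    return {
        "do_sample": True,  # temperature is ignored without sampling
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
    }


def prompt_ends_with_prefix(rendered_prompt):
    """Check that a rendered prompt ends with the reasoning prefix."""
    return rendered_prompt.rstrip().endswith(REASONING_PREFIX)


# Usage with the quick-start objects (not executed here):
# generated_ids = model.generate(**model_inputs, **build_generation_kwargs())
```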

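✨ Key Features

Memory efficient: about half the size of state-of-the-art models such as QWQ-32b and EXAONE-32b.
Efficient in production: consumes 40% fewer tokens than QWQ-32b, which makes it highly efficient in production environments.
Suited to enterprise tasks: performs on par with or better than alternatives on MBPP, BFCL, Enterprise RAG, MT Bench, MixEval, IFEval, and Multi-Challenge, making it a good fit for agentic and enterprise tasks.
Strong on academic benchmarks: competitive for its size on AIME-24, AIME-25, AMC-23, MATH-500, and GPQA.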
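
The "half the size" and "40% fewer tokens" claims can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes 16-bit weights (2 bytes per parameter) and counts weights only, ignoring activations and the KV cache; the 10,000-token baseline is a made-up figure for illustration, not a measured result.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate weight memory in GB (weights only, 16-bit by default)."""
    return num_params * bytes_per_param / 1e9


def tokens_after_reduction(baseline_tokens, reduction_pct):
    """Token count after an integer percentage reduction."""
    return baseline_tokens * (100 - reduction_pct) // 100


apriel_gb = weight_memory_gb(15e9)    # 15B parameters
larger_gb = weight_memory_gb(32e9)    # a 32B model such as QWQ-32b
size_ratio = apriel_gb / larger_gb    # a little under one half

# Hypothetical baseline of 10,000 generated tokens for the 32B model
apriel_tokens = tokens_after_reduction(10_000, 40)

print(f"{apriel_gb:.0f} GB vs {larger_gb:.0f} GB (ratio {size_ratio:.2f})")
print(f"{apriel_tokens} tokens instead of 10000")
```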

📚 Detailed Documentation

Evaluation

Evaluations were run with lm-eval-harness and evalchemy.

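[Figures not reproduced here: enterprise capability benchmarks; academic reasoning benchmarks; token efficiency comparison (lower is better).]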

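Training details

Mid-training / continual pre-training: In this stage, the model is trained on more than 100 billion tokens of carefully curated examples drawn from mathematical reasoning, coding challenges, scientific discourse, and logic puzzles. The goal is to strengthen the model's foundational reasoning capabilities. This stage is critical for the model to function as a reasoner and yields significant gains on reasoning benchmarks.
Supervised fine-tuning (SFT): Next, the model is fine-tuned on 200,000 high-quality demonstrations covering mathematical and scientific problem solving, coding tasks, general instruction-following scenarios, API/function-calling use cases, and more.
Reinforcement learning: Although the SFT checkpoint performs well on core competencies such as mathematics and general knowledge, it is weaker at instruction following and coding tasks. To close these gaps, GRPO is applied (with some minor modifications to the objective). The result is a significant improvement on benchmarks such as IFEval, Multi Challenge, Enterprise RAG, MBPP, and BFCL, while scores on competition-level math exams such as AIME and AMC are preserved. GRPO also yields modest gains on GPQA and MixEval. Throughout training, intermediate snapshots from the SFT and GRPO stages are periodically merged to improve generalization and reduce catastrophic forgetting.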
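
For readers unfamiliar with GRPO, its core idea is to score each sampled response relative to the other responses drawn for the same prompt, which removes the need for a separate value model. The sketch below shows the commonly described group-normalized advantage and nothing more; it does not reproduce the modified objective mentioned above, whose details are not given here. The function name, the use of the population standard deviation, and the eps value are assumptions for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    advantage_i = (reward_i - group mean) / (group std + eps)
    Hypothetical helper; implementations differ in the std estimator and eps.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt, rewarded 1.0 if correct and 0.0 otherwise
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(advantages)  # correct answers get positive values, incorrect negative
```

When every response in a group receives the same reward, all advantages are zero, so that group contributes no policy-gradient signal.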

🔧 Technical Details

The model is built on the Apriel-15b-base checkpoint through a three-stage training pipeline: continual pre-training (CPT), SFT, and GRPO.

📄 License

This model is released under the MIT license.

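👏 Acknowledgments

Thanks to the researchers at NVIDIA for sharing detailed insights and data from their work on building reasoners. This greatly accelerated our research, and we acknowledge it through the model's naming convention.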

📖 Citation

@misc{Apriel-nemotron-15b-thinker,
    author = {Slam labs team},
    title = {Apriel Nemotron 15b Thinker},
    howpublished = {https://huggingface.co/ServiceNow-AI/Apriel-Nemotron-15b-Thinker},
    publisher = {SLAM - ServiceNow Language Models Lab},
    year = {2025}
}

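⚠️ Important Notes

Intended use: The Apriel family of models is designed for a variety of general-purpose instruction tasks, including code assistance and generation, logical reasoning and multi-step tasks, question answering and information retrieval, function calling, complex instruction following, and agentic use cases. The models are not intended for use in safety-critical applications without human oversight or in scenarios that require guaranteed factual accuracy.
Limitations:
- Factual accuracy: The model may produce incorrect, misleading, or outdated content. Verify outputs before using them in critical contexts.
- Bias: The model may reflect societal, cultural, or systemic biases present in its training data.
- Ethics: Do not use the model to generate harmful, illegal, or unethical content.
- Language: Performance is strongest in English. Output quality may degrade in underrepresented languages.
- Critical use: The model is not suitable for medical, legal, financial, or other high-risk applications without safeguards.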