Apriel-Nemotron-15b-Thinker open-source reasoning model: memory-efficient and suited to a range of scenarios


Apriel Nemotron 15b Thinker GGUF

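Developed by Mungert

Apriel-Nemotron-15b-Thinker is a strong reasoning model that performs well against models of similar size. It combines efficient memory use with solid reasoning ability and suits a range of enterprise and academic scenarios.

Large language model

Transformers

License: MIT #EfficientReasoning #EnterpriseTasks #CompetitionLevelMath

Downloads: 1,097

Released: 6/12/2025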

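Model overview

Apriel-Nemotron-15b-Thinker is an efficient reasoning model for enterprise and academic use, with strong reasoning ability and a small memory footprint.

Model features

Memory efficient

The model is only half the size of comparable SOTA models, which keeps memory use low.

Token efficient

Consumes 40% fewer tokens than comparable models, which makes it efficient in production.

Strong task performance

On par with or better than comparable models on tasks such as MBPP, BFCL, Enterprise RAG, and MT Bench.

Competitive on academic benchmarks

Competitive on academic benchmarks such as AIME-24, AIME-25, and AMC-23.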

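Model capabilities

Text generation

Logical reasoning

Question answering

Code generation

Function calling

Complex instruction following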

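Use cases

Enterprise applications

Code assistance and generation

Helps developers generate and optimize code.

Improves development efficiency and reduces coding errors.

Logical reasoning and multi-step tasks

Solves complex logical reasoning problems.

Provides accurate reasoning results.

Academic research

Math and science problem solving

Solves competition-level math and science problems.

Performs well on exams such as AIME and AMC.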

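🚀 Apriel-Nemotron-15b-Thinker GGUF model

The Apriel-Nemotron-15b-Thinker GGUF model is a strong reasoning model that performs well against models of similar size. It combines efficient memory use with solid reasoning ability and suits a range of enterprise and academic scenarios.

🚀 Quick start

Install dependencies

pip install transformers

The example below also needs PyTorch, and device_map="auto" relies on the accelerate package.

Run the reasoning model

The following example calls the model through the generate function of the transformers library: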

import re
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ServiceNow-AI/Apriel-Nemotron-15b-Thinker"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare the model input
prompt = "Positive real numbers $x$ and $y$ satisfy $y^3=x^2$ and $(y-x)^2=4y^2$. What is $x+y$?\nMark your solution with \\boxed"
messages = [
    {"role": "user", "content": prompt}
]

tools = []

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

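# Generate text
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
# generate() returns the prompt tokens followed by the completion; decode only the completion,
# because the default system prompt itself contains the [BEGIN/END FINAL RESPONSE] markers
prompt_length = model_inputs["input_ids"].shape[1]
output = tokenizer.decode(generated_ids[0][prompt_length:], skip_special_tokens=True)

# Parse the output: take the last marked block, or fall back to the full text if the markers are missing
matches = re.findall(r"\[BEGIN FINAL RESPONSE\](.*?)\[END FINAL RESPONSE\]", output, re.DOTALL)
response = matches[-1].strip() if matches else output.strip()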
print("output:", output)
print("response:", response)
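A completion from this model has two parts: free-form reasoning, then a final answer wrapped in [BEGIN FINAL RESPONSE] and [END FINAL RESPONSE]. The sketch below shows one way to separate the two and to pull out a \boxed{...} answer when the prompt asked for one. The helper names split_completion and extract_boxed and the sample string are made up for illustration; they are not part of transformers or of the model's own tooling.

```python
BEGIN = "[BEGIN FINAL RESPONSE]"
END = "[END FINAL RESPONSE]"


def split_completion(completion: str):
    """Return (reasoning, final_response); final_response is None if BEGIN is missing."""
    start = completion.rfind(BEGIN)
    if start == -1:
        return completion.strip(), None
    end = completion.find(END, start)
    body_end = end if end != -1 else len(completion)  # tolerate a truncated ending
    reasoning = completion[:start].strip()
    final = completion[start + len(BEGIN):body_end].strip()
    return reasoning, final


def extract_boxed(text: str):
    """Return the contents of the last \\boxed{...}, handling nested braces, or None."""
    opener = "\\boxed{"
    start = text.rfind(opener)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(opener):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed


# Hypothetical completion, written by hand for this example
sample = (
    "Here are my reasoning steps:\nFrom (y-x)^2 = 4y^2 with x, y > 0 we get x = 3y.\n"
    "[BEGIN FINAL RESPONSE]\nThe answer is \\boxed{36}.\n[END FINAL RESPONSE]"
)
reasoning, final = split_completion(sample)
print(final)                 # The answer is \boxed{36}.
print(extract_boxed(final))  # 36
```

Searching for the last BEGIN marker rather than the first keeps the helper usable even when the decoded text still contains the system prompt, which mentions both markers.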

Chat template

<|system|>
You are a thoughtful and systematic AI assistant built by ServiceNow Language Models (SLAM) lab. Before providing an answer, analyze the problem carefully and present your reasoning step by step. After explaining your thought process, provide the final solution in the following format: [BEGIN FINAL RESPONSE] ... [END FINAL RESPONSE].
<|end|>
<|user|>
# user message here
<|end|>
<|assistant|>
Here are my reasoning steps:
# thoughts here
[BEGIN FINAL RESPONSE]
# assistant response here
[END FINAL RESPONSE]
<|end|>

The following example applies the chat template:

from transformers import AutoTokenizer
model_name = "ServiceNow-AI/Apriel-Nemotron-15b-Thinker"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare the model input
custom_system_prompt = "Answer like a pirate."
prompt = "You are an expert assistant in the implementation of customer experience management aspect of retail applications \n \nYou will be using Python as the programming language. \n \nYou will utilize a factory design pattern for the implementation and following the dependency inversion principle \n \nYou will modify the implementation based on user requirements. \n \nUpon user request, you will add, update, and remove the features & enhancements in the implementation provided by you. \n \nYou will ask whether the user wants to refactor the provided code or needs a sample implementation for reference. Upon user confirmation, I will proceed accordingly. \n \n**Guidelines:** \n 1. **User Requirements:** \n - You have to ask users about their requirements, clarify the user expectations, and suggest the best possible solution by providing examples of Python code snippets. \n - Ask users about which type of reports they need to assess the AI model's performance, accuracy, and reliability. \n - After providing the solution, you have to ask the user about the trial of the solution and modify the solution based on the user feedback. \n \n 2. **Libraries/Frameworks:** \n - You will be utilizing Python as a programming language. \n - You will be using Flask framework for REST APIS implementation \n \n 3. **Communication Gesture:** \n - Your conversation with the user should be interactive, supportive, courageous, and professional. \n - You have to break down the complex concepts into sub-concepts and try to explain them to the user. \n - You have to ask the user for the required parameters. If the user refuses to provide in 2 attempts, politely exit the conversation. \n - You have to provide your supported parameters to the user, if the user refuses to accept them then you have to put an apology note and exit the conversation. \n - You have to track the conversation about unasked questions by the user. If some/one of the questions remain then you have to remind the user about these questions and proceed to answer them based on the user's confirmation \n \n 4. **Implementation:** \n - Your code/implementations should be reliable, scaleable, modular, and reusable. \n - You will be providing unit tests for the implementation upon user request. \n - You will be following MVC architecture for the applications \n - Your implementations must be well-commented and readable \n \n \n- Today's date is 23rd August 2024. \n- The default sender email is sender-assistant@email.com.\nHi, I am conducting research on retail customer feedback systems and I need assistance with designing and implementing them. Could you kindly provide me with a list of general customer feedback system modules?"
messages = [
    {"role": "user", "content": custom_system_prompt + "\n\n" + prompt}
]
# Example tools (Python booleans are True/False; JSON-style true is a NameError)
tools = [{"type": "function", "function": {"name": "getRetailFeedbackModules", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"page": {"type": "integer", "description": "The current page number.", "default": 1}, "page_size": {"type": "integer", "description": "The number of items per page.", "default": 3}}}}}, {"type": "function", "function": {"name": "verifyImplementation", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"coding_language": {"type": "string", "description": "The supported languages for verification of implementation.", "default": "python", "enum": ["python", "java", "php"]}, "code": {"type": "string", "description": "The code which needs verification"}, "design_pattern": {"type": "string", "description": "The design pattern to verify in the implementation", "enum": ["factory", "strategy", "singleton"]}, "verify_best_practices": {"type": "boolean", "description": "The verification of the coding style based on the language selected", "default": True}}}}}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools
)
model_inputs = tokenizer([text], return_tensors="pt")
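Each entry in the tools list above follows the same nested shape: a "function" wrapper around a name, a description, and a JSON-schema "parameters" object. Writing that shape by hand on one long line is error-prone, so the sketch below builds an entry with a small helper. make_tool is a hypothetical name introduced here for illustration, not a transformers API.

```python
import json


def make_tool(name: str, description: str, properties: dict) -> dict:
    """Build one tools entry in the same shape as the hand-written list above."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {"type": "object", "properties": properties},
        },
    }


# Rebuilds the first tool from the example above
paging_tool = make_tool(
    "getRetailFeedbackModules",
    "Returns the list of modules usually present in the retail industry",
    {
        "page": {"type": "integer", "description": "The current page number.", "default": 1},
        "page_size": {"type": "integer", "description": "The number of items per page.", "default": 3},
    },
)

# Python booleans serialize to JSON true/false, so schemas stay valid JSON
print(json.dumps({"default": True}))  # {"default": true}
```

The resulting dictionaries can be collected into a list and passed as tools=[paging_tool, ...] to apply_chat_template in the same way as the literal list.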

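Usage guidelines

Use the model's default chat template, which already includes the system prompt. Put any additional instructions in the user message.
Set the temperature to 0.6.
In all evaluations, make sure the model's response starts with Here are my reasoning steps:\n. The default chat template already does this.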
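The guidelines above can be turned into a pair of small helpers, sketched here under a few assumptions. sampling_kwargs and with_reasoning_prefix are hypothetical names, the max_new_tokens default of 1024 is an arbitrary illustrative value, and do_sample=True is included because transformers only applies temperature when sampling is enabled.

```python
REASONING_PREFIX = "Here are my reasoning steps:\n"


def sampling_kwargs(max_new_tokens: int = 1024, temperature: float = 0.6) -> dict:
    """Keyword arguments for model.generate() that follow the recommended temperature."""
    return {
        "max_new_tokens": max_new_tokens,  # illustrative default, not from the model card
        "do_sample": True,                 # temperature has no effect under greedy decoding
        "temperature": temperature,
    }


def with_reasoning_prefix(prompt_text: str) -> str:
    """Append the reasoning prefix unless the rendered prompt already ends with it."""
    if prompt_text.endswith(REASONING_PREFIX):
        return prompt_text
    return prompt_text + REASONING_PREFIX


# Intended use with the objects from the quick start example:
#   generated_ids = model.generate(**model_inputs, **sampling_kwargs())
rendered = "<|user|>\nHello\n<|end|>\n<|assistant|>\n"
print(with_reasoning_prefix(rendered).endswith(REASONING_PREFIX))  # True
```

With the default chat template the prefix check is a no-op, since the template already ends the prompt with the reasoning prefix; it matters only when the prompt is assembled by hand.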

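✨ Key features

Memory efficient: about half the size of SOTA models such as QWQ-32b and EXAONE-32b, which keeps memory use low.
Token efficient: consumes 40% fewer tokens than QWQ-32b, which makes it efficient in production.
Strong task performance: on par with or better than those models on MBPP, BFCL, Enterprise RAG, MT Bench, MixEval, IFEval, and Multi-Challenge, which makes it a good fit for agentic and enterprise tasks.
Competitive on academic benchmarks: given its size, competitive on AIME-24, AIME-25, AMC-23, MATH-500, and GPQA.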
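To make the "half the size" and "40% fewer tokens" claims concrete, here is a back-of-the-envelope calculation. It assumes nominal parameter counts of 15 billion and 32 billion taken from the model names, 2 bytes per parameter (16-bit weights), and counts weights only, ignoring the KV cache and activations. The 10,000-token baseline is an arbitrary example figure.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def tokens_after_reduction(baseline_tokens: int, reduction_pct: int = 40) -> float:
    """Tokens consumed after a percentage reduction relative to a baseline."""
    return baseline_tokens * (100 - reduction_pct) / 100


small = weight_memory_gb(15e9, 2)  # nominal 15B model, 16-bit weights
large = weight_memory_gb(32e9, 2)  # nominal 32B model, 16-bit weights
print(small, large, small / large)     # 30.0 64.0 0.46875
print(tokens_after_reduction(10_000))  # 6000.0
```

Quantized GGUF files use fewer bytes per parameter, so the absolute numbers shrink, but the ratio between the two model sizes stays roughly the same at equal precision.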

📦 Installation

pip install transformers

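📚 Detailed documentation

Model generation details

This model was generated using llama.cpp at commit 1f63e75f.

Quantization beyond IMatrix

I have been experimenting with a new quantization approach that selectively raises the precision of key layers beyond what the default IMatrix configuration provides. In my testing, standard IMatrix quantization underperforms at low bit depths, especially with Mixture of Experts (MoE) models. To address this, I use the --tensor-type option in llama.cpp to manually bump important layers to higher precision. You can see the implementation here: Layer bumping with llama.cpp. This increases model file size, but it significantly improves precision for a given quantization level.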
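As a rough illustration of the layer-bumping idea, the sketch below assembles a llama-quantize command line with per-tensor overrides. Only the --tensor-type option is named in the text above; the PATTERN=TYPE argument form, the --imatrix option, and the positional argument order reflect my understanding of the llama-quantize CLI and should be checked against llama-quantize --help. The file names, tensor patterns, and quantization types are hypothetical, and which layers the author actually bumps is not stated here.

```python
import shlex


def build_quantize_command(src_gguf, dst_gguf, base_type, tensor_overrides,
                           imatrix_path=None, binary="llama-quantize"):
    """Return an argv list; nothing is executed here."""
    cmd = [binary]
    if imatrix_path:
        cmd += ["--imatrix", imatrix_path]
    for pattern, qtype in tensor_overrides.items():
        # Assumed form: --tensor-type <tensor-name-pattern>=<ggml-type>
        cmd += ["--tensor-type", f"{pattern}={qtype}"]
    cmd += [src_gguf, dst_gguf, base_type]
    return cmd


# Hypothetical inputs: bump two tensor families above a 4-bit base type
cmd = build_quantize_command(
    "model-bf16.gguf",
    "model-q4_k_m-bumped.gguf",
    "Q4_K_M",
    {"attn_v": "q6_k", "ffn_down": "q6_k"},
    imatrix_path="model.imatrix",
)
print(shlex.join(cmd))
```

Building the argv list separately from running it makes it easy to log or review the exact overrides before passing the list to subprocess.run.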

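Evaluation

Evaluations were conducted using [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness) and evalchemy.

Benchmarks that reflect enterprise capability:
Academic reasoning benchmarks:
Token efficiency comparison (lower is better):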

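Training details

Mid-training / continual pre-training: In this stage, the model is trained on more than 100 billion tokens of carefully curated examples drawn from mathematical reasoning, coding challenges, scientific discourse, and logical puzzles. The goal is to strengthen the model's foundational reasoning capabilities. This stage is critical to the model's function as a reasoner and yields significant gains on reasoning benchmarks.
Supervised fine-tuning (SFT): Next, the model is fine-tuned on 200,000 high-quality demonstrations covering math and science problem solving, coding tasks, general instruction following, API/function calling use cases, and more.
Reinforcement learning: Although the SFT-tuned checkpoint is strong in core competencies such as math and general knowledge, it shows weaknesses in instruction following and coding tasks. To address these, GRPO is applied (with some minor modifications to the objective). The result is a significant improvement on benchmarks such as IFEval, Multi Challenge, Enterprise RAG, MBPP, and BFCL, while scores on competition-level math exams such as AIME and AMC are preserved. GRPO also gives modest gains on GPQA and MixEval. Throughout training, intermediate snapshots from both the SFT and GRPO stages are periodically merged, which improves generalization and reduces catastrophic forgetting.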
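The core idea of GRPO is that each sampled response is scored relative to the other responses drawn for the same prompt, so no separate value model is needed. The sketch below shows that group-relative advantage step in its commonly described form. It is not the modified objective used for this model, whose changes are not specified above; the function name, the eps constant, and the choice of population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this detail
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt, rewarded 1.0 if correct and 0.0 otherwise
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]

# If every answer gets the same reward, there is no learning signal
print(group_relative_advantages([1.0, 1.0, 1.0]))  # [0.0, 0.0, 0.0]
```

Responses that beat their group average get positive advantages and are reinforced; the rest are pushed down, which is how a verifiable reward on coding or instruction-following tasks can shape the policy.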

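Intended use

The Apriel family of models is designed for a variety of general-purpose instruction tasks, including:

Code assistance and generation
Logical reasoning and multi-step tasks
Question answering and information retrieval
Function calling, complex instruction following, and agent use cases

They are not intended for use in safety-critical applications without human oversight, or in scenarios that require guaranteed factual accuracy.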

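Limitations

Factual accuracy: May produce incorrect, misleading, or outdated content. Verify outputs before using them in critical contexts.
Bias: May reflect societal, cultural, or systemic biases present in the training data.
Ethics: Do not use the model to produce harmful, unlawful, or unethical content.
Language: Strongest performance is in English. Output quality may degrade in underrepresented languages.
Critical use: Not suitable for medical, legal, financial, or other high-risk applications without safeguards.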

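Security and responsible use

Security responsibilities

Deployers and users are strongly encouraged to align their security practices with established frameworks and regulatory guidelines such as the EU AI Act and the NIST AI Risk Management Framework (RMF).

Guidelines for deployers

Regularly conduct robustness assessments to identify and mitigate adversarial inputs.
Implement validation and filtering processes to prevent harmful or biased outputs.
Continuously perform data privacy checks to guard against unintended data leaks.
Document and communicate the model's limitations, intended usage, and known security risks to all end users.
Schedule periodic security reviews and updates to address emerging threats and vulnerabilities.

Guidelines for users

Follow the established security policies and usage guidelines provided by deployers.
Protect and manage sensitive information when interacting with the model.
Report anomalies, suspicious behavior, or unsafe outputs to deployers or developers.
Maintain human oversight and apply judgment during interactions to mitigate potential security or ethical risks.

Disclaimer

Users accept responsibility for securely deploying, managing, and using this open-source LLM. The model is provided "as is", without explicit or implied warranty regarding security or fitness for any specific application or environment.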

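Software

Training stack: [Fast-LLM](https://github.com/ServiceNow/Fast-LLM)

License

MIT

Acknowledgements

Thanks to the researchers at Nvidia for sharing detailed insights and data from their work on building reasoners! This greatly accelerated our research, and we recognize it with our model naming convention!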

Citation

@misc{Apriel-nemotron-15b-thinker,
    author = {Slam labs team},
    title = {Apriel Nemotron 15b Thinker},
    howpublished = {https://huggingface.co/ServiceNow-AI/Apriel-Nemotron-15b-Thinker},
    publisher = {SLAM - ServiceNow Language Models Lab},
    year = {2025}
}

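Quantum Network Monitor testing

If you find these models useful, please help me test my AI-powered Quantum Network Monitor Assistant with quantum-ready security checks: Quantum Network Monitor. The full open-source code for the Quantum Network Monitor service is available in my GitHub repositories (those with NetworkMonitor in the name): Quantum Network Monitor source code. You will also find the code I use to quantize the models, if you want to do it yourself: GGUFModelBuilder.

How to test

Choose an AI assistant type:

TurboLLM (GPT-4.1-mini)
HugLLM (Hugging Face open-source models)
TestLLM (experimental, CPU only)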

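What I'm testing

I'm pushing the limits of small open-source models for AI network monitoring, specifically:

Function calling against live network services
How small a model can go while still handling:
- Automated Nmap security scans
- Quantum-readiness checks
- Network monitoring tasks

TestLLM - current experimental model (llama.cpp on 2 CPU threads in a Hugging Face Docker space)

Zero-configuration setup
About 30 seconds of load time (inference is slow, but there are no API costs). No token limit, because the cost is low.
Help wanted: if you are interested in edge-device AI, let's collaborate!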

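Other assistants

TurboLLM - uses gpt-4.1-mini:
- Performs very well, but unfortunately OpenAI charges per token, so token usage is limited.
- Create custom cmd processors to run .NET code on Quantum Network Monitor agents.
- Real-time network diagnostics and monitoring
- Security audits
- Penetration testing (Nmap/Metasploit)
HugLLM - latest open-source models: runs on the Hugging Face Inference API and performs quite well with the latest models hosted on Novita.

Example test commands

"Give me info on my website's SSL certificate"
"Check if my server is using quantum safe encryption for communication"
"Run a comprehensive security audit on my server"
"Create a cmd processor to .. (whatever you want)" Note that you need to install a Quantum Network Monitor agent to run the .NET code. This is a very flexible and powerful feature, so use it with care!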

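Final note

I pay out of pocket for the servers used to create these model files, for running the Quantum Network Monitor service, and for inference from Novita and OpenAI. All the code behind the model creation and the Quantum Network Monitor project is open source. Feel free to use whatever you find helpful. If you appreciate the work, please consider buying me a coffee. Your support helps cover service costs and lets me raise token limits for everyone. I'm also open to job opportunities or sponsorship. Thank you!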

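🔧 Technical details

Training used a three-stage pipeline (CPT, SFT, and GRPO). The mid-training / continual pre-training stage used more than 100 billion carefully curated tokens to strengthen foundational reasoning. The supervised fine-tuning stage used 200,000 high-quality demonstrations covering a range of task scenarios. The reinforcement learning stage applied GRPO, with some minor modifications to the objective, to address the SFT model's weaknesses in instruction following and coding tasks. Throughout training, intermediate snapshots from the SFT and GRPO stages were periodically merged, which improved generalization and reduced catastrophic forgetting.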
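The text does not say how the intermediate snapshots were merged. One common and simple approach is a weighted average of the parameters, which the sketch below illustrates on plain Python lists. Treat it as an assumption about what merging could look like, not as the procedure used for this model; merge_checkpoints and the toy parameter dictionaries are hypothetical.

```python
def merge_checkpoints(checkpoints, weights=None):
    """Weighted average of checkpoints given as {param_name: [float, ...]} dicts."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if weights is None:
        weights = [1.0] * len(checkpoints)  # uniform average by default
    if len(weights) != len(checkpoints):
        raise ValueError("one weight per checkpoint is required")
    total = sum(weights)
    merged = {}
    for name, reference in checkpoints[0].items():
        merged[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(weights, checkpoints)) / total
            for i in range(len(reference))
        ]
    return merged


# Toy snapshots standing in for an SFT checkpoint and a GRPO checkpoint
sft_snapshot = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
grpo_snapshot = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
print(merge_checkpoints([sft_snapshot, grpo_snapshot]))
# {'layer.weight': [2.0, 3.0], 'layer.bias': [0.5]}
```

Averaging pulls the merged model toward behavior shared by both snapshots, which is one intuition for why merging can limit catastrophic forgetting of skills learned in an earlier stage.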