Apriel-Nemotron-15b-Thinker Open-source and Efficient Inference Model

Developed by ServiceNow-AI

A 15-billion-parameter efficient inference model launched by ServiceNow, with memory usage only half that of comparable advanced models

Open Source License: MIT #Efficient Inference #Enterprise-Level Tasks #Low Resource Consumption

Release Time: 5/6/2025

Model Overview

A three-stage training model based on Apriel-15b-base, specifically optimized for efficient inference and enterprise tasks

Model Features

Efficient Memory Usage

Size is only half of comparable 32B models, with significantly improved memory efficiency

Inference Efficiency Optimization

Reduces token consumption by 40% compared to similar models, offering higher efficiency in production environments

Enterprise Task Optimization

Excellent performance on tasks such as MBPP, BFCL, and enterprise RAG

Academic Competitiveness

Competitive performance on academic benchmarks like AIME, AMC, and MATH

Model Capabilities

Text Generation

Complex Reasoning

Enterprise Task Processing

Academic Problem Solving

Use Cases

Enterprise Applications

Enterprise RAG System

Used for enterprise knowledge retrieval and generation tasks

Excellent performance in relevant benchmark tests

Business Process Automation

Handles enterprise-level document and process automation tasks

Academic Research

Mathematical Problem Solving

Solves competition-level problems like AMC and AIME

Good performance on benchmarks such as MATH-500

🚀 Apriel-Nemotron-15b-Thinker

Apriel-Nemotron-15b-Thinker is a 15-billion-parameter reasoning model in ServiceNow's Apriel SLM series. It offers competitive performance against similarly sized state-of-the-art models while being more memory-efficient. This model is suitable for various enterprise and academic tasks.

🚀 Quick Start

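To start using the Apriel-Nemotron-15b-Thinker model, first install the transformers library (the examples below also assume a working PyTorch installation, and device_map="auto" additionally requires the accelerate package):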

pip install transformers

✨ Features

Memory Efficient: It is about half the size of SOTA models like QWQ-32b and EXAONE-32b, so it needs roughly half the memory for its weights.
Token-Efficient: Consumes 40% fewer tokens than QWQ-32b, which lowers inference cost and latency in production.
Great for Agentic/Enterprise Tasks: Performs on par with or better than comparable models on tasks such as MBPP, BFCL, Enterprise RAG, MT Bench, MixEval, IFEval, and Multi-Challenge.
Competitive on Academic Benchmarks: For its size, it shows competitive performance on academic benchmarks like AIME-24, AIME-25, AMC-23, MATH-500, and GPQA.
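
The memory claim above can be sanity-checked with simple arithmetic. The sketch below is only a back-of-the-envelope estimate of weight memory: it assumes weights stored in bfloat16 (2 bytes per parameter) and ignores the KV cache, activations, and framework overhead. The function name and the 2-byte assumption are illustrative and do not come from the model card.

```python
# Rough weight-memory estimate; ignores KV cache, activations, and overhead.
BYTES_PER_PARAM_BF16 = 2  # assumption: weights are stored in bfloat16


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Return approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


apriel_gb = weight_memory_gb(15e9)    # 15B-parameter model
baseline_gb = weight_memory_gb(32e9)  # 32B-parameter comparison model
ratio = apriel_gb / baseline_gb

print(f"15B weights: ~{apriel_gb:.0f} GB, 32B weights: ~{baseline_gb:.0f} GB, ratio: {ratio:.2f}")
```

Under these assumptions the 15B model needs a little under half the weight memory of a 32B model, which is consistent with the "half the size" statement.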

📦 Installation

pip install transformers

💻 Usage Examples

Basic Usage

Here is a code snippet demonstrating the model's usage with the transformers library's generate function:

import re
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ServiceNow-AI/Apriel-Nemotron-15b-Thinker"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Positive real numbers $x$ and $y$ satisfy $y^3=x^2$ and $(y-x)^2=4y^2$. What is $x+y$?\nMark your solution with \\boxed"
messages = [
    {"role": "user", "content": prompt}
]

tools = []

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
output = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

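# parse the final response; fall back to the full output if the markers are missing
matches = re.findall(r"\[BEGIN FINAL RESPONSE\](.*?)\[END FINAL RESPONSE\]", output, re.DOTALL)
response = matches[0].strip() if matches else output.strip()
print("output:", output)
print("response:", response)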
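
The parsing step relies on the [BEGIN FINAL RESPONSE] and [END FINAL RESPONSE] markers being present in the decoded text. As a minimal sketch, the hypothetical helper below (split_output is not part of transformers or the model card) separates the reasoning trace from the final answer and returns None for the answer when no complete block is found, for example when generation was cut off. It takes the last match on the assumption that the decoded text may also contain the prompt. The sample string is hand-written, not real model output.

```python
import re
from typing import Optional, Tuple

FINAL_RE = re.compile(r"\[BEGIN FINAL RESPONSE\](.*?)\[END FINAL RESPONSE\]", re.DOTALL)


def split_output(output: str) -> Tuple[str, Optional[str]]:
    """Split decoded text into (reasoning, final_response).

    final_response is None when no complete marker pair is present.
    """
    matches = list(FINAL_RE.finditer(output))
    if not matches:
        return output.strip(), None
    last = matches[-1]  # assumption: the last block is the model's answer
    return output[: last.start()].strip(), last.group(1).strip()


# Hand-written sample text standing in for a decoded generation.
sample = (
    "Here are my reasoning steps:\n"
    "From (y-x)^2 = 4y^2 and positivity, x = 3y, so y = 9 and x = 27.\n"
    "[BEGIN FINAL RESPONSE]\n\\boxed{36}\n[END FINAL RESPONSE]"
)
reasoning, final = split_output(sample)
print("final:", final)
```

Returning None instead of raising lets the caller decide whether to retry with a larger token budget or to surface the raw text.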

Advanced Usage

The following code demonstrates the application of the chat template:

from transformers import AutoTokenizer
model_name = "ServiceNow-AI/Apriel-Nemotron-15b-Thinker"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# prepare the model input
custom_system_prompt = "Answer like a pirate."
prompt = "You are an expert assistant in the implementation of customer experience management aspect of retail applications \n \nYou will be using Python as the programming language. \n \nYou will utilize a factory design pattern for the implementation and following the dependency inversion principle \n \nYou will modify the implementation based on user requirements. \n \nUpon user request, you will add, update, and remove the features & enhancements in the implementation provided by you. \n \nYou will ask whether the user wants to refactor the provided code or needs a sample implementation for reference. Upon user confirmation, I will proceed accordingly. \n \n**Guidelines:** \n 1. **User Requirements:** \n - You have to ask users about their requirements, clarify the user expectations, and suggest the best possible solution by providing examples of Python code snippets. \n - Ask users about which type of reports they need to assess the AI model's performance, accuracy, and reliability. \n - After providing the solution, you have to ask the user about the trial of the solution and modify the solution based on the user feedback. \n \n 2. **Libraries/Frameworks:** \n - You will be utilizing Python as a programming language. \n - You will be using Flask framework for REST APIS implementation \n \n 3. **Communication Gesture:** \n - Your conversation with the user should be interactive, supportive, courageous, and professional. \n - You have to break down the complex concepts into sub-concepts and try to explain them to the user. \n - You have to ask the user for the required parameters. If the user refuses to provide in 2 attempts, politely exit the conversation. \n - You have to provide your supported parameters to the user, if the user refuses to accept them then you have to put an apology note and exit the conversation. \n - You have to track the conversation about unasked questions by the user. If some/one of the questions remain then you have to remind the user about these questions and proceed to answer them based on the user's confirmation \n \n 4. **Implementation:** \n - Your code/implementations should be reliable, scaleable, modular, and reusable. \n - You will be providing unit tests for the implementation upon user request. \n - You will be following MVC architecture for the applications \n - Your implementations must be well-commented and readable \n \n \n- Today's date is 23rd August 2024. \n- The default sender email is sender-assistant@email.com.\nHi, I am conducting research on retail customer feedback systems and I need assistance with designing and implementing them. Could you kindly provide me with a list of general customer feedback system modules?"
messages = [
    {"role": "user", "content": custom_system_prompt + "\n\n" + prompt}
]
# example tools
tools = [{"type": "function", "function": {"name": "getRetailFeedbackModules", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"page": {"type": "integer", "description": "The current page number.", "default": 1}, "page_size": {"type": "integer", "description": "The number of items per page.", "default": 3}}}}}, {"type": "function", "function": {"name": "verifyImplementation", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"coding_language": {"type": "string", "description": "The supported languages for verification of implementation.", "default": "python", "enum": ["python", "java", "php"]}, "code": {"type": "string", "description": "The code which needs verification"}, "design_pattern": {"type": "string", "description": "The design pattern to verify in the implementation", "enum": ["factory", "strategy", "singleton"]}, "verify_best_practices": {"type": "boolean", "description": "The verification of the coding style based on the language selected", "default": True}}}}}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools
)
model_inputs = tokenizer([text], return_tensors="pt")
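
The snippet above only renders the tool schemas into the prompt; the model card does not describe how a tool call is emitted or executed. As a rough sketch of the application side, the hypothetical helper below (fill_defaults is an illustrative name, not a transformers API) merges arguments for a call over the defaults declared in a schema of the same shape as the example tools, and rejects argument names the schema does not declare.

```python
from typing import Any, Dict


def fill_defaults(tool: Dict[str, Any], arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Merge caller-supplied arguments over the defaults declared in a tool schema."""
    properties = tool["function"]["parameters"].get("properties", {})
    unknown = set(arguments) - set(properties)
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    # Start from declared defaults, then let explicit arguments win.
    merged = {name: spec["default"] for name, spec in properties.items() if "default" in spec}
    merged.update(arguments)
    return merged


# Same shape as the first example tool above.
paging_tool = {
    "type": "function",
    "function": {
        "name": "getRetailFeedbackModules",
        "description": "Returns the list of modules usually present in the retail industry",
        "parameters": {
            "type": "object",
            "properties": {
                "page": {"type": "integer", "description": "The current page number.", "default": 1},
                "page_size": {"type": "integer", "description": "The number of items per page.", "default": 3},
            },
        },
    },
}

call_args = fill_defaults(paging_tool, {"page": 2})
print(call_args)  # {'page': 2, 'page_size': 3}
```

Writing the schema as a Python literal also means boolean defaults must be spelled True and False rather than the JSON forms.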

Usage Guidelines

⚠️ Important Note

Use the model's default chat template, which already includes a system prompt. We recommend adding all other instructions within the user message.

We recommend setting temperature to 0.6.

We ensure the model starts with "Here are my reasoning steps:\n" during all our evaluations. This is implemented in the default chat template.
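
The guidelines above can be collected into two small helpers. This is a minimal sketch: build_generation_kwargs and build_messages are hypothetical names, and the only library facts assumed are that generate in transformers accepts do_sample, temperature, and max_new_tokens, and that temperature only takes effect when sampling is enabled. No system message is added, since the default chat template already supplies one.

```python
from typing import Dict, List


def build_generation_kwargs(temperature: float = 0.6, max_new_tokens: int = 65536) -> dict:
    """Keyword arguments for generate() using the recommended temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive when sampling")
    return {"do_sample": True, "temperature": temperature, "max_new_tokens": max_new_tokens}


def build_messages(prompt: str, extra_instructions: str = "") -> List[Dict[str, str]]:
    """Place extra instructions inside the user message, not in a system message."""
    content = f"{extra_instructions}\n\n{prompt}" if extra_instructions else prompt
    return [{"role": "user", "content": content}]


gen_kwargs = build_generation_kwargs()
messages = build_messages("What is 2 + 2?", "Answer like a pirate.")

# With the objects from the Basic Usage snippet, these would be used as:
#   text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
#   generated_ids = model.generate(**model_inputs, **gen_kwargs)
print(gen_kwargs)
print(messages)
```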

📚 Documentation

Summary

Apriel-Nemotron-15b-Thinker is a 15-billion-parameter reasoning model in ServiceNow's Apriel SLM series. It achieves competitive performance against similarly sized state-of-the-art models like o1-mini, QWQ-32b, and EXAONE-Deep-32b, while maintaining only half the memory footprint of those alternatives. It builds upon the Apriel-15b-base checkpoint through a three-stage training pipeline (CPT, SFT, and GRPO).

Evaluation

Evaluations were conducted using lm-eval-harness and evalchemy.

Benchmarks that are indicative of enterprise capability: (figure not included)
Academic reasoning benchmarks: (figure not included)
Token efficiency comparison (lower is better): (figure not included)

Training Details

Mid training / Continual Pre-training: The model is trained on 100+ billion tokens of carefully curated examples from mathematical reasoning, coding challenges, scientific discourse, and logical puzzles to strengthen its foundational reasoning capabilities.
Supervised Fine-Tuning (SFT): The model is fine-tuned on 200,000 high-quality demonstrations covering tasks such as mathematical and scientific problem-solving and coding.
Reinforcement Learning: GRPO (with some minor modifications to the objective) is applied to address the weaknesses in instruction following and coding tasks. Intermediate snapshots from both the SFT and GRPO stages are periodically merged to improve generalization and prevent catastrophic forgetting.
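
The reinforcement learning stage uses GRPO with modifications that the card does not specify. As a rough illustration of the core idea only, the sketch below computes group-relative advantages in the commonly described form of GRPO: several responses are sampled for the same prompt, and each response's reward is normalized by the mean and standard deviation of its group. The function name, the eps term, the use of the population standard deviation, and the example rewards are assumptions for illustration; this is not ServiceNow's modified objective.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the other samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    # eps keeps the division finite when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled responses to one prompt (1.0 = passed a check).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that beat the group average get a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.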

Intended Use

The Apriel family of models is designed for a variety of general-purpose instruction tasks, including code assistance and generation, logical reasoning and multi-step tasks, question answering and information retrieval, function calling, complex instruction following, and agent use cases. The models are not intended for use in safety-critical applications without human oversight or in scenarios requiring guaranteed factual accuracy.

Limitations

Factual accuracy: May produce incorrect, misleading, or outdated content. Outputs should be verified before use in critical contexts.
Bias: May reflect societal, cultural, or systemic biases present in training data.
Ethics: Do not use the model to produce harmful, unlawful, or unethical content.
Language: Strongest performance is in English. Output quality may degrade in underrepresented languages.
Critical use: Not suitable for medical, legal, financial, or other high-risk applications without safeguards.

Security and Responsible Use

Security Responsibilities: Deployers and users are strongly encouraged to align their security practices with established frameworks and regulatory guidelines such as the EU AI Act and the NIST AI Risk Management Framework (RMF).

Guidelines for Deployers:

Regularly conduct robustness assessments to identify and mitigate adversarial inputs.
Implement validation and filtering processes to prevent harmful or biased outputs.
Continuously perform data privacy checks to guard against unintended data leaks.
Document and communicate the model's limitations, intended usage, and known security risks to all end-users.
Schedule periodic security reviews and updates to address emerging threats and vulnerabilities.

Guidelines for Users:

Follow established security policies and usage guidelines provided by deployers.
Protect and manage sensitive information when interacting with the model.
Report anomalies, suspicious behavior, or unsafe outputs to deployers or developers.
Maintain human oversight and apply judgment to mitigate potential security or ethical risks during interactions.

Software

The model is based on the transformers library.

License

The model is released under the MIT license.
