🚀 DeepHermes 3 - Llama-3 3B Preview
DeepHermes 3 - Llama-3 3B Preview is a cutting-edge LLM from Nous Research. It unifies reasoning and normal response modes in a single model, and improves annotation, judgement, and function calling.
✨ Features
Model Description
DeepHermes 3 Preview is the latest version of the Hermes series of LLMs by Nous Research. It's one of the first models globally to integrate Reasoning (long chains of thought for better answer accuracy) and normal LLM response modes. It also enhances LLM annotation, judgement, and function calling.
It's a hybrid reasoning model, combining both "intuitive", traditional mode responses and long chain of thought reasoning responses, switchable via a system prompt.
Hermes 3, the predecessor of DeepHermes 3, is a generalist language model with numerous improvements over Hermes 2, such as advanced agentic capabilities, better role-playing, reasoning, multi-turn conversation, long-context coherence, and overall enhancements.
The Hermes series of models aims to align LLMs with users, providing powerful steering capabilities and control to end - users.
This is a preview Hermes model with early reasoning capabilities, distilled from R1 across a variety of tasks that benefit from reasoning and objectivity. Some quirks may surface, so please share any interesting findings or issues you discover!
Benchmarks
- Reasoning benchmarks: with Reasoning ON and OFF.
- Comparison to Llama-3.2-3B-Instruct.
Prompt Format
DeepHermes 3 uses the Llama-Chat format as the prompt format, enabling a more unified and structured system for multi-turn chat dialogues with the LLM. System prompts offer steerability and new ways to interact with the LLM, guiding the model's rules, roles, and stylistic choices.
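For illustration, here is a sketch of a short dialogue rendered in the Llama-3 chat format (the exact template ships with the model's tokenizer and is applied automatically by apply_chat_template in the examples below):

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are Hermes, an AI assistant<|eot_id|><|start_header_id|>user<|end_header_id|>

Hello, who are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>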
Deep Thinking Mode
DeepHermes 3 Preview can activate long chain-of-thought reasoning with the following system prompt:
You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.
💻 Usage Examples
Basic Usage
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import flash_attn  # ensures flash-attn is installed for the flash_attention_2 backend

tokenizer = AutoTokenizer.from_pretrained("NousResearch/DeepHermes-3-Llama-3-3B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/DeepHermes-3-Llama-3-3B-Preview",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)

# The deep-thinking system prompt switches the model into long chain-of-thought mode.
messages = [
    {
        "role": "system",
        "content": "You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem."
    },
    {
        "role": "user",
        "content": "What is y if y=2*2-4+(3*2)"
    }
]

input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to("cuda")
generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
print(f"Generated Tokens: {generated_ids.shape[-1]}")
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(f"Response: {response}")
Note: For difficult problems, DeepHermes can think using up to 13,000 tokens, so you may need to set max_new_tokens much larger than 2500.
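Since the reasoning trace is emitted inside <think> tags, you may want to strip it and keep only the final answer. A minimal sketch, assuming response holds the decoded output from the example above:

import re

# Drop the <think>...</think> block (if present), leaving only the final answer.
final_answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
print(final_answer)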
Advanced Usage - Standard "Intuitive" Response Mode
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import flash_attn  # ensures flash-attn is installed for the flash_attention_2 backend

tokenizer = AutoTokenizer.from_pretrained("NousResearch/DeepHermes-3-Llama-3-3B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/DeepHermes-3-Llama-3-3B-Preview",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)

# A plain assistant system prompt keeps the model in standard "intuitive" response mode.
messages = [
    {
        "role": "system",
        "content": "You are Hermes, an AI assistant"
    },
    {
        "role": "user",
        "content": "What are the most interesting things to do in Paris?"
    }
]

input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to("cuda")
generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
print(f"Generated Tokens: {generated_ids.shape[-1]}")
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(f"Response: {response}")
vLLM Inference
After running pip install vllm, you can serve the model with vLLM from your terminal:
vllm serve NousResearch/DeepHermes-3-Llama-3-3B-Preview
You can then call the model over an OpenAI-compatible API using the OpenAI client library, just like calling OpenAI's API.
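A minimal sketch of such a call, assuming vLLM's default endpoint at http://localhost:8000/v1 and the openai Python package (the api_key value is a placeholder; vLLM does not require one by default):

from openai import OpenAI

# Point the OpenAI client at the local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="NousResearch/DeepHermes-3-Llama-3-3B-Preview",
    messages=[
        {"role": "system", "content": "You are Hermes, an AI assistant"},
        {"role": "user", "content": "What are the most interesting things to do in Paris?"},
    ],
    max_tokens=1024,
)
print(completion.choices[0].message.content)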
Prompt Format for Function Calling
The model was trained on specific system prompts and structures for function calling. Use the system role with the message below, followed by the function signatures as JSON.
<|start_header_id|>system<|end_header_id|>
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools> {"type": "function", "function": {"name": "get_stock_fundamentals", "description": "get_stock_fundamentals(symbol: str) -> dict - Get fundamental data for a given stock symbol using yfinance API.\n\n Args:\n symbol (str): The stock symbol.\n\n Returns:\n dict: A dictionary containing fundamental data.\n Keys:\n - 'symbol': The stock symbol.\n - 'company_name': The long name of the company.\n - 'sector': The sector to which the company belongs.\n - 'industry': The industry to which the company belongs.\n - 'market_cap': The market capitalization of the company.\n - 'pe_ratio': The forward price-to-earnings ratio.\n - 'pb_ratio': The price-to-book ratio.\n - 'dividend_yield': The dividend yield.\n - 'eps': The trailing earnings per share.\n - 'beta': The beta value of the stock.\n - '52_week_high': The 52-week high price of the stock.\n - '52_week_low': The 52-week low price of the stock.", "parameters": {"type": "object", "properties": {"symbol": {"type": "string"}}, "required": ["symbol"]}}} </tools> Use the following pydantic model json schema for each tool call you will make: {"properties": {"arguments": {"title": "Arguments", "type": "object"}, "name": {"title": "Name", "type": "string"}}, "required": ["arguments", "name"], "title": "FunctionCall", "type": "object"} For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{"arguments": <args-dict>, "name": <function-name>}
</tool_call><|eot_id|><|start_header_id|>user<|end_header_id|>
To complete a function call, append a user prompt after the system prompt. Once the model generates a tool call, parse it, call the API, and pass the returned values back to the model in a new message with the role tool.
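A hedged sketch of that round trip (the regex parser and the get_stock_fundamentals call are illustrative; the reference implementation lives in the repository linked under "Inference Code for Function Calling" below):

import json
import re

def extract_tool_calls(text):
    # Pull each JSON object out of <tool_call>...</tool_call> tags in the model output.
    matches = re.findall(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL)
    return [json.loads(m) for m in matches]

# After the model emits a tool call:
# for call in extract_tool_calls(model_output):
#     result = get_stock_fundamentals(**call["arguments"])  # your actual API call
#     messages.append({"role": "tool", "content": json.dumps(result)})
# Then generate again so the model can compose its answer from the tool results.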
Prompt Format for JSON Mode / Structured Outputs
The model was trained on a specific system prompt for Structured Outputs, instructing it to respond with only a JSON object conforming to a given JSON schema.
<|start_header_id|>system<|end_header_id|>
You are a helpful assistant that answers in JSON. Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema><|eot_id|>
Replace {schema} with the JSON schema you want the model to follow, then send a normal user prompt; the model will respond with JSON only.
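A minimal sketch of building that system prompt from a pydantic model (the Character class here is an illustrative assumption, not part of the model card):

import json
from pydantic import BaseModel

class Character(BaseModel):
    name: str
    species: str
    age: int

# Serialize the pydantic schema and splice it into the JSON-mode system prompt.
schema = json.dumps(Character.model_json_schema())
system_prompt = (
    "You are a helpful assistant that answers in JSON. "
    f"Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema>"
)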
Inference Code for Function Calling
All code for utilizing, parsing, and building function calling templates is available on our GitHub: https://github.com/NousResearch/Hermes-Function-Calling
Quantized Versions
GGUF Quants: https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-3B-Preview-GGUF
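A hedged sketch of running a GGUF quant locally with llama-cpp-python (the filename pattern is an assumption; check the quant repo's file list for the exact names):

from llama_cpp import Llama

# Download and load a quantized GGUF file directly from the Hugging Face repo.
llm = Llama.from_pretrained(
    repo_id="NousResearch/DeepHermes-3-Llama-3-3B-Preview-GGUF",
    filename="*Q4_K_M.gguf",  # assumed quant level; pick one that exists in the repo
    n_ctx=8192,
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])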
How to cite
@misc{deephermes3preview,
  title={DeepHermes 3 Preview},
  author={Teknium and Roger Jin and Chen Guang and Jai Suphavadeeprasit and Jeffrey Quesnelle},
  year={2025}
}
📄 License
This model is released under the Llama 3 license.
📋 Information Table
| Property | Details |
|---|---|
| Base Model | meta-llama/Llama-3.2-3B |
| Library Name | transformers |
| Tags | Llama-3, instruct, finetune, chatml, gpt4, synthetic data, distillation, function calling, json mode, axolotl, roleplaying, chat, reasoning, r1, vllm |
| Model Name | DeepHermes-3-Llama-3-3B-Preview |

