🚀 DeepHermes 3 - Llama-3 3B Preview
DeepHermes 3 Preview is the latest version of Nous Research's flagship Hermes series of large language models (LLMs), and one of the first models in the world to unify reasoning (long chains of thought that improve answer accuracy) with the normal LLM response mode in a single model. The model also brings notable improvements to LLM annotation, judgment, and function calling.
Model Information
Property | Details |
---|---|
Model type | DeepHermes 3 - Llama-3 3B Preview |
Base model | meta-llama/Llama-3.2-3B |
Library name | transformers |
Tags
- Llama-3
- instruct
- finetune
- ChatML
- GPT4
- synthetic data
- distillation
- function calling
- JSON mode
- Axolotl
- roleplaying
- chat
- reasoning
- R1
- vLLM
🚀 Quick Start
Environment Setup
Make sure the required libraries are installed, such as `torch`, `transformers`, and `flash_attn` (e.g., `pip install torch transformers accelerate flash-attn`; `accelerate` is needed for `device_map="auto"` in the code below).
Inference Code Example
Below is example code for running inference with HuggingFace Transformers:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import flash_attn  # verifies flash-attn is installed; used via attn_implementation below

tokenizer = AutoTokenizer.from_pretrained("NousResearch/DeepHermes-3-Llama-3-3B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/DeepHermes-3-Llama-3-3B-Preview",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)

messages = [
    {
        "role": "system",
        "content": "You are Hermes, an AI assistant"
    },
    {
        "role": "user",
        "content": "What are the most interesting things to do in Paris?"
    }
]

# Build the prompt with the model's chat template and move it to the GPU
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to("cuda")

generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
# Count only the newly generated tokens (generate() returns prompt + completion)
print(f"Generated Tokens: {generated_ids.shape[-1] - input_ids.shape[-1]}")

response = tokenizer.decode(generated_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(f"Response: {response}")
```
vLLM Inference
After installing `vllm`, you can serve the model for inference by running the following command in a terminal:

```bash
vllm serve NousResearch/DeepHermes-3-Llama-3-3B-Preview
```

You can then call the model through its OpenAI-compatible API using the OpenAI client library, just as you would call OpenAI's own API.
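For reference, here is a minimal sketch of such a call with the official `openai` Python client, assuming vLLM's default local endpoint `http://localhost:8000/v1` (the `api_key` value is a placeholder; vLLM ignores it unless you configure one):

```python
from openai import OpenAI

# Point the client at the local vLLM server (default port 8000, OpenAI-compatible /v1 route)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="NousResearch/DeepHermes-3-Llama-3-3B-Preview",
    messages=[
        {"role": "system", "content": "You are Hermes, an AI assistant"},
        {"role": "user", "content": "What are the most interesting things to do in Paris?"},
    ],
    temperature=0.8,
)
print(completion.choices[0].message.content)
```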
✨ Key Features
Unified Reasoning and Standard Response Modes
DeepHermes 3 Preview is a hybrid reasoning model, one of the first LLMs to unify "intuitive", traditional-mode responses and long chain-of-thought reasoning responses in a single model, toggled via the system prompt.
Improved Capabilities
- LLM annotation, judgment, and function calling: significant improvements in these areas raise the model's performance and practical utility.
- Advanced agentic abilities: stronger agentic capabilities for completing a wide range of tasks.
- Roleplaying and reasoning: excels at roleplay and reasoning, producing more natural and accurate conversations.
- Multi-turn conversation and long-context coherence: supports multi-turn dialogue and remains coherent over long contexts.
User-Centric Design Philosophy
The design philosophy of the Hermes series is to align LLMs with the user, giving end users powerful steering and control capabilities.
💻 Usage Examples
Basic Usage
Standard "Intuitive" Response Mode
This is the default mode: with a plain system prompt (no deep-thinking instruction), the model answers directly. The inference code is identical to the Quick Start example above.
Advanced Usage
Deep Thinking Mode
Activate the long chain-of-thought reasoning mode via the system prompt:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import flash_attn  # verifies flash-attn is installed; used via attn_implementation below

tokenizer = AutoTokenizer.from_pretrained("NousResearch/DeepHermes-3-Llama-3-3B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/DeepHermes-3-Llama-3-3B-Preview",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)

# The deep-thinking system prompt switches the model into long chain-of-thought mode
messages = [
    {
        "role": "system",
        "content": "You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem."
    },
    {
        "role": "user",
        "content": "What is y if y=2*2-4+(3*2)"
    }
]

input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to("cuda")

generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
# Count only the newly generated tokens (generate() returns prompt + completion)
print(f"Generated Tokens: {generated_ids.shape[-1] - input_ids.shape[-1]}")

response = tokenizer.decode(generated_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(f"Response: {response}")
```
Note: for complex problems, DeepHermes may use up to about 13,000 tokens of thinking. For hard problems, you may therefore need to set `max_new_tokens` much higher than 2500.
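Since the chain of thought can be long, it helps to stream tokens as they are generated rather than wait for the full completion. Here is a minimal sketch using Transformers' `TextStreamer`, reusing `model`, `tokenizer`, and `input_ids` from the example above (the 13000-token budget is an illustrative choice to accommodate the thinking phase):

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)

generated_ids = model.generate(
    input_ids,
    max_new_tokens=13000,  # leave room for the <think> ... </think> phase
    temperature=0.8,
    repetition_penalty=1.1,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
    streamer=streamer,
)
```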
Function Calling
System Prompt Format

```
<|start_header_id|>system<|end_header_id|>
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools> {"type": "function", "function": {"name": "get_stock_fundamentals", "description": "get_stock_fundamentals(symbol: str) -> dict - Get fundamental data for a given stock symbol using yfinance API.\\n\\n    Args:\\n        symbol (str): The stock symbol.\\n\\n    Returns:\\n        dict: A dictionary containing fundamental data.\\n            Keys:\\n                - \'symbol\': The stock symbol.\\n                - \'company_name\': The long name of the company.\\n                - \'sector\': The sector to which the company belongs.\\n                - \'industry\': The industry to which the company belongs.\\n                - \'market_cap\': The market capitalization of the company.\\n                - \'pe_ratio\': The forward price-to-earnings ratio.\\n                - \'pb_ratio\': The price-to-book ratio.\\n                - \'dividend_yield\': The dividend yield.\\n                - \'eps\': The trailing earnings per share.\\n                - \'beta\': The beta value of the stock.\\n                - \'52_week_high\': The 52-week high price of the stock.\\n                - \'52_week_low\': The 52-week low price of the stock.", "parameters": {"type": "object", "properties": {"symbol": {"type": "string"}}, "required": ["symbol"]}}} </tools> Use the following pydantic model json schema for each tool call you will make: {"properties": {"arguments": {"title": "Arguments", "type": "object"}, "name": {"title": "Name", "type": "string"}}, "required": ["arguments", "name"], "title": "FunctionCall", "type": "object"} For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{"arguments": <args-dict>, "name": <function-name>}
</tool_call><|eot_id|><|start_header_id|>user<|end_header_id|>
```
Example User Prompt

```
Fetch the stock fundamentals data for Tesla (TSLA)<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
Example Model-Generated Tool Call

```
<tool_call>
{"arguments": {"symbol": "TSLA"}, "name": "get_stock_fundamentals"}
</tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|>
```
Example Tool Response

```
<tool_response>
{"name": "get_stock_fundamentals", "content": {'symbol': 'TSLA', 'company_name': 'Tesla, Inc.', 'sector': 'Consumer Cyclical', 'industry': 'Auto Manufacturers', 'market_cap': 611384164352, 'pe_ratio': 49.604652, 'pb_ratio': 9.762013, 'dividend_yield': None, 'eps': 4.3, 'beta': 2.427, '52_week_high': 299.29, '52_week_low': 152.37}}
</tool_response>
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
JSON Mode / Structured Outputs
System Prompt Format

```
<|start_header_id|>system<|end_header_id|>
You are a helpful assistant that answers in JSON. Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema><|eot_id|>
```

Given the JSON schema you supply in place of the `{schema}` placeholder, the model will generate responses conforming to that schema; just provide a regular user prompt, and the model will respond in JSON.
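As an illustration, here is one way to produce a schema string for that placeholder using pydantic (the `Character` model is a made-up example; any JSON schema works):

```python
import json
from pydantic import BaseModel

# Hypothetical example schema; substitute your own structure
class Character(BaseModel):
    name: str
    age: int
    occupation: str

schema = json.dumps(Character.model_json_schema(), indent=2)

system_prompt = (
    "You are a helpful assistant that answers in JSON. "
    f"Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema>"
)
```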
📚 Documentation
Inference Code
All code for utilizing, parsing, and building function-calling templates is available in our GitHub repository: https://github.com/NousResearch/Hermes-Function-Calling
Quantized Versions
GGUF quantizations: https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-3B-Preview-GGUF
How to Cite

```bibtex
@misc{deephermes3,
      title={DeepHermes 3 Preview},
      author={Teknium and Roger Jin and Chen Guang and Jai Suphavadeeprasit and Jeffrey Quesnelle},
      year={2025}
}
```
📄 License
This model is released under the Llama 3 license.



