# 🚀 DeepHermes 3 - Llama-3 3B Preview
DeepHermes 3 Preview is the latest version of Nous Research's flagship Hermes series of large language models (LLMs), and one of the first models in the world to unify reasoning (long chains of thought that improve answer accuracy) with the regular LLM response mode in a single model. It also brings notable improvements in LLM annotation, judgment, and function calling.
## Model Information

| Attribute | Details |
|---|---|
| Model Type | DeepHermes 3 - Llama-3 3B Preview |
| Base Model | meta-llama/Meta-Llama-3.2-3B |
| Library Name | transformers |
### Tags

- Llama-3
- instruct
- finetune
- chatml
- gpt4
- synthetic data
- distillation
- function calling
- json mode
- axolotl
- roleplaying
- chat
- reasoning
- r1
- vllm
## 🚀 Quick Start
### Environment Setup
Make sure you have the necessary libraries installed, such as `torch`, `transformers`, and `flash_attn`.
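For example, a minimal install (a sketch; exact package versions and the CUDA toolchain required by `flash-attn` depend on your environment):

```bash
pip install torch transformers
pip install flash-attn --no-build-isolation
```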
### Inference Code Example
Here is example code for running inference with HuggingFace Transformers:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import flash_attn
import time

tokenizer = AutoTokenizer.from_pretrained("NousResearch/DeepHermes-3-Llama-3-3B-Preview")

# Load the model in fp16 with FlashAttention 2 for faster inference
model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/DeepHermes-3-Llama-3-3B-Preview",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)

messages = [
    {
        "role": "system",
        "content": "You are Hermes, an AI assistant"
    },
    {
        "role": "user",
        "content": "What are the most interesting things to do in Paris?"
    }
]

# Apply the chat template and move the prompt to the GPU
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to("cuda")
generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
print(f"Generated Tokens: {generated_ids.shape[-1:]}")
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(f"Response: {response}")
```
### vLLM Inference
After installing `vllm`, you can serve the model for inference by running the following command in your terminal:
```bash
vllm serve NousResearch/DeepHermes-3-Llama-3-3B-Preview
```
You can then call the model through the API with the OpenAI library, just as you would call OpenAI's API.
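A minimal sketch of such a call, assuming vLLM's default local endpoint at `http://localhost:8000/v1` (the API key is a placeholder; vLLM does not check it by default):

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible server; port 8000 is the default
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="NousResearch/DeepHermes-3-Llama-3-3B-Preview",
    messages=[
        {"role": "system", "content": "You are Hermes, an AI assistant"},
        {"role": "user", "content": "What are the most interesting things to do in Paris?"},
    ],
)
print(response.choices[0].message.content)
```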
## ✨ Key Features
### Unified Reasoning and Standard Response Modes
DeepHermes 3 Preview is a hybrid reasoning model, and one of the first LLMs to unify both "intuitive", traditional-mode responses and long chain-of-thought reasoning responses in a single model, toggled by a system prompt.
### Improved Capabilities
- LLM annotation, judgment, and function calling: significantly improved, boosting the model's performance and practical utility.
- Advanced agentic capabilities: stronger agentic abilities for completing a wider range of tasks.
- Roleplaying and reasoning: excels at roleplaying and reasoning, enabling more natural and accurate conversations.
- Multi-turn chat and long-context coherence: supports multi-turn conversation and stays coherent over long contexts.
### User-Centric Design Philosophy
The Hermes series is designed to align LLMs closely with the user's needs, giving end users powerful steering and control capabilities.
## 💻 Usage Examples
### Basic Usage
#### Standard "Intuitive" Response Mode
The standard response mode uses the same inference code shown in the Quick Start above: with an ordinary system prompt (or none), the model answers directly rather than producing a long chain of thought.
### Advanced Usage
#### Deep Thinking Mode
Activate the long chain-of-thought reasoning mode via the system prompt:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import flash_attn
import time

tokenizer = AutoTokenizer.from_pretrained("NousResearch/DeepHermes-3-Llama-3-3B-Preview")

model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/DeepHermes-3-Llama-3-3B-Preview",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)

# This system prompt switches the model into long chain-of-thought reasoning mode
messages = [
    {
        "role": "system",
        "content": "You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem."
    },
    {
        "role": "user",
        "content": "What is y if y=2*2-4+(3*2)"
    }
]

input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to("cuda")
generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
print(f"Generated Tokens: {generated_ids.shape[-1:]}")
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(f"Response: {response}")
```
Note: For complex problems, DeepHermes may use up to 13,000 tokens for thinking. For difficult queries, you may therefore need to set `max_new_tokens` much higher than 2500.
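Because the reasoning is wrapped in `<think> </think>` tags, it can be separated from the final answer in post-processing. A small helper along these lines (an assumed convenience, not part of the official card):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer) using <think> tags."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if match is None:
        return "", text.strip()  # no reasoning block was emitted
    return match.group(1).strip(), text[match.end():].strip()
```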
### Function Calling
#### System Prompt Format
```
<|start_header_id|>system<|end_header_id|>
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools> {"type": "function", "function": {"name": "get_stock_fundamentals", "description": "get_stock_fundamentals(symbol: str) -> dict - Get fundamental data for a given stock symbol using yfinance API.\n\n Args:\n symbol (str): The stock symbol.\n\n Returns:\n dict: A dictionary containing fundamental data.\n Keys:\n - 'symbol': The stock symbol.\n - 'company_name': The long name of the company.\n - 'sector': The sector to which the company belongs.\n - 'industry': The industry to which the company belongs.\n - 'market_cap': The market capitalization of the company.\n - 'pe_ratio': The forward price-to-earnings ratio.\n - 'pb_ratio': The price-to-book ratio.\n - 'dividend_yield': The dividend yield.\n - 'eps': The trailing earnings per share.\n - 'beta': The beta value of the stock.\n - '52_week_high': The 52-week high price of the stock.\n - '52_week_low': The 52-week low price of the stock.", "parameters": {"type": "object", "properties": {"symbol": {"type": "string"}}, "required": ["symbol"]}}} </tools> Use the following pydantic model json schema for each tool call you will make: {"properties": {"arguments": {"title": "Arguments", "type": "object"}, "name": {"title": "Name", "type": "string"}}, "required": ["arguments", "name"], "title": "FunctionCall", "type": "object"} For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{"arguments": <args-dict>, "name": <function-name>}
</tool_call><|eot_id|><|start_header_id|>user<|end_header_id|>
```
#### Example User Prompt
```
Fetch the stock fundamentals data for Tesla (TSLA)<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
#### Example Tool Call Generated by the Model
```
<tool_call>
{"arguments": {"symbol": "TSLA"}, "name": "get_stock_fundamentals"}
</tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|>
```
#### Example Tool Response
```
<tool_response>
{"name": "get_stock_fundamentals", "content": {'symbol': 'TSLA', 'company_name': 'Tesla, Inc.', 'sector': 'Consumer Cyclical', 'industry': 'Auto Manufacturers', 'market_cap': 611384164352, 'pe_ratio': 49.604652, 'pb_ratio': 9.762013, 'dividend_yield': None, 'eps': 4.3, 'beta': 2.427, '52_week_high': 299.29, '52_week_low': 152.37}}
</tool_response>
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
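The official parsing utilities live in the Hermes-Function-Calling repository linked below. As a rough sketch of the idea (an illustrative helper, not the official implementation), the tool call can be pulled out of the model output like this:

```python
import json
import re

def extract_tool_calls(text: str) -> list[dict]:
    """Extract JSON tool calls wrapped in <tool_call>...</tool_call> tags."""
    pattern = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
    return [json.loads(body) for body in pattern.findall(text)]

output = '<tool_call>\n{"arguments": {"symbol": "TSLA"}, "name": "get_stock_fundamentals"}\n</tool_call>'
print(extract_tool_calls(output))
# -> [{'arguments': {'symbol': 'TSLA'}, 'name': 'get_stock_fundamentals'}]
```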
### JSON Mode / Structured Outputs
#### System Prompt Format
```
<|start_header_id|>system<|end_header_id|>
You are a helpful assistant that answers in JSON. Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema><|eot_id|>
```
Given the {schema} you provide, the model will generate its response in that JSON format. You only need to supply a regular user prompt, and the model will respond in JSON.
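For example, the schema can be produced from a pydantic model and spliced into that system prompt (a sketch; the `Answer` model here is a hypothetical example, not something defined by the card):

```python
import json
from pydantic import BaseModel

# Hypothetical schema used only for illustration; any JSON schema works
class Answer(BaseModel):
    city: str
    attractions: list[str]

schema = json.dumps(Answer.model_json_schema(), indent=2)
system_prompt = (
    "You are a helpful assistant that answers in JSON. "
    f"Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema>"
)
print(system_prompt)
```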
## 📚 Documentation
### Inference Code for Function Calling
All code for utilizing, parsing, and building function calling templates is available in our GitHub repository: https://github.com/NousResearch/Hermes-Function-Calling
### Quantized Versions
GGUF quantized versions: https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-3B-Preview-GGUF
### How to Cite
```bibtex
@misc{deephermes3,
  title={DeepHermes 3 Preview},
  author={Teknium and Roger Jin and Chen Guang and Jai Suphavadeeprasit and Jeffrey Quesnelle},
  year={2025}
}
```
## 📄 License
This model is released under the Llama 3 license.



