# 🚀 DeepHermes 3 - Llama-3 3B Preview
DeepHermes 3 Preview is the latest version of Nous Research's flagship Hermes series of large language models (LLMs), and one of the first models in the world to unify reasoning (long chains of thought that improve answer accuracy) with the regular LLM response mode in a single model. It also brings notable improvements in LLM annotation, judgment, and function calling.
## Model Information

| Attribute | Details |
|---|---|
| Model Type | DeepHermes 3 - Llama-3 3B Preview |
| Base Model | meta-llama/Meta-Llama-3.2-3B |
| Library Name | transformers |
### Tags

- Llama-3
- instruct
- finetune
- chatml
- gpt4
- synthetic data
- distillation
- function calling
- json mode
- axolotl
- roleplaying
- chat
- reasoning
- r1
- vllm
## 🚀 Quick Start
### Environment Setup
Make sure you have the necessary libraries installed, such as `torch`, `transformers`, and `flash_attn`.
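For example, a minimal install (a sketch; exact package versions and the CUDA toolchain required by `flash-attn` depend on your environment):

```bash
pip install torch transformers
pip install flash-attn --no-build-isolation
```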
### Inference Code Example
Here is example code for running inference with HuggingFace Transformers:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import flash_attn
import time

tokenizer = AutoTokenizer.from_pretrained("NousResearch/DeepHermes-3-Llama-3-3B-Preview")

# Load the model in fp16 with FlashAttention 2 for faster inference
model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/DeepHermes-3-Llama-3-3B-Preview",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)

messages = [
    {
        "role": "system",
        "content": "You are Hermes, an AI assistant"
    },
    {
        "role": "user",
        "content": "What are the most interesting things to do in Paris?"
    }
]

# Apply the chat template and move the prompt to the GPU
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to("cuda")
generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
print(f"Generated Tokens: {generated_ids.shape[-1:]}")
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(f"Response: {response}")
```
### vLLM Inference
After installing `vllm`, you can serve the model for inference by running the following command in your terminal:
```bash
vllm serve NousResearch/DeepHermes-3-Llama-3-3B-Preview
```
You can then call the model through the API with the OpenAI library, just as you would call OpenAI's API.
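A minimal sketch of such a call, assuming vLLM's default local endpoint at `http://localhost:8000/v1` (the API key is a placeholder; vLLM does not check it by default):

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible server; port 8000 is the default
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="NousResearch/DeepHermes-3-Llama-3-3B-Preview",
    messages=[
        {"role": "system", "content": "You are Hermes, an AI assistant"},
        {"role": "user", "content": "What are the most interesting things to do in Paris?"},
    ],
)
print(response.choices[0].message.content)
```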
## ✨ Key Features
### Unified Reasoning and Standard Response Modes
DeepHermes 3 Preview is a hybrid reasoning model, and one of the first LLMs to unify both "intuitive", traditional-mode responses and long chain-of-thought reasoning responses in a single model, toggled by a system prompt.
### Improved Capabilities
- LLM annotation, judgment, and function calling: significantly improved, boosting the model's performance and practical utility.
- Advanced agentic capabilities: stronger agentic abilities for completing a wider range of tasks.
- Roleplaying and reasoning: excels at roleplaying and reasoning, enabling more natural and accurate conversations.
- Multi-turn chat and long-context coherence: supports multi-turn conversation and stays coherent over long contexts.
### User-Centric Design Philosophy
The Hermes series is designed to align LLMs closely with the user's needs, giving end users powerful steering and control capabilities.
## 💻 Usage Examples
### Basic Usage
#### Standard "Intuitive" Response Mode
The standard response mode uses the same inference code shown in the Quick Start above: with an ordinary system prompt (or none), the model answers directly rather than producing a long chain of thought.
### Advanced Usage
#### Deep Thinking Mode
Activate the long chain-of-thought reasoning mode via the system prompt:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import flash_attn
import time

tokenizer = AutoTokenizer.from_pretrained("NousResearch/DeepHermes-3-Llama-3-3B-Preview")

model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/DeepHermes-3-Llama-3-3B-Preview",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)

# This system prompt switches the model into long chain-of-thought reasoning mode
messages = [
    {
        "role": "system",
        "content": "You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem."
    },
    {
        "role": "user",
        "content": "What is y if y=2*2-4+(3*2)"
    }
]

input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to("cuda")
generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
print(f"Generated Tokens: {generated_ids.shape[-1:]}")
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(f"Response: {response}")
```
Note: For complex problems, DeepHermes may use up to 13,000 tokens for thinking. For difficult queries, you may therefore need to set `max_new_tokens` much higher than 2500.
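Because the reasoning is wrapped in `<think> </think>` tags, it can be separated from the final answer in post-processing. A small helper along these lines (an assumed convenience, not part of the official card):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer) using <think> tags."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if match is None:
        return "", text.strip()  # no reasoning block was emitted
    return match.group(1).strip(), text[match.end():].strip()
```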
### Function Calling
#### System Prompt Format
```
<|start_header_id|>system<|end_header_id|>
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools> {"type": "function", "function": {"name": "get_stock_fundamentals", "description": "get_stock_fundamentals(symbol: str) -> dict - Get fundamental data for a given stock symbol using yfinance API.\n\n Args:\n symbol (str): The stock symbol.\n\n Returns:\n dict: A dictionary containing fundamental data.\n Keys:\n - 'symbol': The stock symbol.\n - 'company_name': The long name of the company.\n - 'sector': The sector to which the company belongs.\n - 'industry': The industry to which the company belongs.\n - 'market_cap': The market capitalization of the company.\n - 'pe_ratio': The forward price-to-earnings ratio.\n - 'pb_ratio': The price-to-book ratio.\n - 'dividend_yield': The dividend yield.\n - 'eps': The trailing earnings per share.\n - 'beta': The beta value of the stock.\n - '52_week_high': The 52-week high price of the stock.\n - '52_week_low': The 52-week low price of the stock.", "parameters": {"type": "object", "properties": {"symbol": {"type": "string"}}, "required": ["symbol"]}}} </tools> Use the following pydantic model json schema for each tool call you will make: {"properties": {"arguments": {"title": "Arguments", "type": "object"}, "name": {"title": "Name", "type": "string"}}, "required": ["arguments", "name"], "title": "FunctionCall", "type": "object"} For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{"arguments": <args-dict>, "name": <function-name>}
</tool_call><|eot_id|><|start_header_id|>user<|end_header_id|>
```
#### Example User Prompt
```
Fetch the stock fundamentals data for Tesla (TSLA)<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
#### Example Tool Call Generated by the Model
```
<tool_call>
{"arguments": {"symbol": "TSLA"}, "name": "get_stock_fundamentals"}
</tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|>
```
#### Example Tool Response
```
<tool_response>
{"name": "get_stock_fundamentals", "content": {'symbol': 'TSLA', 'company_name': 'Tesla, Inc.', 'sector': 'Consumer Cyclical', 'industry': 'Auto Manufacturers', 'market_cap': 611384164352, 'pe_ratio': 49.604652, 'pb_ratio': 9.762013, 'dividend_yield': None, 'eps': 4.3, 'beta': 2.427, '52_week_high': 299.29, '52_week_low': 152.37}}
</tool_response>
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
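The official parsing utilities live in the Hermes-Function-Calling repository linked below. As a rough sketch of the idea (an illustrative helper, not the official implementation), the tool call can be pulled out of the model output like this:

```python
import json
import re

def extract_tool_calls(text: str) -> list[dict]:
    """Extract JSON tool calls wrapped in <tool_call>...</tool_call> tags."""
    pattern = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
    return [json.loads(body) for body in pattern.findall(text)]

output = '<tool_call>\n{"arguments": {"symbol": "TSLA"}, "name": "get_stock_fundamentals"}\n</tool_call>'
print(extract_tool_calls(output))
# -> [{'arguments': {'symbol': 'TSLA'}, 'name': 'get_stock_fundamentals'}]
```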
### JSON Mode / Structured Outputs
#### System Prompt Format
```
<|start_header_id|>system<|end_header_id|>
You are a helpful assistant that answers in JSON. Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema><|eot_id|>
```
Given the {schema} you provide, the model will generate its response in that JSON format. You only need to supply a regular user prompt, and the model will respond in JSON.
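For example, the schema can be produced from a pydantic model and spliced into that system prompt (a sketch; the `Answer` model here is a hypothetical example, not something defined by the card):

```python
import json
from pydantic import BaseModel

# Hypothetical schema used only for illustration; any JSON schema works
class Answer(BaseModel):
    city: str
    attractions: list[str]

schema = json.dumps(Answer.model_json_schema(), indent=2)
system_prompt = (
    "You are a helpful assistant that answers in JSON. "
    f"Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema>"
)
print(system_prompt)
```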
## 📚 Documentation
### Inference Code for Function Calling
All code for utilizing, parsing, and building function calling templates is available in our GitHub repository: https://github.com/NousResearch/Hermes-Function-Calling
### Quantized Versions
GGUF quantized versions: https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-3B-Preview-GGUF
### How to Cite
```bibtex
@misc{deephermes3,
  title={DeepHermes 3 Preview},
  author={Teknium and Roger Jin and Chen Guang and Jai Suphavadeeprasit and Jeffrey Quesnelle},
  year={2025}
}
```
## 📄 License
This model is released under the Llama 3 license.



