🚀 DeepHermes 3 - Llama-3 3B Preview
DeepHermes 3 Preview is the latest version of Nous Research's flagship Hermes series of large language models (LLMs), and one of the first models in the world to unify reasoning (long chains of thought that improve answer accuracy) with the normal LLM response mode in a single model. The model also brings notable improvements to LLM annotation, judgment, and function calling.
Model Information
Property | Details |
---|---|
Model type | DeepHermes 3 - Llama-3 3B Preview |
Base model | meta-llama/Llama-3.2-3B |
Library name | transformers |
Tags
- Llama-3
- instruct
- finetune
- ChatML
- GPT4
- synthetic data
- distillation
- function calling
- JSON mode
- Axolotl
- roleplaying
- chat
- reasoning
- R1
- vLLM
🚀 Quick Start
Environment Setup
Make sure the required libraries are installed, such as `torch`, `transformers`, and `flash_attn` (e.g., `pip install torch transformers accelerate flash-attn`; `accelerate` is needed for `device_map="auto"` in the code below).
Inference Code Example
Below is example code for running inference with HuggingFace Transformers:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import flash_attn  # verifies flash-attn is installed; used via attn_implementation below

tokenizer = AutoTokenizer.from_pretrained("NousResearch/DeepHermes-3-Llama-3-3B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/DeepHermes-3-Llama-3-3B-Preview",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)

messages = [
    {
        "role": "system",
        "content": "You are Hermes, an AI assistant"
    },
    {
        "role": "user",
        "content": "What are the most interesting things to do in Paris?"
    }
]

# Build the prompt with the model's chat template and move it to the GPU
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to("cuda")

generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
# Count only the newly generated tokens (generate() returns prompt + completion)
print(f"Generated Tokens: {generated_ids.shape[-1] - input_ids.shape[-1]}")

response = tokenizer.decode(generated_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(f"Response: {response}")
```
vLLM Inference
After installing `vllm`, you can serve the model for inference by running the following command in a terminal:

```bash
vllm serve NousResearch/DeepHermes-3-Llama-3-3B-Preview
```

You can then call the model through its OpenAI-compatible API using the OpenAI client library, just as you would call OpenAI's own API.
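For reference, here is a minimal sketch of such a call with the official `openai` Python client, assuming vLLM's default local endpoint `http://localhost:8000/v1` (the `api_key` value is a placeholder; vLLM ignores it unless you configure one):

```python
from openai import OpenAI

# Point the client at the local vLLM server (default port 8000, OpenAI-compatible /v1 route)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="NousResearch/DeepHermes-3-Llama-3-3B-Preview",
    messages=[
        {"role": "system", "content": "You are Hermes, an AI assistant"},
        {"role": "user", "content": "What are the most interesting things to do in Paris?"},
    ],
    temperature=0.8,
)
print(completion.choices[0].message.content)
```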
✨ Key Features
Unified Reasoning and Standard Response Modes
DeepHermes 3 Preview is a hybrid reasoning model, one of the first LLMs to unify "intuitive", traditional-mode responses and long chain-of-thought reasoning responses in a single model, toggled via the system prompt.
Improved Capabilities
- LLM annotation, judgment, and function calling: significant improvements in these areas raise the model's performance and practical utility.
- Advanced agentic abilities: stronger agentic capabilities for completing a wide range of tasks.
- Roleplaying and reasoning: excels at roleplay and reasoning, producing more natural and accurate conversations.
- Multi-turn conversation and long-context coherence: supports multi-turn dialogue and remains coherent over long contexts.
User-Centric Design Philosophy
The design philosophy of the Hermes series is to align LLMs with the user, giving end users powerful steering and control capabilities.
💻 Usage Examples
Basic Usage
Standard "Intuitive" Response Mode
This is the default mode: with a plain system prompt (no deep-thinking instruction), the model answers directly. The inference code is identical to the Quick Start example above.
Advanced Usage
Deep Thinking Mode
Activate the long chain-of-thought reasoning mode via the system prompt:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import flash_attn  # verifies flash-attn is installed; used via attn_implementation below

tokenizer = AutoTokenizer.from_pretrained("NousResearch/DeepHermes-3-Llama-3-3B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/DeepHermes-3-Llama-3-3B-Preview",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)

# The deep-thinking system prompt switches the model into long chain-of-thought mode
messages = [
    {
        "role": "system",
        "content": "You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem."
    },
    {
        "role": "user",
        "content": "What is y if y=2*2-4+(3*2)"
    }
]

input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to("cuda")

generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
# Count only the newly generated tokens (generate() returns prompt + completion)
print(f"Generated Tokens: {generated_ids.shape[-1] - input_ids.shape[-1]}")

response = tokenizer.decode(generated_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(f"Response: {response}")
```
Note: for complex problems, DeepHermes may use up to about 13,000 tokens of thinking. For hard problems, you may therefore need to set `max_new_tokens` much higher than 2500.
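Since the chain of thought can be long, it helps to stream tokens as they are generated rather than wait for the full completion. Here is a minimal sketch using Transformers' `TextStreamer`, reusing `model`, `tokenizer`, and `input_ids` from the example above (the 13000-token budget is an illustrative choice to accommodate the thinking phase):

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)

generated_ids = model.generate(
    input_ids,
    max_new_tokens=13000,  # leave room for the <think> ... </think> phase
    temperature=0.8,
    repetition_penalty=1.1,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
    streamer=streamer,
)
```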
Function Calling
System Prompt Format

```
<|start_header_id|>system<|end_header_id|>
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools> {"type": "function", "function": {"name": "get_stock_fundamentals", "description": "get_stock_fundamentals(symbol: str) -> dict - Get fundamental data for a given stock symbol using yfinance API.\\n\\n    Args:\\n        symbol (str): The stock symbol.\\n\\n    Returns:\\n        dict: A dictionary containing fundamental data.\\n            Keys:\\n                - \'symbol\': The stock symbol.\\n                - \'company_name\': The long name of the company.\\n                - \'sector\': The sector to which the company belongs.\\n                - \'industry\': The industry to which the company belongs.\\n                - \'market_cap\': The market capitalization of the company.\\n                - \'pe_ratio\': The forward price-to-earnings ratio.\\n                - \'pb_ratio\': The price-to-book ratio.\\n                - \'dividend_yield\': The dividend yield.\\n                - \'eps\': The trailing earnings per share.\\n                - \'beta\': The beta value of the stock.\\n                - \'52_week_high\': The 52-week high price of the stock.\\n                - \'52_week_low\': The 52-week low price of the stock.", "parameters": {"type": "object", "properties": {"symbol": {"type": "string"}}, "required": ["symbol"]}}} </tools> Use the following pydantic model json schema for each tool call you will make: {"properties": {"arguments": {"title": "Arguments", "type": "object"}, "name": {"title": "Name", "type": "string"}}, "required": ["arguments", "name"], "title": "FunctionCall", "type": "object"} For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{"arguments": <args-dict>, "name": <function-name>}
</tool_call><|eot_id|><|start_header_id|>user<|end_header_id|>
```
Example User Prompt

```
Fetch the stock fundamentals data for Tesla (TSLA)<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
Example Model-Generated Tool Call

```
<tool_call>
{"arguments": {"symbol": "TSLA"}, "name": "get_stock_fundamentals"}
</tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|>
```
Example Tool Response

```
<tool_response>
{"name": "get_stock_fundamentals", "content": {'symbol': 'TSLA', 'company_name': 'Tesla, Inc.', 'sector': 'Consumer Cyclical', 'industry': 'Auto Manufacturers', 'market_cap': 611384164352, 'pe_ratio': 49.604652, 'pb_ratio': 9.762013, 'dividend_yield': None, 'eps': 4.3, 'beta': 2.427, '52_week_high': 299.29, '52_week_low': 152.37}}
</tool_response>
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
JSON Mode / Structured Outputs
System Prompt Format

```
<|start_header_id|>system<|end_header_id|>
You are a helpful assistant that answers in JSON. Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema><|eot_id|>
```

Given the JSON schema you supply in place of the `{schema}` placeholder, the model will generate responses conforming to that schema; just provide a regular user prompt, and the model will respond in JSON.
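As an illustration, here is one way to produce a schema string for that placeholder using pydantic (the `Character` model is a made-up example; any JSON schema works):

```python
import json
from pydantic import BaseModel

# Hypothetical example schema; substitute your own structure
class Character(BaseModel):
    name: str
    age: int
    occupation: str

schema = json.dumps(Character.model_json_schema(), indent=2)

system_prompt = (
    "You are a helpful assistant that answers in JSON. "
    f"Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema>"
)
```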
📚 Documentation
Inference Code
All code for utilizing, parsing, and building function-calling templates is available in our GitHub repository: https://github.com/NousResearch/Hermes-Function-Calling
Quantized Versions
GGUF quantizations: https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-3B-Preview-GGUF
How to Cite

```bibtex
@misc{deephermes3,
      title={DeepHermes 3 Preview},
      author={Teknium and Roger Jin and Chen Guang and Jai Suphavadeeprasit and Jeffrey Quesnelle},
      year={2025}
}
```
📄 License
This model is released under the Llama 3 license.



