OpenHermes 2.5 open-source language model: free to deploy, with efficient code generation and general-purpose task handling

Openhermes 2.5 Mistral 7B GPTQ

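Quantized by TheBloke (original model by Teknium)

OpenHermes 2.5 is a language model fine-tuned from Mistral-7B. It focuses on code generation and general-purpose tasks, and it outperforms the previous OpenHermes release.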

Large language model

Transformers

English

License: Apache-2.0

#Multi-turn dialogue optimized #Enhanced coding ability #Fine-tuned on GPT-4 data

Downloads: 695

Release date: 11/2/2023

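Model overview

OpenHermes 2.5 is a Mistral-7B fine-tune developed by Teknium. Training on additional code datasets improved its results on several benchmarks, and it is particularly strong at code generation and complex problem solving.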

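Model features

Enhanced coding ability

Training on additional code datasets raised the HumanEval pass@1 score from 43% to 50.7%, a marked improvement in solving programming problems.

Gains across multiple benchmarks

Scores also improved on non-code benchmarks, including TruthfulQA, AGIEval, and the GPT4All suite.

ChatML format support

The model uses the standard ChatML prompt template, which makes it easy to integrate into dialogue systems.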

Model capabilities

Text generation

Code generation

Complex question answering

Role-play dialogue

Knowledge Q&A

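Use cases

Programming assistance

Code generation and explanation

Generates working code from a natural-language description, or explains existing code.

HumanEval pass@1 of 50.7%

Conversational AI

Personalized role-play

Simulates a specific character or persona in natural conversation.

Can portray anime characters, historical figures, and others.

Knowledge Q&A

Complex question answering

Answers many kinds of knowledge questions and provides detailed explanations.

Strong results on the AGIEval benchmark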

🚀 Openhermes 2.5 Mistral 7B - GPTQ

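Openhermes 2.5 Mistral 7B - GPTQ is a GPTQ-quantized model built on the Mistral architecture. It can be used for a range of natural language processing tasks such as text generation and question answering. Several quantization parameter sets are provided so you can choose the one that fits your hardware and requirements.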

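🚀 Quick start

Download the model

You can download the model in any of the following ways:

Download in text-generation-webui

To download from the main branch, enter TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ in "Download custom model or LoRA".
To download from another branch, append :branchname to the download name, for example TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ:gptq-4bit-32g-actorder_True.

Download from the command line

The huggingface-hub Python library is recommended: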

pip3 install huggingface-hub

To download the main branch to a folder named OpenHermes-2.5-Mistral-7B-GPTQ:

mkdir OpenHermes-2.5-Mistral-7B-GPTQ
huggingface-cli download TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ --local-dir OpenHermes-2.5-Mistral-7B-GPTQ --local-dir-use-symlinks False

To download from a different branch, add the --revision parameter:

mkdir OpenHermes-2.5-Mistral-7B-GPTQ
huggingface-cli download TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ --revision gptq-4bit-32g-actorder_True --local-dir OpenHermes-2.5-Mistral-7B-GPTQ --local-dir-use-symlinks False

Download with `git` (not recommended)

git clone --single-branch --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ
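
The same download can also be scripted from Python. The following is a minimal sketch, assuming the huggingface-hub package is installed. `split_model_spec` and `download_model` are hypothetical helper names introduced here for illustration, and the `repo:branch` parsing simply mirrors the notation that text-generation-webui accepts.

```python
from typing import Optional, Tuple


def split_model_spec(spec: str) -> Tuple[str, str]:
    """Split 'repo_id:branch' into (repo_id, branch); the default branch is 'main'."""
    repo_id, sep, branch = spec.partition(":")
    return repo_id, (branch if sep and branch else "main")


def download_model(spec: str, local_dir: Optional[str] = None) -> str:
    """Download one branch of a Hub repository and return the local path."""
    # Imported lazily so the parsing helper works without huggingface-hub.
    from huggingface_hub import snapshot_download

    repo_id, revision = split_model_spec(spec)
    return snapshot_download(repo_id=repo_id, revision=revision, local_dir=local_dir)


if __name__ == "__main__":
    spec = "TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ:gptq-4bit-32g-actorder_True"
    print(split_model_spec(spec))
    # Uncomment to actually download the files (several GB):
    # download_model(spec, local_dir="OpenHermes-2.5-Mistral-7B-GPTQ")
```

Keeping the parsing separate from the download makes it possible to validate a branch name before any network traffic happens.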

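Using the model in text-generation-webui

1. Make sure you are using the latest version of text-generation-webui.
2. Click the Model tab.
3. Under Download custom model or LoRA, enter TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ.
   - To download from a specific branch, enter for example TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ:gptq-4bit-32g-actorder_True.
4. Click Download.
5. When the download has finished, the page shows "Done".
6. In the top left, click the refresh icon next to Model.
7. In the Model dropdown, choose the model you just downloaded: OpenHermes-2.5-Mistral-7B-GPTQ.
8. The model loads automatically and is then ready to use.
9. If you want custom settings, set them, then click Save settings for this model followed by Reload the Model.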

Serving the model from Text Generation Inference (TGI)

TGI version 1.1.0 or later is recommended. The official Docker container is ghcr.io/huggingface/text-generation-inference:1.1.0. Example Docker parameters:

--model-id TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ --port 3000 --quantize gptq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096

Example Python code:

from huggingface_hub import InferenceClient

endpoint_url = "https://your-endpoint-url-here"

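prompt = "Tell me about AI"
# Example system prompt; replace it with your own
system_message = "You are a helpful assistant."
prompt_template = f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''

client = InferenceClient(endpoint_url)
# Send the full ChatML prompt, not the bare user message
response = client.text_generation(prompt_template,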
                                  max_new_tokens=128,
                                  do_sample=True,
                                  temperature=0.7,
                                  top_p=0.95,
                                  top_k=40,
                                  repetition_penalty=1.1)

print(f"Model output: {response}")
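
The Docker parameters above define a token budget: in TGI, --max-total-tokens caps prompt tokens plus generated tokens, while --max-input-length caps the prompt alone. The arithmetic can be checked with a short sketch. The helper name `max_new_tokens_allowed` is illustrative and not part of TGI.

```python
MAX_INPUT_LENGTH = 3696   # value passed to --max-input-length
MAX_TOTAL_TOKENS = 4096   # value passed to --max-total-tokens


def max_new_tokens_allowed(prompt_tokens: int) -> int:
    """Largest max_new_tokens that keeps a request inside the server limits."""
    if prompt_tokens > MAX_INPUT_LENGTH:
        raise ValueError("prompt exceeds --max-input-length")
    return MAX_TOTAL_TOKENS - prompt_tokens


# Even a prompt of the maximum length leaves room for 400 generated tokens.
print(max_new_tokens_allowed(3696))  # 400
# So the example request (max_new_tokens=128) fits for any accepted prompt.
print(max_new_tokens_allowed(3696) >= 128)  # True
```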

Using this GPTQ model from Python code

Install the necessary packages

pip3 install transformers optimum
pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # Use cu117 if on CUDA 11.7

If you have problems installing AutoGPTQ, install it from source instead:

pip3 uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
git checkout v0.4.2
pip3 install .
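
After installation, it can help to confirm that the packages are importable before loading a multi-gigabyte model. This is a minimal sketch using only the standard library. It assumes the usual import names (auto-gptq imports as `auto_gptq`), and `missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Import names of the packages installed above (auto-gptq imports as auto_gptq).
REQUIRED_MODULES = ["transformers", "optimum", "auto_gptq"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("All packages found" if not missing else f"Missing: {missing}")
```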

Example code

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

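prompt = "Tell me about AI"
# Example system prompt; replace it with your own
system_message = "You are a helpful assistant."
prompt_template = f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''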

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done with the transformers pipeline
print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])
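
Both calls above print the prompt together with the completion. To keep only the model's answer, the decoded text can be cut at the ChatML markers. This is a sketch under the assumption that the decoded string contains the markers verbatim. `extract_assistant_reply` is a hypothetical helper, not part of transformers.

```python
def extract_assistant_reply(decoded: str) -> str:
    """Return the text of the last assistant turn in a decoded ChatML string."""
    marker = "<|im_start|>assistant\n"
    # Keep only what follows the last assistant marker (or everything if absent).
    _, _, reply = decoded.rpartition(marker)
    # Drop the end-of-turn token and anything after it.
    reply = reply.split("<|im_end|>", 1)[0]
    return reply.strip()


decoded = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nTell me about AI<|im_end|>\n"
    "<|im_start|>assistant\nAI is the study of intelligent machines.<|im_end|>"
)
print(extract_assistant_reply(decoded))
```

In practice it would be called as `extract_assistant_reply(tokenizer.decode(output[0]))`.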

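✨ Key features

Multiple quantization parameter options: several parameter sets are provided, so you can pick the best one for your hardware and requirements.
Broad compatibility: known to work with several inference servers and web UIs, such as text-generation-webui and KoboldAI United.
Strong benchmark results: performs well on benchmarks such as GPT4All and AGIEval.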

💻 Usage examples

Basic usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

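prompt = "Tell me about AI"
# Example system prompt; replace it with your own
system_message = "You are a helpful assistant."
prompt_template = f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''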

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))

Advanced usage

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ"
# Load a different branch by changing revision
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="gptq-4bit-32g-actorder_True")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

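prompt = "Tell me about AI"
# Example system prompt; replace it with your own
system_message = "You are a helpful assistant."
prompt_template = f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''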

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])

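📚 Detailed documentation

Model information

Property	Details
Model type	Mistral
Training data	Primarily 1,000,000 entries of GPT-4-generated data, plus other high-quality data from open datasets across the AI field
Model creator	Teknium
Quantized by	TheBloke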

Prompt template

The model uses the ChatML prompt template:

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
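
The template can be assembled programmatically, which also covers multi-turn conversations. This is a minimal sketch. `build_chatml_prompt` is a hypothetical helper name, and representing the history as (role, text) pairs is an assumption made here for illustration.

```python
def build_chatml_prompt(system_message, turns):
    """Render a system message and (role, text) turns as a ChatML prompt.

    The result ends with an open assistant turn so the model writes the reply.
    """
    parts = [f"<|im_start|>system\n{system_message}<|im_end|>\n"]
    for role, text in turns:
        parts.append(f"<|im_start|>{role}\n{text}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


single_turn = build_chatml_prompt(
    "You are a helpful assistant.",
    [("user", "Tell me about AI")],
)
print(single_turn)
```

For a follow-up question, append the previous reply as an ("assistant", ...) pair followed by the new ("user", ...) pair.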

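Known compatible clients / servers

Provided files and GPTQ parameters

Multiple quantization parameter sets are provided, each in its own branch. Most GPTQ files are made with AutoGPTQ; Mistral models are currently made with Transformers.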

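Explanation of GPTQ parameters

Bits: the bit size of the quantized model.
GS: GPTQ group size. Higher numbers use less VRAM but give lower quantization accuracy. "None" (no grouping) is the lowest possible setting.
Act Order: True or False. Also known as desc_act. True results in better quantization accuracy. Some GPTQ clients have had issues with models that use Act Order plus a group size, but this is generally resolved now.
Damp %: a GPTQ parameter that affects how samples are processed for quantization. The default is 0.01, but 0.1 results in slightly better accuracy.
GPTQ dataset: the calibration dataset used during quantization. Using a dataset closer to the model's training data can improve quantization accuracy. Note that the GPTQ calibration dataset is not the same as the dataset used to train the model; refer to the original model repository for details of the training datasets.
Sequence length: the length of the dataset sequences used for quantization. Ideally this is the same as the model's sequence length. For some very long-sequence models (16K+), a lower sequence length may have to be used. Note that a lower sequence length does not limit the sequence length of the quantized model; it only affects quantization accuracy on longer inference sequences.
ExLlama compatibility: whether the file can be loaded with ExLlama, which currently supports only Llama and Mistral models in 4-bit.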
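
To see why a larger group size uses less memory, consider the metadata stored per group. The sketch below is a rough illustration, not a description of the exact file layout: it assumes each group carries one 16-bit scale and one zero point of `bits` bits, and `effective_bits_per_weight` is a hypothetical helper.

```python
def effective_bits_per_weight(bits, group_size, scale_bits=16):
    """Approximate storage per weight, including per-group metadata.

    Assumes each group stores one scale (scale_bits) and one zero point (bits).
    group_size=None means no grouping; its overhead is treated as negligible.
    """
    if group_size is None:
        return float(bits)
    return bits + (scale_bits + bits) / group_size


for gs in (32, 64, 128):
    print(gs, effective_bits_per_weight(4, gs))
```

Under these assumptions, the per-weight overhead halves each time the group size doubles, which matches the direction of the trade-off described above.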

Branch	Bits	GS	Act Order	Damp %	GPTQ dataset	Seq Len	Size	ExLlama compatible	Description
main	4	128	Yes	0.1	wikitext	4096	4.16 GB	Yes	4-bit, with Act Order and group size 128g. Uses less VRAM than 64g, but with slightly lower accuracy.
gptq-4bit-32g-actorder_True	4	32	Yes	0.1	wikitext	4096	4.57 GB	Yes	4-bit, with Act Order and group size 32g. Gives the highest possible inference quality, with maximum VRAM usage.
gptq-8bit--1g-actorder_True	8	None	Yes	0.1	wikitext	4096	4.95 GB	No	8-bit, with Act Order. No group size, to lower VRAM requirements.
gptq-8bit-128g-actorder_True	8	128	Yes	0.1	wikitext	4096	5.00 GB	No	8-bit, with group size 128g for higher inference quality and Act Order for higher accuracy.
gptq-8bit-32g-actorder_True	8	32	Yes	0.1	wikitext	4096	4.97 GB	No	8-bit, with group size 32g and Act Order for maximum inference quality.
gptq-4bit-64g-actorder_True	4	64	Yes	0.1	wikitext	4096	4.30 GB	Yes	4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy.
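
The table can be turned into a small lookup for shortlisting a branch. This is a sketch only: the rows are copied from the table above, `candidate_branches` is a hypothetical helper, and file size is used as a rough stand-in for memory needs (actual VRAM use also depends on context length and the loader).

```python
# (branch, bits, group_size, size_gb, exllama_compatible), copied from the table.
BRANCHES = [
    ("main", 4, 128, 4.16, True),
    ("gptq-4bit-32g-actorder_True", 4, 32, 4.57, True),
    ("gptq-8bit--1g-actorder_True", 8, None, 4.95, False),
    ("gptq-8bit-128g-actorder_True", 8, 128, 5.00, False),
    ("gptq-8bit-32g-actorder_True", 8, 32, 4.97, False),
    ("gptq-4bit-64g-actorder_True", 4, 64, 4.30, True),
]


def candidate_branches(max_size_gb, need_exllama=False):
    """Return branch names that fit the limits, smallest file first."""
    fits = [
        row for row in BRANCHES
        if row[3] <= max_size_gb and (row[4] or not need_exllama)
    ]
    return [row[0] for row in sorted(fits, key=lambda row: row[3])]


print(candidate_branches(4.4, need_exllama=True))
```

The chosen name can then be passed as the revision argument when downloading or loading the model.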