Nous-Hermes-13B-GPTQ Open-Source Language Model - A Practical Q&A Assistant Comparable to GPT-3.5-turbo

Nous Hermes 13B GPTQ

Developed by TheBloke

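Nous-Hermes-13B is a language model fine-tuned on more than 300,000 instructions. Its authors describe it as state of the art, with performance rivaling GPT-3.5-turbo.

Large Language Model

Transformers

Language: English | License: Other | Tags: #long-form-generation #low-hallucination-rate #uncensored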

Downloads 718

Release Time : 6/3/2023

Model Overview

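An enhanced Llama 13b model that produces long responses with a low hallucination rate, has no censorship mechanism, and suits a wide range of language tasks.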

Model Features

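Large-scale instruction fine-tuning

Fine-tuned on more than 300,000 GPT-4-generated instructions.

Long responses

Generates long, coherent text responses.

Low hallucination rate

Makes fewer factual errors than comparable models.

No censorship mechanism

Unlike OpenAI models, it has no built-in content filtering.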

Model Capabilities

Text generation

Instruction following

Code generation

Question answering

Content creation

Use Cases

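Chatbots

Discord chatbot

Can be used to build conversational bots.

Offers a conversational experience similar to GPT-3.5-turbo.

Educational assistance

Subject-matter Q&A

Answers questions in biology, physics, chemistry, and other subjects.

Domain knowledge comes from training on the Camel-AI datasets.

Coding assistance

Code generation and explanation

Generates or explains code from instructions.

Programming ability comes from training on datasets such as CodeAlpaca.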

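🚀 NousResearch's Nous-Hermes-13B GPTQ

These files are GPTQ 4-bit model files for NousResearch's Nous-Hermes-13B. They are the result of quantizing the model to 4 bits using GPTQ-for-LLaMa.

✨ Key Features

4-bit GPTQ models are available for GPU inference.
4-bit, 5-bit, and 8-bit GGML models are available for CPU (+GPU) inference.
An unquantized fp16 model in PyTorch format is available for GPU inference and for further conversions.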

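📦 Installation Guide

How to easily download and use this model in text-generation-webui

Please make sure you are using the latest version of text-generation-webui.

Click the Model tab.
Under Download custom model or LoRA, enter TheBloke/Nous-Hermes-13B-GPTQ.
Click Download.
The model will start downloading. Once it has finished, it will say "Done".
In the top left, click the refresh icon next to Model.
In the Model dropdown, choose the model you just downloaded: Nous-Hermes-13B-GPTQ.
The model will load automatically and is now ready for use.
If you want any custom settings, set them, then click Save settings for this model in the top right, followed by Reload the Model.
- Note that you no longer need to set GPTQ parameters manually. They are set automatically from the file quantize_config.json.
Once you are ready, click the Text Generation tab and enter a prompt to get started.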

How to use this GPTQ model from Python code

First make sure you have AutoGPTQ installed:

pip install auto-gptq

Then try the following example code:

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/Nous-Hermes-13B-GPTQ"
model_basename = "nous-hermes-13b-GPTQ-4bit-128g.no-act.order"

use_triton = False

# Build the prompt first; it is used by both generation paths below.
# The Alpaca layout follows the prompt template documented for this model.
prompt = "Tell me about AI"
prompt_template = f'''### Instruction:
{prompt}

### Response:
'''

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Load the 4-bit weights; with quantize_config=None the quantization
# parameters are read from quantize_config.json in the repository.
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
# do_sample=True is needed for temperature to have any effect
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
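
Both generation paths above return the prompt together with the completion. As a minimal sketch of how the answer alone might be separated out, the helper below splits on the Alpaca response header that this page documents. The function name `extract_response`, the default marker, and the stripping of the Llama end-of-sequence token `</s>` are illustrative assumptions, not part of the original example.

```python
def extract_response(generated_text: str,
                     marker: str = "### Response:",
                     eos: str = "</s>") -> str:
    """Return only the text that follows the last occurrence of `marker`.

    Hypothetical helper: `marker` and `eos` are assumptions based on the
    Alpaca prompt format and the Llama end-of-sequence token.
    """
    # rpartition splits on the LAST occurrence of the marker; if the marker
    # is absent, the whole input ends up in the third element.
    _, _, tail = generated_text.rpartition(marker)
    # Drop the end-of-sequence token if the decoder left it in the text.
    return tail.replace(eos, "").strip()


if __name__ == "__main__":
    sample = (
        "### Instruction:\nTell me about AI\n\n"
        "### Response:\nAI is a broad field.</s>"
    )
    print(extract_response(sample))
```

If a different prompt layout is used, pass the matching header as `marker`.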

📚 Documentation

Prompt template

The model follows the Alpaca prompt format:

### Instruction:

### Response:

or

### Instruction:

### Input:

### Response:
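
To make the two template variants concrete, here is a minimal sketch of a prompt builder. The function name `build_alpaca_prompt` is hypothetical, and the exact newline placement is an assumption: the page shows only the section headers, not the whitespace between them.

```python
def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Format an instruction (and optional input) using the Alpaca headers.

    Hypothetical helper; the blank-line layout is an assumption.
    """
    if input_text:
        # Variant with an additional "### Input:" section.
        return (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    # Variant with only an instruction.
    return f"### Instruction:\n{instruction}\n\n### Response:\n"


if __name__ == "__main__":
    print(build_alpaca_prompt("Summarize the text.", "GPTQ is a quantization method."))
```

The returned string can be passed to the tokenizer or pipeline in place of a hand-written prompt.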

Provided files

nous-hermes-13b-GPTQ-4bit-128g.no-act.order.safetensors
- Works with all versions of the GPTQ-for-LLaMa code, both the Triton and CUDA branches.
- Works with AutoGPTQ.
- Works with the text-generation-webui one-click installers.
- Parameters: group size = 128. Act order / desc_act = False.
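
The parameters above (4 bits, group size 128) allow a rough back-of-the-envelope estimate of how much space the quantized weights need. The sketch below is only an approximation under stated assumptions: a nominal 13 billion parameters, one fp16 scale and one 4-bit zero point stored per group, and every parameter quantized. In practice some tensors usually stay unquantized, so an actual file will differ.

```python
# Parameters stated for the provided file.
quantize_params = {"bits": 4, "group_size": 128, "desc_act": False}

N_PARAMS = 13e9  # assumption: nominal "13B" parameter count


def estimate_weight_gib(n_params: float, bits: int, group_size: int,
                        scale_bits: int = 16) -> float:
    """Rough size of group-quantized weights in GiB.

    Assumes each group shares one scale (`scale_bits` wide) and one zero
    point stored at `bits` bits; both are illustrative assumptions.
    """
    overhead_bits_per_weight = (scale_bits + bits) / group_size
    bits_per_weight = bits + overhead_bits_per_weight
    return n_params * bits_per_weight / 8 / 2**30


quantized_gib = estimate_weight_gib(
    N_PARAMS, quantize_params["bits"], quantize_params["group_size"]
)
fp16_gib = N_PARAMS * 16 / 8 / 2**30  # unquantized fp16 for comparison

print(f"4-bit, group size 128: about {quantized_gib:.1f} GiB")
print(f"fp16: about {fp16_gib:.1f} GiB")
```

Under these assumptions, the per-group metadata adds well under half a bit per weight, so the quantized weights come to roughly a quarter of the fp16 size.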

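Original model card: NousResearch's Nous-Hermes-13B

Model description

Nous-Hermes-13b is a state-of-the-art language model fine-tuned on more than 300,000 instructions. It was fine-tuned by Nous Research, with Teknium and Karan4D leading the fine-tuning process and dataset curation, Redmond AI sponsoring the compute, and several other contributors. The result is an enhanced Llama 13b model that rivals GPT-3.5-turbo in performance across a variety of tasks. The model stands out for its long responses, low hallucination rate, and absence of OpenAI censorship mechanisms. Fine-tuning was performed with a sequence length of 2000 on a DGX machine with 8 A100 80GB GPUs and took more than 50 hours.

Model training

The model was trained almost entirely on synthetic GPT-4 outputs. The data comes from diverse sources, such as GPTeacher (the general, roleplay v1 and v2, and code-instruct datasets), Nous Instruct and PDACTL (unpublished), CodeAlpaca, Evol_Instruct Uncensored, GPT4-LLM, and Unnatural Instructions. Additional data came from Camel-AI's biology/physics/chemistry and math datasets, Airoboros' GPT-4 dataset, and more from CodeAlpaca. In total, the data comprises more than 300,000 instructions.

Collaborators

The model fine-tuning and the datasets are the result of a collaboration of effort and resources between Teknium, Karan4D, Nous Research, Huemin Art, and Redmond AI. Many thanks to all the dataset creators who generously share their datasets openly. Special thanks to @winglian, @erhartford, and @main_horse for their help with some of the training issues. Among the dataset contributors, GPTeacher was made available by Teknium, Wizard LM by nlpxucan, and the Nous Research Instruct Dataset by Karan4D and HueminArt. GPT4-LLM and Unnatural Instructions were provided by Microsoft, the Airoboros dataset by jondurbin, the Camel-AI datasets by Camel-AI, and the CodeAlpaca dataset by Sahil 2801. If anyone was left out, please open a thread in the community tab.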

Prompt format

The model follows the Alpaca prompt format:

### Instruction:

### Response:

or

### Instruction:

### Input:

### Response:

Resources for applied use cases

For an example of a back-and-forth chatbot using Hugging Face Transformers and Discord, see: https://github.com/teknium1/alpaca-discord
For an example of a roleplaying Discord bot, see: https://github.com/teknium1/alpaca-roleplay-discordbot
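
A back-and-forth chatbot has to fold earlier turns into each new prompt. The sketch below shows one way this might be done with the Alpaca headers; it is not taken from the linked repositories. The class name `ChatHistory`, the fixed turn window, and the newline layout are all illustrative assumptions.

```python
from collections import deque


class ChatHistory:
    """Keep the most recent turns and render them as one Alpaca-style prompt.

    Hypothetical helper; the turn window and layout are assumptions.
    """

    def __init__(self, max_turns: int = 4):
        # A deque with maxlen drops the oldest turn automatically.
        self.turns = deque(maxlen=max_turns)

    def add(self, user_message: str, bot_reply: str) -> None:
        self.turns.append((user_message, bot_reply))

    def build_prompt(self, new_message: str) -> str:
        parts = [
            f"### Instruction:\n{user}\n\n### Response:\n{bot}\n"
            for user, bot in self.turns
        ]
        # The final block is left open so the model writes the next reply.
        parts.append(f"### Instruction:\n{new_message}\n\n### Response:\n")
        return "\n".join(parts)


if __name__ == "__main__":
    history = ChatHistory(max_turns=2)
    history.add("Hello", "Hi! How can I help?")
    print(history.build_prompt("What is GPTQ?"))
```

A real bot would also need to keep the rendered prompt within the model's context length, for example by counting tokens rather than turns.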

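Future plans

The model is currently being uploaded in FP16 format, and there are plans to convert it to GGML and GPTQ 4-bit quantizations. The team is also working on a full benchmark, similar to what was done for GPT4-x-Vicuna. We will try to start discussions about getting the model included in GPT4All.

Benchmark results

Benchmark results are coming soon.

Model usage

The model is available for download on Hugging Face. It is suitable for a wide range of language tasks, from generating creative text to understanding and following complex instructions. Thanks to our project sponsor Redmond AI for the compute!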

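📄 License

The model's license is listed as Other.

🔗 Related Links

Discord

For further support, and for discussions on these models and AI in general, join us at: TheBloke AI's Discord server

Thanks, and how to contribute

Thanks to the chirper.ai team! Many people have asked whether they can contribute. I enjoy providing models and helping people, and I would love to spend even more time doing it, as well as expanding into new projects such as fine-tuning/training. If you are able and willing to contribute, it will be most gratefully received and will help me keep providing more models and start work on new AI projects. Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, and other benefits.

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Special thanks to: Aemon Algiz.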

Patreon special mentions: Sam, theTransient, Jonathan Leane, Steven Wood, webtim, Johann-Peter Hartmann, Geoffrey Montalvo, Gabriel Tamborski, Willem Michiel, John Villwock, Derek Yates, Mesiah Bishop, Eugene Pentland, Pieter, Chadd, Stephen Murray, Daniel P. Andersen, terasurfer, Brandon Frisco, Thomas Belote, Sid, Nathan LeClaire, Magnesian, Alps Aficionado, Stanislav Ovsiannikov, Alex, Joseph William Delisle, Nikolai Manek, Michael Davis, Junyu Yang, K, J, Spencer Kim, Stefan Sabev, Olusegun Samson, transmissions 11, Michael Levine, Cory Kujawski, Rainer Wilmers, zynix, Kalila, Luke @flexchar, Ajan Kanaga, Mandus, vamX, Ai Maven, Mano Prime, Matthew Berman, subjectnull, Vitor Caleffi, Clay Pascal, biorpg, alfie_i, 阿明, Jeffrey Morgan, ya boyyy, Raymond Fosdick, knownsqashed, Olakabola, Leonard Tan, ReadyPlayerEmma, Enrico Ros, Dave, Talal Aujan, Illia Dulskyi, Sean Connelly, senxiiz, Artur Olbinski, Elle, Raven Klaugh, Fen Risland, Deep Realms, Imad Khwaja, Fred von Graf, Will Dee, usrbinkat, SuperWojo, Alexandros Triantafyllidis, Swaroop Kallakuri, Dan Guido, John Detwiler, Pedro Madruga, Iucharbius, Viktor Bowallius, Asp the Wyvern, Edmond Seymore, Trenton Dambrowitz, Space Cruiser, Spiking Neurons AB, Pyrater, LangChain4j, Tony Hughes, Kacper Wikieł, Rishabh Srivastava, David Ziegler, Luke Pendergrass, Andrey, Gabriel Puliatti, Lone Striker, Sebastain Graf, Pierre Kircher, Randy H, NimbleBox.ai, Vadim, danny, Deo Leter

Thank you to all the generous patrons and donors! And thank you again to a16z for their generous grant.