Nous-Hermes-13B-GPTQ Open-Source Language Model: A Practical Q&A Assistant Comparable to GPT-3.5-turbo

Nous Hermes 13B GPTQ

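Quantized and published by TheBloke (original model by NousResearch)

Nous-Hermes-13B is a language model fine-tuned on more than 300,000 instructions; its authors describe its performance as comparable to GPT-3.5-turbo

Large language model

Transformers

Language: English | License: other | #long-form-generation #low-hallucination-rate #uncensored

Downloads: 718

Released: 6/3/2023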

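Model Overview

An enhanced Llama 13b model characterized by long responses, a low hallucination rate, and no censorship mechanism, suitable for a wide range of language tasks

Model Features

Large-scale instruction fine-tuning

Fine-tuned on more than 300,000 instructions, almost all of them synthetic GPT-4 outputs

Long responses

Generates long, coherent text responses

Low hallucination rate

Described by its authors as producing fewer factual errors than comparable models

No censorship mechanism

Has no built-in content filtering of the kind found in OpenAI models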

Model Capabilities

Text generation

Instruction following

Code generation

Question answering

Content creation

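Use Cases

Chatbots

Discord chatbot

Can be used to build conversational bots

Offers a conversational experience similar to GPT-3.5-turbo

Education support

Subject-knowledge Q&A

Answers questions in biology, physics, chemistry, and other subjects

Domain answers draw on training with the Camel-AI datasets

Coding assistance

Code generation and explanation

Generates or explains code from instructions

Programming ability draws on training with CodeAlpaca and similar datasets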

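🚀 NousResearch's Nous-Hermes-13B GPTQ

These files are 4-bit GPTQ model files for NousResearch's Nous-Hermes-13B. They are the result of quantizing the model to 4 bits using GPTQ-for-LLaMa.

✨ Main Features

4-bit GPTQ models for GPU inference.
4-bit, 5-bit, and 8-bit GGML models for CPU (+GPU) inference.
Unquantized fp16 model in PyTorch format, for GPU inference and for further conversions.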
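
As a rough guide to choosing among these formats, the size of the weights scales with the number of bits stored per parameter. The sketch below is back-of-the-envelope arithmetic only. It assumes "13B" means about 13 billion parameters and ignores quantization metadata (scales, zero points), activations, and the KV cache, so real memory use will be higher. The function name is hypothetical.

```python
def approx_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 13e9  # assumption: "13B" taken as 13 billion parameters

for label, bits in [("fp16", 16), ("8-bit", 8), ("5-bit", 5), ("4-bit", 4)]:
    print(f"{label:>5}: ~{approx_weight_gib(N_PARAMS, bits):.1f} GiB")
```

Under these assumptions the 4-bit weights come to roughly a quarter of the fp16 size, which is the main reason the quantized files fit on smaller GPUs.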

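📦 Installation Guide

How to easily download and use this model in text-generation-webui

Please make sure you are using the latest version of text-generation-webui:

1. Click the Model tab.
2. Under Download custom model or LoRA, enter TheBloke/Nous-Hermes-13B-GPTQ.
3. Click Download.
4. The model will start downloading. Once it has finished, it will say "Done".
5. In the top left, click the refresh icon next to Model.
6. In the Model dropdown, choose the model you just downloaded: Nous-Hermes-13B-GPTQ.
7. The model will load automatically and is then ready for use.
8. If you want any custom settings, set them, then click Save settings for this model in the top right, followed by Reload the Model.
   - Note that you no longer need to set GPTQ parameters. They are set automatically from the file quantize_config.json.
9. Once you're ready, click the Text Generation tab and enter a prompt to get started.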
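
For a scripted alternative to the download steps above, the model repository can be fetched with the `huggingface_hub` library. This is a minimal sketch, not the web UI's own mechanism: the helper names are hypothetical, and the `models/user_repo` folder layout is an assumption about how text-generation-webui names downloaded models, which may differ between versions.

```python
from pathlib import Path


def webui_model_dir(repo_id: str, models_root: str = "models") -> Path:
    """Target folder in the assumed web UI style: 'user/repo' -> 'user_repo'."""
    return Path(models_root) / repo_id.replace("/", "_")


def download_model(repo_id: str, models_root: str = "models") -> Path:
    """Download every file in the repository into the target folder."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = webui_model_dir(repo_id, models_root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


if __name__ == "__main__":
    # Only prints the folder; call download_model(...) to actually fetch the files.
    print(webui_model_dir("TheBloke/Nous-Hermes-13B-GPTQ"))
```

Calling `download_model("TheBloke/Nous-Hermes-13B-GPTQ")` from the text-generation-webui directory would then place the files where the Model dropdown is expected to find them after a refresh.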

How to use this GPTQ model from Python code

First make sure you have AutoGPTQ installed:

pip install auto-gptq

Then try the following example code:

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/Nous-Hermes-13B-GPTQ"
model_basename = "nous-hermes-13b-GPTQ-4bit-128g.no-act.order"

use_triton = False

# Build the prompt before it is used; the model follows the Alpaca prompt format
prompt = "Tell me about AI"
prompt_template = f'''### Instruction:
{prompt}

### Response:
'''

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# quantize_config=None lets AutoGPTQ read quantize_config.json from the repository
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
# do_sample=True is needed for temperature to have any effect
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
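
By default the text-generation pipeline returns the prompt followed by the completion, so the printed text starts with the prompt itself. The helper below is a small sketch for keeping only the model's reply. The function name and the choice of `### Instruction:` as a stop marker are assumptions based on the Alpaca format this card describes, not part of AutoGPTQ or transformers.

```python
def extract_reply(generated_text: str, prompt: str,
                  stop_markers: tuple = ("### Instruction:",)) -> str:
    """Drop the echoed prompt, then cut the reply at the first stop marker."""
    if generated_text.startswith(prompt):
        reply = generated_text[len(prompt):]
    else:
        reply = generated_text
    for marker in stop_markers:
        # Discard anything the model wrote after starting a new turn on its own.
        reply = reply.split(marker, 1)[0]
    return reply.strip()


if __name__ == "__main__":
    demo_prompt = "### Instruction:\nTell me about AI\n\n### Response:\n"
    demo_output = demo_prompt + "AI is a broad field.\n### Instruction:\nignored"
    print(extract_reply(demo_output, demo_prompt))
```

Passing `return_full_text=False` when calling the pipeline is an alternative way to omit the prompt; the helper is mainly useful when the raw `generate` output is decoded by hand.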

📚 Detailed Documentation

Prompt Template

The model follows the Alpaca prompt format:

### Instruction:

### Response:

or

### Instruction:

### Input:

### Response:
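
The two template variants can be assembled with a small helper. This is a sketch under stated assumptions: the function name is hypothetical, and the exact whitespace (a single newline after each header, a blank line between sections) is a guess, since the card shows only the headers.

```python
from typing import Optional


def build_prompt(instruction: str, input_text: Optional[str] = None) -> str:
    """Assemble an Alpaca-style prompt, adding the Input section only when given."""
    parts = [f"### Instruction:\n{instruction.strip()}"]
    if input_text:
        parts.append(f"### Input:\n{input_text.strip()}")
    # The prompt ends right after the Response header so the model continues from there.
    parts.append("### Response:\n")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("Summarize the text.", "Llamas are domesticated camelids."))
```

Leaving the `### Input:` section out entirely when there is no input matches the first variant shown above.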

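Provided Files

nous-hermes-13b-GPTQ-4bit-128g.no-act.order.safetensors: this file works with all versions of GPTQ-for-LLaMa, and with AutoGPTQ.

nous-hermes-13b-GPTQ-4bit-128g.no-act.order.safetensors
- Works with all versions of the GPTQ-for-LLaMa code, both the Triton and CUDA branches.
- Works with AutoGPTQ.
- Works with the text-generation-webui one-click installers.
- Parameters: group size = 128. Act order / desc_act = False.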
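
These parameters are what loaders read from quantize_config.json. The sketch below parses an illustrative configuration and summarizes it. The JSON string is an assumption written for this example, not a copy of the repository's file: the field names follow AutoGPTQ's `BaseQuantizeConfig`, the values come from the parameters listed above, and the real file may contain more fields.

```python
import json

# Illustrative contents only (assumption), built from the parameters listed above.
EXAMPLE_QUANTIZE_CONFIG = '{"bits": 4, "group_size": 128, "desc_act": false}'


def describe_quantization(config_json: str) -> str:
    """Summarize the bit width, group size, and act-order setting of a GPTQ config."""
    cfg = json.loads(config_json)
    act_order = "yes" if cfg.get("desc_act", False) else "no"
    return f"{cfg['bits']}-bit, group size {cfg['group_size']}, act-order: {act_order}"


if __name__ == "__main__":
    print(describe_quantization(EXAMPLE_QUANTIZE_CONFIG))
```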

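Original model card: NousResearch's Nous-Hermes-13B

Model Description

Nous-Hermes-13b is a language model fine-tuned on more than 300,000 instructions. It was fine-tuned by Nous Research, with Teknium and Karan4D leading the fine-tuning process and dataset curation, Redmond AI sponsoring the compute, and several other contributors. The result is an enhanced Llama 13b model that, according to its authors, rivals GPT-3.5-turbo in performance across a variety of tasks. The model stands out for its long responses, low hallucination rate, and absence of OpenAI censorship mechanisms. Fine-tuning was performed with a sequence length of 2000 on a DGX machine with 8 A100 80GB GPUs for over 50 hours.

Model Training

The model was trained almost entirely on synthetic GPT-4 outputs. The data comes from diverse sources such as GPTeacher's general, roleplay v1 and v2, and code-instruct datasets, Nous Instruct and PDACTL (unpublished), CodeAlpaca, Evol_Instruct Uncensored, GPT4-LLM, and Unnatural Instructions. Additional data came from Camel-AI's biology/physics/chemistry and math datasets, Airoboros' GPT-4 dataset, and more from CodeAlpaca. The total volume of data encompassed over 300,000 instructions.

Collaborators

The model fine-tuning and the datasets were a collaboration of efforts and resources between Teknium, Karan4D, Nous Research, Huemin Art, and Redmond AI. Many thanks to all the dataset creators who generously share their datasets openly. Special thanks go to @winglian, @erhartford, and @main_horse for assisting with some of the training issues. Among the dataset contributors, GPTeacher was made available by Teknium, Wizard LM by nlpxucan, and the Nous Research Instruct Dataset by Karan4D and HueminArt. GPT4-LLM and Unnatural Instructions were provided by Microsoft, the Airoboros dataset by jondurbin, the Camel-AI datasets by Camel-AI, and the CodeAlpaca dataset by Sahil 2801. If anyone was left out, please open a thread in the community tab.

Prompt Format

The model follows the Alpaca prompt format: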

### Instruction:

### Response:

or

### Instruction:

### Input:

### Response:

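Resources for Applied Use Cases

For an example of a back-and-forth chatbot using Hugging Face Transformers and Discord, see: https://github.com/teknium1/alpaca-discord
For an example of a roleplaying Discord bot, see: https://github.com/teknium1/alpaca-roleplay-discordbot

Future Plans

The model is currently being uploaded in FP16 format, and there are plans to convert it to GGML and GPTQ 4-bit quantizations. The team is also working on a full benchmark, similar to what was done for GPT4-x-Vicuna. We will try to open discussions to get the model included in GPT4All.

Benchmark Results

Benchmark results are coming soon.

Model Usage

The model is available for download on Hugging Face. It is suitable for a wide range of language tasks, from generating creative text to understanding and following complex instructions. Thanks to our project sponsor Redmond AI for providing the compute!

📄 License

The model's license is listed as "other".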

🔗 Related Links

Discord

For further support, and for discussions on these models and AI in general, join us at: TheBloke AI's Discord server

Thanks, and How to Contribute

Thanks to the chirper.ai team! Many people have asked whether they can contribute. I enjoy providing models and helping people, and would love to spend even more time doing it, as well as expanding into new projects like fine-tuning/training. If you are able and willing to contribute, it will be most gratefully received and will help me keep providing more models and start work on new AI projects. Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, and other benefits.

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Special thanks to: Aemon Algiz.

Patreon special mentions: Sam, theTransient, Jonathan Leane, Steven Wood, webtim, Johann-Peter Hartmann, Geoffrey Montalvo, Gabriel Tamborski, Willem Michiel, John Villwock, Derek Yates, Mesiah Bishop, Eugene Pentland, Pieter, Chadd, Stephen Murray, Daniel P. Andersen, terasurfer, Brandon Frisco, Thomas Belote, Sid, Nathan LeClaire, Magnesian, Alps Aficionado, Stanislav Ovsiannikov, Alex, Joseph William Delisle, Nikolai Manek, Michael Davis, Junyu Yang, K, J, Spencer Kim, Stefan Sabev, Olusegun Samson, transmissions 11, Michael Levine, Cory Kujawski, Rainer Wilmers, zynix, Kalila, Luke @flexchar, Ajan Kanaga, Mandus, vamX, Ai Maven, Mano Prime, Matthew Berman, subjectnull, Vitor Caleffi, Clay Pascal, biorpg, alfie_i, 阿明, Jeffrey Morgan, ya boyyy, Raymond Fosdick, knownsqashed, Olakabola, Leonard Tan, ReadyPlayerEmma, Enrico Ros, Dave, Talal Aujan, Illia Dulskyi, Sean Connelly, senxiiz, Artur Olbinski, Elle, Raven Klaugh, Fen Risland, Deep Realms, Imad Khwaja, Fred von Graf, Will Dee, usrbinkat, SuperWojo, Alexandros Triantafyllidis, Swaroop Kallakuri, Dan Guido, John Detwiler, Pedro Madruga, Iucharbius, Viktor Bowallius, Asp the Wyvern, Edmond Seymore, Trenton Dambrowitz, Space Cruiser, Spiking Neurons AB, Pyrater, LangChain4j, Tony Hughes, Kacper Wikieł, Rishabh Srivastava, David Ziegler, Luke Pendergrass, Andrey, Gabriel Puliatti, Lone Striker, Sebastain Graf, Pierre Kircher, Randy H, NimbleBox.ai, Vadim, danny, Deo Leter

Thank you to all my generous patrons and donors! And thank you again to a16z for their generous grant.