Loyal-Macaroni-Maid-7B-GPTQ open-source model - roleplay that follows character-card definitions

Loyal Macaroni Maid 7B GPTQ

Quantized by TheBloke (original model by Sanji Watsuki)

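A 7B-parameter model based on the Mistral architecture, built for roleplay and designed specifically to follow character-card definitions during interaction.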

Large language model

Transformers

#Roleplay-focused #NSFW content supported #Low-resource deployment

Downloads: 247

Released: 12/24/2023

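Model overview

This project provides GPTQ-quantized versions of Sanji Watsuki's Loyal Macaroni Maid 7B for efficient inference and flexible deployment across different hardware.

Model features

Efficient quantization

Several GPTQ parameter permutations are provided, so you can pick the one that best fits your hardware and requirements

Multi-platform compatibility

Works with several inference servers and web UIs, such as text-generation-webui and KoboldAI United

Roleplay optimization

Designed specifically to follow character-card definitions during interaction, for an immersive roleplay experience

Model capabilities

Text generation

Roleplay

Instruction following

Use cases

Entertainment

Interactive roleplay

Hold in-character conversations with the model and interact with different virtual characters

Provides an immersive roleplay experience

Creative writing

Story generation

Generates coherent story content from a prompt

Helps writers get past writer's block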

🚀 Loyal Macaroni Maid 7B - GPTQ

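🚀 Quick start

Downloading the model

You can download the model in any of the following ways:

Downloading in text-generation-webui

To download from the main branch, enter TheBloke/Loyal-Macaroni-Maid-7B-GPTQ in the "Download model" box.
To download from another branch, append :branchname to the download name, for example TheBloke/Loyal-Macaroni-Maid-7B-GPTQ:gptq-4bit-32g-actorder_True.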
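
The repo:branch naming convention described above can also be handled in a script, for example to turn a single string into a repository id and a revision for other tools. A minimal sketch; split_model_spec is a hypothetical helper name for illustration and is not part of text-generation-webui:

```python
def split_model_spec(spec, default_branch="main"):
    """Split 'owner/repo:branch' into (repo_id, branch).

    If no ':branch' suffix is present, the default branch is returned.
    """
    repo_id, sep, branch = spec.partition(":")
    if not sep or not branch:
        branch = default_branch
    return repo_id, branch


print(split_model_spec("TheBloke/Loyal-Macaroni-Maid-7B-GPTQ"))
print(split_model_spec("TheBloke/Loyal-Macaroni-Maid-7B-GPTQ:gptq-4bit-32g-actorder_True"))
```

The second element can be passed wherever a branch or revision name is expected.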

Downloading from the command line

The huggingface-hub Python library is recommended:

pip3 install huggingface-hub

To download the main branch to a folder named Loyal-Macaroni-Maid-7B-GPTQ:

mkdir Loyal-Macaroni-Maid-7B-GPTQ
huggingface-cli download TheBloke/Loyal-Macaroni-Maid-7B-GPTQ --local-dir Loyal-Macaroni-Maid-7B-GPTQ --local-dir-use-symlinks False

To download from a different branch, add the --revision parameter:

mkdir Loyal-Macaroni-Maid-7B-GPTQ
huggingface-cli download TheBloke/Loyal-Macaroni-Maid-7B-GPTQ --revision gptq-4bit-32g-actorder_True --local-dir Loyal-Macaroni-Maid-7B-GPTQ --local-dir-use-symlinks False
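
The same download can be done from Python with the snapshot_download function of the huggingface-hub library. The sketch below is one possible way to mirror the CLI commands above; build_download_kwargs and download_branch are hypothetical helper names, and the library is imported lazily so that the argument-building part runs without it installed:

```python
REPO_ID = "TheBloke/Loyal-Macaroni-Maid-7B-GPTQ"


def build_download_kwargs(branch="main"):
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    return {
        "repo_id": REPO_ID,
        "revision": branch,                  # same role as --revision
        "local_dir": REPO_ID.split("/")[-1],  # same role as --local-dir
    }


def download_branch(branch="main"):
    """Download one branch and return the local folder path."""
    from huggingface_hub import snapshot_download  # requires huggingface-hub

    return snapshot_download(**build_download_kwargs(branch))


# Only the arguments are printed here; nothing is downloaded.
print(build_download_kwargs("gptq-4bit-32g-actorder_True"))
```

Calling download_branch("gptq-4bit-32g-actorder_True") performs the actual download, which is several gigabytes.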

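Using the model in text-generation-webui

1. Click the "Model" tab.
2. Under "Download custom model or LoRA", enter TheBloke/Loyal-Macaroni-Maid-7B-GPTQ.
   - To download from a specific branch, enter for example TheBloke/Loyal-Macaroni-Maid-7B-GPTQ:gptq-4bit-32g-actorder_True.
   - See the "Provided files and GPTQ parameters" section for the list of branches.
3. Click "Download".
4. The model starts downloading. Once it has finished, it shows "Done".
5. In the top left, click the refresh icon next to "Model".
6. In the "Model" dropdown, choose the model you just downloaded: Loyal-Macaroni-Maid-7B-GPTQ.
7. The model loads automatically and is then ready for use.
8. If you want custom settings, set them, then click "Save settings for this model" followed by "Reload the Model" in the top right.
   - Note that you do not need to set GPTQ parameters manually. They are set automatically from the file quantize_config.json.
9. Once you are ready, click the "Text Generation" tab and enter a prompt to get started.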
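
The steps above note that the GPTQ parameters are read from quantize_config.json. To see what a downloaded branch declares, the file can be inspected directly. A minimal sketch, assuming the key names commonly written by AutoGPTQ (bits, group_size, desc_act, damp_percent); read_gptq_settings is a hypothetical helper, and the sample file in the demo is illustrative, with values mirroring the main branch:

```python
import json
import tempfile
from pathlib import Path

# Key names assumed from the usual AutoGPTQ quantize_config.json layout.
GPTQ_KEYS = ("bits", "group_size", "desc_act", "damp_percent")


def read_gptq_settings(model_dir):
    """Return the GPTQ settings stored in <model_dir>/quantize_config.json."""
    config_path = Path(model_dir) / "quantize_config.json"
    with config_path.open(encoding="utf-8") as handle:
        config = json.load(handle)
    # Missing keys are reported as None instead of raising.
    return {key: config.get(key) for key in GPTQ_KEYS}


# Demo against an illustrative file written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    sample = {"bits": 4, "group_size": 128, "desc_act": True, "damp_percent": 0.1}
    (Path(tmp) / "quantize_config.json").write_text(json.dumps(sample), encoding="utf-8")
    print(read_gptq_settings(tmp))
```

For a real model, pass the folder the branch was downloaded into instead of the temporary folder.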

Serving the model with Text Generation Inference (TGI)

TGI version 1.1.0 or later is recommended. The official Docker container is: ghcr.io/huggingface/text-generation-inference:1.1.0.

Example Docker parameters:

--model-id TheBloke/Loyal-Macaroni-Maid-7B-GPTQ --port 3000 --quantize gptq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
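
The two length limits in these parameters interact: assuming TGI's usual semantics, --max-input-length caps the prompt and --max-total-tokens caps prompt plus generated tokens combined. The short sketch below works through the arithmetic for the values shown above; max_new_tokens_budget is a hypothetical helper name:

```python
MAX_INPUT_LENGTH = 3696   # value of --max-input-length above
MAX_TOTAL_TOKENS = 4096   # value of --max-total-tokens above


def max_new_tokens_budget(prompt_tokens):
    """Largest max_new_tokens that still fits, for a prompt of the given length."""
    if prompt_tokens > MAX_INPUT_LENGTH:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens, limit is {MAX_INPUT_LENGTH}"
        )
    return MAX_TOTAL_TOKENS - prompt_tokens


print(max_new_tokens_budget(3696))  # longest allowed prompt leaves 400 tokens
print(max_new_tokens_budget(1000))  # a shorter prompt leaves 3096 tokens
```

With these settings, a request that uses the full input length can generate at most 400 new tokens.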

Example Python code for interfacing with TGI (requires huggingface-hub 0.17.0 or later):

pip3 install huggingface-hub

from huggingface_hub import InferenceClient

endpoint_url = "https://your-endpoint-url-here"

prompt = "Tell me about AI"
prompt_template=f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
'''

client = InferenceClient(endpoint_url)
response = client.text_generation(
  prompt_template,
  max_new_tokens=128,
  do_sample=True,
  temperature=0.7,
  top_p=0.95,
  top_k=40,
  repetition_penalty=1.1
)

print(f"Model output: {response}")

Python inference example

Install the required packages

Requires: Transformers 4.33.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.

pip3 install --upgrade transformers optimum
# If using PyTorch 2.1 + CUDA 12.x:
pip3 install --upgrade auto-gptq
# Or, if using PyTorch 2.1 + CUDA 11.x:
pip3 install --upgrade auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/

If you are using PyTorch 2.0, you will need to install AutoGPTQ from source. Likewise, if you have problems with the pre-built wheels, try building from source instead:

pip3 uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
git checkout v0.5.1
pip3 install .
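
After installation, the minimum versions listed in this section can be checked from Python. The sketch below uses only the standard library; the helper names are hypothetical, and the version comparison is deliberately simple (it compares the first three numeric components and ignores suffixes such as +cu118 or .dev0):

```python
import re
from importlib import metadata

# Minimum versions stated in this section.
MINIMUMS = {"transformers": "4.33.0", "optimum": "1.12.0", "auto-gptq": "0.4.2"}


def version_tuple(version):
    """Turn '0.5.1+cu118' into (0, 5, 1); non-numeric parts count as 0."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    return version_tuple(installed) >= version_tuple(minimum)


def check_installed():
    """Print one status line per required package."""
    for package, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            print(f"{package}: not installed (need >= {minimum})")
            continue
        status = "OK" if meets_minimum(installed, minimum) else f"too old, need >= {minimum}"
        print(f"{package}: {installed} ({status})")


check_installed()
```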

Example Python code

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/Loyal-Macaroni-Maid-7B-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

prompt = "Write a story about llamas"
prompt_template=f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])

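✨ Key features

Several GPTQ parameter permutations are provided, so you can pick the one that best fits your hardware and requirements.
Works with several inference servers and web UIs, such as text-generation-webui and KoboldAI United.
Compatible with the Transformers library; the 4-bit branches can also be loaded with ExLlama.

📦 Installation

Installing dependencies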

pip3 install huggingface-hub transformers optimum auto-gptq

Depending on your PyTorch and CUDA versions, the auto-gptq installation may need adjusting. See the "Python inference example" section above for details.

💻 Usage examples

Basic usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "TheBloke/Loyal-Macaroni-Maid-7B-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

prompt = "Hello"
prompt_template = f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
'''

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0]))

Advanced usage

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/Loyal-Macaroni-Maid-7B-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto", revision="gptq-4bit-32g-actorder_True")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

prompt = "Write an article about artificial intelligence"
prompt_template = f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
'''

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])

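📚 Detailed documentation

Model information

Property	Details
Model creator	Sanji Watsuki
Original model	Loyal Macaroni Maid 7B
Model type	Mistral
License	cc-by-nc-4.0
Quantized by	TheBloke
Tags	merge, not-for-all-audiences, nsfw

Repositories available

Prompt template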

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
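
The template above can be wrapped in two small helpers: one that fills in the instruction and one that cuts the prompt off the generated text, since the decoded output normally echoes the prompt. A minimal sketch; build_prompt and extract_response are hypothetical names, and the extraction assumes the instruction itself does not contain the "### Response:" marker:

```python
TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{prompt}\n\n"
    "### Response:\n"
)


def build_prompt(instruction):
    """Fill the instruction into the prompt template."""
    return TEMPLATE.format(prompt=instruction.strip())


def extract_response(generated_text):
    """Return only the text after the first '### Response:' marker."""
    _, found, tail = generated_text.partition("### Response:")
    return tail.strip() if found else generated_text.strip()


prompt = build_prompt("Tell me about AI")
print(prompt)
# Simulated model output: the echoed prompt followed by a completion.
print(extract_response(prompt + "AI is the study of intelligent machines."))
```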

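Known compatible clients/servers

GPTQ models are currently supported on Linux (NVidia/AMD) and Windows (NVidia only). macOS users: please use GGUF models.

These GPTQ models are known to work in inference servers and web UIs including text-generation-webui, KoboldAI United, and Hugging Face Text Generation Inference (TGI).

Provided files and GPTQ parameters

Multiple quantization parameters are provided so that you can choose the best one for your hardware and requirements.

Each separate quant is in a different branch. See the download instructions earlier in this document for fetching from different branches.

Most GPTQ files are made with AutoGPTQ. Mistral models are currently made with Transformers.

Explanation of GPTQ parameters

Bits: The bit size of the quantized model.
GS: GPTQ group size. Higher numbers use less VRAM, but have lower quantization accuracy. "None" disables grouping and uses the least VRAM.
Act Order: True or False. Also known as desc_act. True results in better quantization accuracy. Some GPTQ clients have had issues with models that use Act Order plus group size, but this is generally resolved now.
Damp %: A GPTQ parameter that affects how samples are processed for quantization. 0.01 is the default, but 0.1 results in slightly better accuracy.
GPTQ dataset: The calibration dataset used during quantization. Using a dataset more appropriate to the model's training can improve quantization accuracy. Note that the GPTQ calibration dataset is not the same as the dataset used to train the model. Please refer to the original model repo for details of the training dataset(s).
Sequence length: The length of the dataset sequences used for quantization. Ideally this is the same as the model sequence length. For some very long sequence models (16K+), a lower sequence length may have to be used. Note that a lower sequence length does not limit the sequence length of the quantized model. It only affects the quantization accuracy on longer inference sequences.
ExLlama compatibility: Whether this file can be loaded with ExLlama, which currently only supports Llama and Mistral models in 4-bit.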
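
The group-size trade-off described above can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a simplified storage model in which every group of weights stores one 16-bit scale and one zero point at the quantized bit width; that model is an assumption for illustration only, and real checkpoints also contain unquantized tensors such as embeddings, so the figures are indicative rather than exact:

```python
def effective_bits_per_weight(bits, group_size=None, scale_bits=16):
    """Estimate storage per quantized weight, including per-group metadata.

    group_size=None means no grouping; the per-row metadata is then
    treated as negligible.
    """
    if group_size is None:
        return float(bits)
    # One scale (scale_bits) plus one zero point (bits) shared by each group.
    return bits + (scale_bits + bits) / group_size


for bits, group_size in [(4, 128), (4, 64), (4, 32), (8, None), (8, 128), (8, 32)]:
    estimate = effective_bits_per_weight(bits, group_size)
    print(f"{bits}-bit, group size {group_size}: ~{estimate:.3f} bits per weight")
```

Smaller groups mean more scales and zero points per weight, which is why the 32g variants are larger than the 128g variants at the same bit width.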

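Branch	Bits	GS	Act Order	Damp %	GPTQ dataset	Seq len	Size	ExLlama	Description
main	4	128	Yes	0.1	OpenErotica Erotiquant	4096	4.16 GB	Yes	4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy.
gptq-4bit-32g-actorder_True	4	32	Yes	0.1	OpenErotica Erotiquant	4096	4.57 GB	Yes	4-bit, with Act Order and group size 32g. Gives the highest inference quality among the 4-bit branches, with maximum VRAM usage.
gptq-8bit--1g-actorder_True	8	None	Yes	0.1	OpenErotica Erotiquant	4096	7.52 GB	No	8-bit, with Act Order. No group size, to lower VRAM requirements.
gptq-8bit-128g-actorder_True	8	128	Yes	0.1	OpenErotica Erotiquant	4096	7.68 GB	No	8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy.
gptq-8bit-32g-actorder_True	8	32	Yes	0.1	OpenErotica Erotiquant	4096	8.17 GB	No	8-bit, with group size 32g and Act Order for maximum inference quality.
gptq-4bit-64g-actorder_True	4	64	Yes	0.1	OpenErotica Erotiquant	4096	4.29 GB	Yes	4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy.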
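
Choosing a branch from this table can be scripted. The sketch below copies the branch names, sizes, and ExLlama flags from the table and picks the largest file that fits a size budget, using file size as a rough proxy for quality. That heuristic and the pick_branch helper are assumptions for illustration; actual VRAM use is higher than the file size because of activations and the KV cache:

```python
# (branch, bits, group size or None, file size in GB, ExLlama-compatible)
BRANCHES = [
    ("main", 4, 128, 4.16, True),
    ("gptq-4bit-32g-actorder_True", 4, 32, 4.57, True),
    ("gptq-8bit--1g-actorder_True", 8, None, 7.52, False),
    ("gptq-8bit-128g-actorder_True", 8, 128, 7.68, False),
    ("gptq-8bit-32g-actorder_True", 8, 32, 8.17, False),
    ("gptq-4bit-64g-actorder_True", 4, 64, 4.29, True),
]


def pick_branch(max_size_gb, need_exllama=False):
    """Return the largest branch within the size budget, or None if none fits."""
    candidates = [
        row for row in BRANCHES
        if row[3] <= max_size_gb and (row[4] or not need_exllama)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row[3])[0]


print(pick_branch(6.0))
print(pick_branch(8.0))
print(pick_branch(8.0, need_exllama=True))
```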

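Compatibility

The files provided are tested to work with Transformers. For non-Mistral models, AutoGPTQ can also be used directly.

ExLlama is compatible with Llama architecture models (including Mistral, Yi, DeepSeek, SOLAR, etc.) in 4-bit. See the "Provided files" table above for per-file compatibility.

For a list of clients/servers, see the "Known compatible clients/servers" section.

🔧 Technical details

The GPTQ quants in this project use chosen parameters and a calibration dataset to keep quantization accuracy as high as possible while preserving inference efficiency. The different branches offer several quantization options, so you can choose according to your hardware resources and performance needs.

📄 License

This model is released under the cc-by-nc-4.0 license.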

Discord

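For further support, and for discussions on these models and AI in general, join us at: TheBloke AI's Discord server

Thanks, and how to contribute

Thanks to the chirper.ai team!

Thanks to Clay from gpus.llm-utils.org!

I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine-tuning/training.

If you're able and willing to contribute, it will be most gratefully received and will help me to keep providing more models and to start work on new AI projects.

Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.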

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Special thanks to: Aemon Algiz.

Patreon special mentions: Michael Levine, 阿明, Trailburnt, Nikolai Manek, John Detwiler, Randy H, Will Dee, Sebastain Graf, NimbleBox.ai, Eugene Pentland, Emad Mostaque, Ai Maven, Jim Angel, Jeff Scroggin, Michael Davis, Manuel Alberto Morcote, Stephen Murray, Robert, Justin Joy, Luke @flexchar, Brandon Frisco, Elijah Stavena, S_X, Dan Guido, Undi ., Komninos Chatzipapas, Shadi, theTransient, Lone Striker, Raven Klaugh, jjj, Cap'n Zoog, Michel-Marie MAUDET (LINAGORA), Matthew Berman, David, Fen Risland, Omer Bin Jawed, Luke Pendergrass, Kalila, OG, Erik Bjäreholt, Rooh Singh, Joseph William Delisle, Dan Lewis, TL, John Villwock, AzureBlack, Brad, Pedro Madruga, Caitlyn Gatomon, K, jinyuan sun, Mano Prime, Alex, Jeffrey Morgan, Alicia Loh, Illia Dulskyi, Chadd, transmissions 11, fincy, Rainer Wilmers, ReadyPlayerEmma, knownsqashed, Mandus, biorpg, Deo Leter, Brandon Phillips, SuperWojo, Sean Connelly, Iucharbius, Jack West, Harry Royden McLaughlin, Nicholas, terasurfer, Vitor Caleffi, Duane Dunston, Johann-Peter Hartmann, David Ziegler, Olakabola, Ken Nordquist, Trenton Dambrowitz, Tom X Nguyen, Vadim, Ajan Kanaga, Leonard Tan, Clay Pascal, Alexandros Triantafyllidis, JM33133, Xule, vamX, ya boyyy, subjectnull, Talal Aujan, Alps Aficionado, wassieverse, Ari Malik, James Bentley, Woland, Spencer Kim, Michael Dempsey, Fred von Graf, Elle, zynix, William Richards, Stanislav Ovsiannikov, Edmond Seymore, Jonathan Leane, Martin Kemka, usrbinkat, Enrico Ros

Thank you to all my generous patrons and donaters!

And thank you again to a16z for their generous grant.