Loyal-Macaroni-Maid-7B-GPTQ open-source model: role-play support with character-card-driven interaction

Loyal Macaroni Maid 7B GPTQ

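Quantized by TheBloke (original model by Sanji Watsuki)

A 7B-parameter model based on the Mistral architecture, focused on role-play and designed specifically to follow character card settings during interaction.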

Large language model

Transformers

#Role-play focused #NSFW content supported #Low-resource deployment

Downloads: 247

Released: 12/24/2023

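Model overview

This project provides GPTQ-quantized versions of Sanji Watsuki's Loyal Macaroni Maid 7B model, for efficient inference and flexible deployment across different hardware.

Model features

Efficient quantization

Several GPTQ parameter combinations are provided, so you can choose the quantized model that best fits your hardware and requirements.

Multi-platform compatibility

Works with multiple inference servers and web UIs, such as text-generation-webui and KoboldAI United.

Role-play optimization

Designed specifically to follow character card settings, for an immersive role-play experience.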

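Model capabilities

Text generation

Role-play

Instruction following

Use cases

Entertainment

Role-play interaction

Hold role-play conversations with the model and interact with different virtual characters.

Provides an immersive role-play experience.

Creative writing

Story generation

Generates coherent story content from a prompt.

Helps writers overcome writer's block.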

🚀 Loyal Macaroni Maid 7B - GPTQ

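🚀 Quick start

Downloading the model

You can download the model in any of the following ways.

Downloading in text-generation-webui

To download from the main branch, enter TheBloke/Loyal-Macaroni-Maid-7B-GPTQ in the "Download model" box.
To download from another branch, append :branchname to the download name, for example TheBloke/Loyal-Macaroni-Maid-7B-GPTQ:gptq-4bit-32g-actorder_True.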
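
The repo:branch notation described above can be illustrated with a small helper. This is a minimal sketch of the naming convention only; `split_model_spec` is a hypothetical name and is not the parsing code text-generation-webui itself uses.

```python
def split_model_spec(spec: str) -> tuple[str, str]:
    """Split 'owner/repo:branch' into (repo_id, branch).

    When no ':branch' suffix is given, the branch defaults to 'main'.
    """
    repo_id, sep, branch = spec.partition(":")
    return repo_id, (branch if sep and branch else "main")


print(split_model_spec("TheBloke/Loyal-Macaroni-Maid-7B-GPTQ"))
print(split_model_spec("TheBloke/Loyal-Macaroni-Maid-7B-GPTQ:gptq-4bit-32g-actorder_True"))
```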

Downloading from the command line

The huggingface-hub Python library is recommended:

pip3 install huggingface-hub

To download the main branch to a folder named Loyal-Macaroni-Maid-7B-GPTQ:

mkdir Loyal-Macaroni-Maid-7B-GPTQ
huggingface-cli download TheBloke/Loyal-Macaroni-Maid-7B-GPTQ --local-dir Loyal-Macaroni-Maid-7B-GPTQ --local-dir-use-symlinks False

To download from a different branch, add the --revision parameter:

mkdir Loyal-Macaroni-Maid-7B-GPTQ
huggingface-cli download TheBloke/Loyal-Macaroni-Maid-7B-GPTQ --revision gptq-4bit-32g-actorder_True --local-dir Loyal-Macaroni-Maid-7B-GPTQ --local-dir-use-symlinks False
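
The same download can also be scripted from Python. The sketch below assumes the `snapshot_download` function of huggingface-hub with its `repo_id`, `revision`, and `local_dir` parameters; `default_local_dir` and `download_branch` are hypothetical helper names introduced here for illustration.

```python
def default_local_dir(repo_id: str) -> str:
    """Use the repository name (the part after the owner) as the target folder."""
    return repo_id.split("/")[-1]


def download_branch(repo_id: str, revision: str = "main") -> str:
    """Download one branch of a repository and return the local path.

    huggingface_hub is imported lazily so the helper above stays usable
    even where the library is not installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        revision=revision,
        local_dir=default_local_dir(repo_id),
    )


print(default_local_dir("TheBloke/Loyal-Macaroni-Maid-7B-GPTQ"))
```

Calling `download_branch("TheBloke/Loyal-Macaroni-Maid-7B-GPTQ", "gptq-4bit-32g-actorder_True")` would then fetch that branch; it needs network access and several gigabytes of disk space.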

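Using the model in text-generation-webui

1. Click the "Model" tab.
2. Under "Download custom model or LoRA", enter TheBloke/Loyal-Macaroni-Maid-7B-GPTQ.
   - To download from a specific branch, enter for example TheBloke/Loyal-Macaroni-Maid-7B-GPTQ:gptq-4bit-32g-actorder_True.
   - See the "Provided files and GPTQ parameters" section for the list of branches.
3. Click "Download".
4. The model starts downloading. When it finishes, it shows "Done".
5. In the top left, click the refresh icon next to "Model".
6. In the "Model" dropdown, choose the model you just downloaded: Loyal-Macaroni-Maid-7B-GPTQ.
7. The model loads automatically and is ready to use.
8. If you want custom settings, set them, click "Save settings for this model" in the top right, then click "Reload the Model".
   - Note that you do not need to set GPTQ parameters manually; they are set automatically from the quantize_config.json file.
9. When you are ready, click the "Text Generation" tab and enter a prompt to get started.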
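
To see what the loader reads from quantize_config.json, the file can be inspected with a few lines of Python. This is a sketch under assumptions: the field names (`bits`, `group_size`, `desc_act`, `damp_percent`) follow the AutoGPTQ convention, and the sample JSON is illustrative, built from the values listed for the main branch rather than copied from the repository.

```python
import json

# Illustrative sample only; the real file in the repository may hold more fields.
SAMPLE_CONFIG = '{"bits": 4, "group_size": 128, "damp_percent": 0.1, "desc_act": true}'


def describe_quantize_config(text: str) -> str:
    """Summarize the main GPTQ settings found in a quantize_config.json string."""
    cfg = json.loads(text)
    group = cfg.get("group_size", -1)
    # A group size of -1 conventionally means "no grouping".
    group_label = "none" if group == -1 else f"{group}g"
    return f'{cfg["bits"]}-bit, group size {group_label}, act order {cfg.get("desc_act", False)}'


print(describe_quantize_config(SAMPLE_CONFIG))
```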

Serving the model with Text Generation Inference (TGI)

TGI version 1.1.0 or later is recommended. The official Docker container is ghcr.io/huggingface/text-generation-inference:1.1.0.

Example Docker parameters:

--model-id TheBloke/Loyal-Macaroni-Maid-7B-GPTQ --port 3000 --quantize gptq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
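
The two length limits in these parameters interact: the total budget covers both the prompt and the generated tokens. The following sketch works through that arithmetic; `generation_budget` is a hypothetical helper, not part of TGI.

```python
def generation_budget(max_input_length: int, max_total_tokens: int) -> int:
    """Return how many tokens can still be generated after a maximum-length prompt.

    The total token limit covers prompt plus generated tokens, so the input
    limit has to be strictly smaller than the total limit.
    """
    if max_input_length >= max_total_tokens:
        raise ValueError("max_input_length must be smaller than max_total_tokens")
    return max_total_tokens - max_input_length


# Values taken from the example Docker parameters above.
print(generation_budget(3696, 4096))  # 400
```

With these settings, a prompt that uses all 3696 input tokens leaves room for at most 400 new tokens, so a larger `max_new_tokens` only fits when the prompt is shorter.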

Example Python code for interfacing with TGI (requires huggingface-hub 0.17.0 or later):

pip3 install huggingface-hub

from huggingface_hub import InferenceClient

endpoint_url = "https://your-endpoint-url-here"

prompt = "Tell me about AI"
prompt_template=f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
'''

client = InferenceClient(endpoint_url)
response = client.text_generation(
  prompt_template,
  max_new_tokens=128,
  do_sample=True,
  temperature=0.7,
  top_p=0.95,
  top_k=40,
  repetition_penalty=1.1
)

print(f"Model output: {response}")

Python inference example

Install the necessary packages

Requires Transformers 4.33.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.

pip3 install --upgrade transformers optimum
# If using PyTorch 2.1 + CUDA 12.x:
pip3 install --upgrade auto-gptq
# or, if using PyTorch 2.1 + CUDA 11.x:
pip3 install --upgrade auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/

If you are using PyTorch 2.0, you will need to install AutoGPTQ from source. Likewise, if you have problems with the pre-built wheels, try building from source instead:

pip3 uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
git checkout v0.5.1
pip3 install .
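
After installing, the minimum versions listed above can be checked from Python. This is a rough sketch using only the standard library; the version comparison is deliberately simplified (it looks at the first three numeric components and ignores pre-release or local suffixes), and the helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions stated in the requirements above.
MINIMUMS = {"transformers": "4.33.0", "optimum": "1.12.0", "auto-gptq": "0.4.2"}


def as_tuple(ver: str) -> tuple[int, ...]:
    """Turn '0.5.1+cu118' into (0, 5, 1), padding short versions with zeros."""
    nums = []
    for piece in ver.split(".")[:3]:
        match = re.match(r"\d+", piece)
        nums.append(int(match.group()) if match else 0)
    while len(nums) < 3:
        nums.append(0)
    return tuple(nums)


def meets_minimum(installed: str, minimum: str) -> bool:
    return as_tuple(installed) >= as_tuple(minimum)


def check_installed() -> dict[str, str]:
    """Report 'ok', 'too old', or 'missing' for each required package."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(found, minimum) else "too old"
    return report


for package, status in check_installed().items():
    print(f"{package}: {status}")
```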

Example Python code

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/Loyal-Macaroni-Maid-7B-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

prompt = "Write a story about llamas"
prompt_template=f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])
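
Both calls above print the prompt together with the completion. If only the model's answer is wanted, it can be cut out after the response marker. This is a minimal sketch: `extract_response` is a hypothetical helper, the sample string is made up for illustration, and stripping `</s>` assumes the tokenizer's end-of-sequence token is rendered that way.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(generated_text: str) -> str:
    """Return the text after the first response marker, without the EOS token.

    If the marker is absent, the whole text is returned unchanged apart from
    trimming whitespace and the EOS token.
    """
    _, sep, tail = generated_text.partition(RESPONSE_MARKER)
    text = tail if sep else generated_text
    return text.replace("</s>", "").strip()


# Made-up sample of decoded output, for illustration only.
sample = (
    "<s> Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a story about llamas\n\n"
    "### Response:\nOnce upon a time.</s>"
)
print(extract_response(sample))
```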

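✨ Key features

Several GPTQ parameter combinations are provided, so you can choose the quantized model that best fits your hardware and requirements.
Works with multiple inference servers and web UIs, such as text-generation-webui and KoboldAI United.
Compatible with the Transformers library; the 4-bit branches can also be loaded with ExLlama.

📦 Installation

Installing dependencies

pip3 install huggingface-hub transformers optimum auto-gptq

Depending on your PyTorch and CUDA versions, you may need to adjust how auto-gptq is installed; see the Python inference example section above for details.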

💻 Usage examples

Basic usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "TheBloke/Loyal-Macaroni-Maid-7B-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

prompt = "Hello"
prompt_template = f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
'''

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0]))

Advanced usage

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/Loyal-Macaroni-Maid-7B-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto", revision="gptq-4bit-32g-actorder_True")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

prompt = "Please write an article about artificial intelligence"
prompt_template = f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
'''

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])

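📚 Detailed documentation

Model information

Property	Details
Model creator	Sanji Watsuki
Original model	Loyal Macaroni Maid 7B
Model type	Mistral
License	cc-by-nc-4.0
Quantized by	TheBloke
Tags	merge, not-for-all-audiences, nsfw

Repositories available

Prompt template (Alpaca format)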

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
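
The template above can be wrapped in a small helper so that every request is formatted the same way. This is a minimal sketch; `ALPACA_TEMPLATE` and `build_prompt` are names chosen here for illustration.

```python
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{prompt}\n\n"
    "### Response:\n"
)


def build_prompt(instruction: str) -> str:
    """Insert the user's instruction into the prompt template."""
    return ALPACA_TEMPLATE.format(prompt=instruction.strip())


print(build_prompt("Tell me about AI"))
```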

Known compatible clients / servers

GPTQ models are currently supported on Linux (NVidia/AMD) and Windows (NVidia only). macOS users should use GGUF models.

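These GPTQ models are known to work in inference servers and web UIs such as text-generation-webui, KoboldAI United, and Text Generation Inference (TGI).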

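Provided files and GPTQ parameters

Multiple quantization parameters are provided so that you can choose the best one for your hardware and requirements.

Each separate quant is in a different branch. The download instructions earlier on this page explain how to fetch files from a specific branch.

Most GPTQ files are made with AutoGPTQ. Mistral models are currently made with Transformers.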

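Explanation of GPTQ parameters

Bits: The bit size of the quantized model.
GS: GPTQ group size. Higher numbers use less VRAM but have lower quantization accuracy. "None" means no grouping, which uses the least VRAM.
Act Order: True or False. Also known as desc_act. True results in better quantization accuracy. Some GPTQ clients have had issues with models that combine Act Order and a group size, but this is generally resolved now.
Damp %: A GPTQ parameter that affects how samples are processed for quantization. The default is 0.01, but 0.1 results in slightly better accuracy.
GPTQ dataset: The calibration dataset used during quantization. Using a dataset closer to the model's training data can improve quantization accuracy. Note that the GPTQ calibration dataset is not the same as the dataset used to train the model; refer to the original model repo for details of the training datasets.
Sequence length: The length of the dataset sequences used for quantization. Ideally this is the same as the model's sequence length. For some very long sequence models (16K and above), a lower sequence length may have to be used. Note that a lower sequence length does not limit the sequence length of the quantized model. It only affects quantization accuracy on longer inference sequences.
ExLlama compatibility: Whether this file can be loaded with ExLlama, which currently only supports Llama and Mistral models in 4-bit.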
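
The trade-off behind the group size setting can be made concrete by counting groups. Each group stores its own quantization metadata (a scale and a zero point), so larger groups mean fewer metadata entries and less memory. The sketch below is an illustration only; the row length of 4096 is an assumed example value and `groups_per_row` is a hypothetical helper.

```python
import math
from typing import Optional


def groups_per_row(row_length: int, group_size: Optional[int]) -> int:
    """Count how many quantization groups one weight row is split into.

    A group size of None means no grouping: the whole row shares one group.
    """
    if group_size is None:
        return 1
    return math.ceil(row_length / group_size)


# Illustrative row length; more groups means more per-group metadata to store.
for gs in (32, 64, 128, None):
    print(f"group size {gs}: {groups_per_row(4096, gs)} groups per row")
```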

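Branch	Bits	GS	Act Order	Damp %	GPTQ dataset	Seq Len	Size	ExLlama	Description
main	4	128	Yes	0.1	OpenErotica Erotiquant	4096	4.16 GB	Yes	4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy.
gptq-4bit-32g-actorder_True	4	32	Yes	0.1	OpenErotica Erotiquant	4096	4.57 GB	Yes	4-bit, with Act Order and group size 32g. Gives the highest inference quality, with the highest VRAM usage.
gptq-8bit--1g-actorder_True	8	None	Yes	0.1	OpenErotica Erotiquant	4096	7.52 GB	No	8-bit, with Act Order. No group size, to lower VRAM requirements.
gptq-8bit-128g-actorder_True	8	128	Yes	0.1	OpenErotica Erotiquant	4096	7.68 GB	No	8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy.
gptq-8bit-32g-actorder_True	8	32	Yes	0.1	OpenErotica Erotiquant	4096	8.17 GB	No	8-bit, with group size 32g and Act Order for maximum inference quality.
gptq-4bit-64g-actorder_True	4	64	Yes	0.1	OpenErotica Erotiquant	4096	4.29 GB	Yes	4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy.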
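
One way to narrow down the table is to filter branches by a size budget. The sketch below copies the sizes and ExLlama flags from the table; the selection rule is an assumption made here (it treats the largest file that fits as the best choice), and file size is only a lower bound on the VRAM actually needed at inference time.

```python
from typing import Optional

# (branch, bits, file size in GB, ExLlama compatible), copied from the table above.
BRANCHES = [
    ("main", 4, 4.16, True),
    ("gptq-4bit-32g-actorder_True", 4, 4.57, True),
    ("gptq-8bit--1g-actorder_True", 8, 7.52, False),
    ("gptq-8bit-128g-actorder_True", 8, 7.68, False),
    ("gptq-8bit-32g-actorder_True", 8, 8.17, False),
    ("gptq-4bit-64g-actorder_True", 4, 4.29, True),
]


def largest_branch_within(budget_gb: float, need_exllama: bool = False) -> Optional[str]:
    """Pick the largest branch whose file size fits the budget, or None."""
    candidates = [
        branch
        for branch in BRANCHES
        if branch[2] <= budget_gb and (branch[3] or not need_exllama)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda branch: branch[2])[0]


print(largest_branch_within(5.0))
print(largest_branch_within(10.0, need_exllama=True))
```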

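Compatibility

The provided files are tested to work with Transformers. For non-Mistral models, AutoGPTQ can also be used directly.

ExLlama is compatible with Llama-architecture models (including Mistral, Yi, DeepSeek, SOLAR, etc.) in 4-bit. See the provided files table above for per-file compatibility.

For a list of clients and servers, see the section on known compatible clients / servers.

🔧 Technical details

All branches in this project were quantized with Act Order enabled, a damp value of 0.1, and the OpenErotica Erotiquant calibration dataset at a sequence length of 4096, with the aim of keeping quantization accuracy as high as possible while preserving inference efficiency. The branches differ in bit width and group size, so users can choose according to their hardware resources and performance needs.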

📄 License

This model is released under the cc-by-nc-4.0 license.

Discord

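For further support, and for discussions about these models and AI in general, join us at TheBloke AI's Discord server.

Thanks, and how to contribute

Thanks to the chirper.ai team!

Thanks to Clay from gpus.llm-utils.org!

Many people have asked whether they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine-tuning and training.

If you are able and willing to contribute, it will be most gratefully received and will help me keep providing more models and start work on new AI projects.

Donors get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, and other benefits.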

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Special thanks to: Aemon Algiz.

Patreon special mentions: Michael Levine, 阿明, Trailburnt, Nikolai Manek, John Detwiler, Randy H, Will Dee, Sebastain Graf, NimbleBox.ai, Eugene Pentland, Emad Mostaque, Ai Maven, Jim Angel, Jeff Scroggin, Michael Davis, Manuel Alberto Morcote, Stephen Murray, Robert, Justin Joy, Luke @flexchar, Brandon Frisco, Elijah Stavena, S_X, Dan Guido, Undi ., Komninos Chatzipapas, Shadi, theTransient, Lone Striker, Raven Klaugh, jjj, Cap'n Zoog, Michel-Marie MAUDET (LINAGORA), Matthew Berman, David, Fen Risland, Omer Bin Jawed, Luke Pendergrass, Kalila, OG, Erik Bjäreholt, Rooh Singh, Joseph William Delisle, Dan Lewis, TL, John Villwock, AzureBlack, Brad, Pedro Madruga, Caitlyn Gatomon, K, jinyuan sun, Mano Prime, Alex, Jeffrey Morgan, Alicia Loh, Illia Dulskyi, Chadd, transmissions 11, fincy, Rainer Wilmers, ReadyPlayerEmma, knownsqashed, Mandus, biorpg, Deo Leter, Brandon Phillips, SuperWojo, Sean Connelly, Iucharbius, Jack West, Harry Royden McLaughlin, Nicholas, terasurfer, Vitor Caleffi, Duane Dunston, Johann-Peter Hartmann, David Ziegler, Olakabola, Ken Nordquist, Trenton Dambrowitz, Tom X Nguyen, Vadim, Ajan Kanaga, Leonard Tan, Clay Pascal, Alexandros Triantafyllidis, JM33133, Xule, vamX, ya boyyy, subjectnull, Talal Aujan, Alps Aficionado, wassieverse, Ari Malik, James Bentley, Woland, Spencer Kim, Michael Dempsey, Fred von Graf, Elle, zynix, William Richards, Stanislav Ovsiannikov, Edmond Seymore, Jonathan Leane, Martin Kemka, usrbinkat, Enrico Ros

Thank you again to all my generous patrons and donors!

And thank you to a16z for their generous grant.