# 🚀 Loyal Macaroni Maid 7B - GPTQ

This repository contains GPTQ model files for Loyal Macaroni Maid 7B, offering multiple quantisation options for different hardware and requirements.

## 🚀 Quick Start

This repo provides GPTQ model files for Sanji Watsuki's Loyal Macaroni Maid 7B. You can choose the appropriate quantisation parameters based on your hardware and needs.
## Model Information

| Property | Details |
|---|---|
| Base Model | SanjiWatsuki/Loyal-Macaroni-Maid-7B |
| Model Creator | Sanji Watsuki |
| Model Name | Loyal Macaroni Maid 7B |
| Model Type | mistral |
| Prompt Template | `Below is an instruction that describes a task. Write a response that appropriately completes the request.` <br> `### Instruction:` <br> `{prompt}` <br> `### Response:` |
| Quantized By | TheBloke |
| License | cc-by-nc-4.0 |
| Tags | merge, not-for-all-audiences, nsfw |
### Visual Elements
<div style="width: auto; margin-left: auto; margin-right: auto">
<img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
</div>
<div style="display: flex; justify-content: space-between; width: 100%;">
<div style="display: flex; flex-direction: column; align-items: flex-start;">
<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://discord.gg/theblokeai">Chat & support: TheBloke's Discord server</a></p>
</div>
<div style="display: flex; flex-direction: column; align-items: flex-end;">
<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
</div>
</div>
<div style="text-align:center; margin-top: 0em; margin-bottom: 0em"><p style="margin-top: 0.25em; margin-bottom: 0em;">TheBloke's LLM work is generously supported by a grant from <a href="https://a16z.com">andreessen horowitz (a16z)</a></p></div>
<hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
## ✨ Features
- Multiple GPTQ parameter permutations are provided to suit different hardware and requirements.
- Compatibility with various inference servers/webuis, such as [text-generation-webui](https://github.com/oobabooga/text-generation-webui), [KoboldAI United](https://github.com/henk717/koboldai), etc.
- Support for different branches, allowing you to choose the best quantisation for your needs.
## 📦 Installation
### Download from text-generation-webui
1. Enter `TheBloke/Loyal-Macaroni-Maid-7B-GPTQ` in the "Download model" box to download from the `main` branch.
2. To download from another branch, add `:branchname` to the end of the download name, e.g., `TheBloke/Loyal-Macaroni-Maid-7B-GPTQ:gptq-4bit-32g-actorder_True`.
### Download from the command line
1. Install the `huggingface-hub` Python library:
```shell
pip3 install huggingface-hub
```
2. Download the `main` branch to a folder called `Loyal-Macaroni-Maid-7B-GPTQ`:
```shell
mkdir Loyal-Macaroni-Maid-7B-GPTQ
huggingface-cli download TheBloke/Loyal-Macaroni-Maid-7B-GPTQ --local-dir Loyal-Macaroni-Maid-7B-GPTQ --local-dir-use-symlinks False
```
3. To download from a different branch, add the `--revision` parameter:
```shell
mkdir Loyal-Macaroni-Maid-7B-GPTQ
huggingface-cli download TheBloke/Loyal-Macaroni-Maid-7B-GPTQ --revision gptq-4bit-32g-actorder_True --local-dir Loyal-Macaroni-Maid-7B-GPTQ --local-dir-use-symlinks False
```
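If you prefer to drive the same download from Python rather than the CLI, a minimal sketch using `huggingface_hub.snapshot_download` (same repo and branch names as above) looks like this:

```python
# Minimal Python equivalent of the huggingface-cli commands above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/Loyal-Macaroni-Maid-7B-GPTQ",
    revision="gptq-4bit-32g-actorder_True",  # omit this line to download the main branch
    local_dir="Loyal-Macaroni-Maid-7B-GPTQ",
    local_dir_use_symlinks=False,
)
```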
### Clone with `git` (not recommended)

```shell
git clone --single-branch --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/Loyal-Macaroni-Maid-7B-GPTQ
```
## 💻 Usage Examples

### Use in text-generation-webui

1. Make sure you're using the latest version of text-generation-webui.
2. Click the Model tab.
3. Under Download custom model or LoRA, enter `TheBloke/Loyal-Macaroni-Maid-7B-GPTQ`.
   - To download from a specific branch, enter for example `TheBloke/Loyal-Macaroni-Maid-7B-GPTQ:gptq-4bit-32g-actorder_True`.
4. Click Download.
5. Once the download is finished, click the refresh icon next to Model in the top left.
6. In the Model dropdown, choose the model you just downloaded: `Loyal-Macaroni-Maid-7B-GPTQ`.
7. The model will automatically load and be ready for use.
8. If you want custom settings, set them and then click Save settings for this model followed by Reload the Model in the top right.
9. Click the Text Generation tab and enter a prompt to start.
### Serve from Text Generation Inference (TGI)

It's recommended to use TGI version 1.1.0 or later. The official Docker container is `ghcr.io/huggingface/text-generation-inference:1.1.0`.

Example Docker parameters:

```shell
--model-id TheBloke/Loyal-Macaroni-Maid-7B-GPTQ --port 3000 --quantize gptq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
```
Example Python code for interfacing with TGI (requires `huggingface-hub` 0.17.0 or later):

```python
from huggingface_hub import InferenceClient

endpoint_url = "https://your-endpoint-url-here"

prompt = "Tell me about AI"
prompt_template = f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
'''

client = InferenceClient(endpoint_url)

response = client.text_generation(
    prompt_template,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(f"Model output: {response}")
```
### Python code example: inference from this GPTQ model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/Loyal-Macaroni-Maid-7B-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

prompt = "Write a story about llamas"
system_message = "You are a story writing assistant"  # not used by this Alpaca-style template
prompt_template = f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline
print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])
```
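If you want tokens printed as they are generated rather than after the full completion, a small sketch continuing from the snippet above (reusing `model`, `tokenizer` and `input_ids`) is to pass a `TextStreamer` to `generate`:

```python
# Continues from the example above: model, tokenizer and input_ids already exist.
from transformers import TextStreamer

# skip_prompt=True avoids echoing the prompt; extra kwargs are passed to decode()
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

_ = model.generate(
    inputs=input_ids,
    streamer=streamer,   # tokens are printed to stdout as they are produced
    temperature=0.7,
    do_sample=True,
    top_p=0.95,
    top_k=40,
    max_new_tokens=512,
)
```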
## 📚 Documentation

### Provided files, and GPTQ parameters

Multiple quantisation parameters are provided to allow you to choose the best one for your hardware and requirements. Each separate quant is in a different branch.

#### Explanation of GPTQ parameters
- Bits: The bit size of the quantised model.
- GS: GPTQ group size. Higher numbers use less VRAM, but have lower quantisation accuracy. "None" is the lowest possible value.
- Act Order: True or False. Also known as `desc_act`. True results in better quantisation accuracy. Some GPTQ clients have had issues with models that use Act Order plus Group Size, but this is generally resolved now.
- Damp %: A GPTQ parameter that affects how samples are processed for quantisation. 0.01 is the default, but 0.1 results in slightly better accuracy.
- GPTQ dataset: The calibration dataset used during quantisation. Using a dataset more appropriate to the model's training can improve quantisation accuracy. Note that the GPTQ calibration dataset is not the same as the dataset used to train the model - please refer to the original model repo for details of the training dataset(s).
- Sequence Length: The length of the dataset sequences used for quantisation. Ideally this is the same as the model sequence length. For some very long sequence models (16K+), a lower sequence length may have to be used. Note that a lower sequence length does not limit the sequence length of the quantised model; it only impacts the quantisation accuracy on longer inference sequences.
- ExLlama Compatibility: Whether this file can be loaded with ExLlama, which currently only supports Llama and Mistral models in 4-bit.
Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
---|---|---|---|---|---|---|---|---|---|
main | 4 | 128 | Yes | 0.1 | OpenErotica Erotiquant | 4096 | 4.16 GB | Yes | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. |
gptq-4bit-32g-actorder_True | 4 | 32 | Yes | 0.1 | OpenErotica Erotiquant | 4096 | 4.57 GB | Yes | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. |
gptq-8bit--1g-actorder_True | 8 | None | Yes | 0.1 | OpenErotica Erotiquant | 4096 | 7.52 GB | No | 8-bit, with Act Order. No group size, to lower VRAM requirements. |
gptq-8bit-128g-actorder_True | 8 | 128 | Yes | 0.1 | OpenErotica Erotiquant | 4096 | 7.68 GB | No | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. |
gptq-8bit-32g-actorder_True | 8 | 32 | Yes | 0.1 | OpenErotica Erotiquant | 4096 | 8.17 GB | No | 8-bit, with group size 32g and Act Order for maximum inference quality. |
gptq-4bit-64g-actorder_True | 4 | 64 | Yes | 0.1 | OpenErotica Erotiquant | 4096 | 4.29 GB | Yes | 4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy. |
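To make these parameters concrete, here is a rough, illustrative sketch of how the Bits / GS / Act Order / Damp % values from the table map onto `transformers.GPTQConfig` if you were to quantise the base model yourself. This is not how this repo was produced (it used a different calibration dataset, per the table), and it assumes the `optimum` and `auto-gptq` packages plus a suitable GPU:

```python
# Illustrative only: mirrors the "main" branch row (4-bit, 128g, Act Order, damp 0.1).
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

base_model_id = "SanjiWatsuki/Loyal-Macaroni-Maid-7B"  # the original, unquantised model

tokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=True)

gptq_config = GPTQConfig(
    bits=4,               # "Bits"
    group_size=128,       # "GS"
    desc_act=True,        # "Act Order"
    damp_percent=0.1,     # "Damp %"
    dataset="wikitext2",  # calibration dataset; this repo used a different one (see table)
    tokenizer=tokenizer,
)

# Passing quantization_config triggers on-the-fly GPTQ quantisation of the base model.
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    quantization_config=gptq_config,
)
```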
## 🔧 Technical Details

The files provided are tested to work with Transformers. For non-Mistral models, AutoGPTQ can also be used directly. ExLlama is compatible with Llama architecture models (including Mistral, Yi, DeepSeek, SOLAR, etc.) in 4-bit.
## 📄 License
This project is licensed under the cc-by-nc-4.0 license.
## Discord

For further support, and discussions on these models and AI in general, join us at: [TheBloke AI's Discord server](https://discord.gg/theblokeai)
## Thanks, and how to contribute
Thanks to the chirper.ai team!
Thanks to Clay from gpus.llm-utils.org!
If you're able and willing to contribute, it will be most gratefully received and will help to keep providing more models and start work on new AI projects.
Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
- Patreon: https://patreon.com/TheBlokeAI
- Ko-Fi: https://ko-fi.com/TheBlokeAI
Special thanks to: Aemon Algiz.
Patreon special mentions: Michael Levine, 阿明, Trailburnt, Nikolai Manek, John Detwiler, Randy H, Will Dee, Sebastain Graf, NimbleBox.ai, Eugene Pentland, Emad Mostaque, Ai Maven, Jim Angel, Jeff Scroggin, Michael Davis, Manuel Alberto Morcote, Stephen Murray, Robert, Justin Joy, Luke @flexchar, Brandon Frisco, Elijah Stavena, S_X, Dan Guido, Undi ., Komninos Chatzipapas, Shadi, theTransient, Lone Striker, Raven Klaugh, jjj, Cap'n Zoog, Michel-Marie MAUDET (LINAGORA), Matthew Berman, David, Fen Risland, Omer Bin Jawed, Luke Pendergrass, Kalila, OG, Erik Bjäreholt, Rooh Singh, Joseph William Delisle, Dan Lewis, TL, John Villwock, AzureBlack, Brad, Pedro Madruga, Caitlyn Gatomon, K, jinyuan sun, Mano Prime, Alex, Jeffrey Morgan, Alicia Loh, Illia Dulskyi, Chadd, transmissions 11, fincy, Rainer Wilmers, ReadyPlayerEmma, knownsqashed, Mandus, biorpg, Deo Leter, Brandon Phillips, SuperWojo, Sean Connelly, Iucharbius, Jack West, Harry Royden McLaughlin, Nicholas, terasurfer, Vitor Caleffi, Duane Dunston, Johann-Peter Hartmann, David Ziegler, Olakabola, Ken Nordquist, Trenton Dambrowitz, Tom X Nguyen, Vadim, Ajan Kanaga, Leonard Tan, Clay Pascal, Alexandros Triantafyllidis, JM33133, Xule, vamX, ya boyyy, subjectnull, Talal Aujan, Alps Aficionado, wassieverse, Ari Malik, James Bentley, Woland, Spencer Kim, Michael Dempsey, Fred von Graf, Elle, zynix, William Richards, Stanislav Ovsiannikov, Edmond Seymore, Jonathan Leane, Martin Kemka, usrbinkat, Enrico Ros

