# Mythalion 13B - GGUF

This repository provides GGUF format model files for Mythalion 13B, a text-generation model. It offers a range of quantised files suitable for different use cases and hardware setups.
## Quick Start

### Downloading the Model
- Using Clients/Libraries: Tools such as LM Studio, LoLLMS Web UI, and Faraday.dev can download models automatically, presenting a list of available models for you to choose from.
- In text-generation-webui: Under "Download Model", enter the model repo `TheBloke/Mythalion-13B-GGUF` and a filename (e.g. `mythalion-13b.q4_K_M.gguf`), then click "Download".
- On the Command Line: First, install the `huggingface-hub` Python library:

```shell
pip3 install huggingface-hub>=0.17.1
```

Then download an individual model file (a Python alternative is sketched just below):

```shell
huggingface-cli download TheBloke/Mythalion-13B-GGUF mythalion-13b.q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
```
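If you prefer to stay in Python, the same download can be done with the `huggingface_hub` library. The sketch below is illustrative rather than prescriptive; it reuses the repo and filename from the command above, and the `local_dir` choice is arbitrary.

```python
# Minimal sketch: download one GGUF file with huggingface_hub
# (same repo and filename as the CLI example above; local_dir is illustrative).
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/Mythalion-13B-GGUF",
    filename="mythalion-13b.q4_K_M.gguf",
    local_dir=".",                 # where to place the downloaded file
    local_dir_use_symlinks=False,  # store a real copy rather than a symlink
)
print(model_path)  # path to the downloaded .gguf file
```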
### Running the Model

Example `llama.cpp` command. Ensure you are using `llama.cpp` from commit d0cee0d36d5be95a0d9088b674dbb27354107221 or later.

```shell
./main -ngl 32 -m mythalion-13b.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{prompt}\n\n### Response:"
```
- Adjust `-ngl 32` to the number of layers you want to offload to the GPU. Remove it if you don't have GPU acceleration.
- Modify `-c 4096` to set the desired sequence length. For extended-sequence models, llama.cpp automatically reads the necessary RoPE scaling parameters from the GGUF file and sets them.
- To have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.
### Running in text-generation-webui

Refer to [text-generation-webui/docs/llama.cpp.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp.md) for further instructions.

### Running from Python code

You can use GGUF models from Python with the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) or ctransformers libraries.
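As a rough illustration of the llama-cpp-python route, the sketch below loads the file downloaded earlier and runs a single completion; the path, context size, and GPU layer count mirror the llama.cpp example above and should be adjusted to your hardware.

```python
# Sketch of running this GGUF model with llama-cpp-python; the path and
# parameters mirror the llama.cpp CLI example above and are not prescriptive.
from llama_cpp import Llama

llm = Llama(
    model_path="./mythalion-13b.q4_K_M.gguf",
    n_ctx=4096,       # sequence length, as with -c 4096
    n_gpu_layers=32,  # layers offloaded to the GPU, as with -ngl 32; use 0 for CPU only
)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a short poem about llamas.\n\n### Response:"
)
output = llm(prompt, max_tokens=256, temperature=0.7, repeat_penalty=1.1)
print(output["choices"][0]["text"])
```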
## Features

- Multiple Quantisation Options: Offers a range of quantised files (e.g. Q2_K, Q3_K, Q4_K) to balance model size against quality.
- Broad Compatibility: Works with many clients and libraries, such as llama.cpp, text-generation-webui, and KoboldCpp.
- Flexible Usage: Can be used for both instruction-following tasks and chat-style conversations.
## Installation

### Installing Dependencies for Download

To download models on the command line, install the `huggingface-hub` Python library:

```shell
pip3 install huggingface-hub>=0.17.1
```

To accelerate downloads on fast connections, install `hf_transfer`:

```shell
pip3 install hf_transfer
```
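`hf_transfer` is switched on through an environment variable rather than a code change. A minimal sketch, assuming you download from Python with `huggingface_hub` as shown earlier:

```python
# Sketch: enable hf_transfer for faster downloads by setting the environment
# variable before importing huggingface_hub (requires `pip3 install hf_transfer`).
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="TheBloke/Mythalion-13B-GGUF",
    filename="mythalion-13b.q4_K_M.gguf",
    local_dir=".",
)
```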
### Installing Dependencies for Python Usage

If you want to use the model from Python:

- For `ctransformers` without GPU acceleration:

```shell
pip install ctransformers>=0.2.24
```

- With CUDA GPU acceleration:

```shell
pip install ctransformers[cuda]>=0.2.24
```

- With ROCm GPU acceleration:

```shell
CT_HIPBLAS=1 pip install ctransformers
```
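With any of the builds above installed, loading the GGUF file through ctransformers looks roughly like the sketch below; the chosen model file and `gpu_layers` value are illustrative.

```python
# Rough sketch of text generation with ctransformers and this repo's GGUF files;
# model_file and gpu_layers are illustrative choices, not requirements.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mythalion-13B-GGUF",
    model_file="mythalion-13b.q4_K_M.gguf",
    model_type="llama",
    gpu_layers=50,  # set to 0 if you installed the CPU-only build
)

print(llm("### Instruction:\nWrite a haiku about llamas.\n\n### Response:"))
```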
## Usage Examples

### Basic Usage in llama.cpp

```shell
./main -ngl 32 -m mythalion-13b.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{prompt}\n\n### Response:"
```

### Advanced Usage - Chat-style Conversation in llama.cpp

```shell
./main -ngl 32 -m mythalion-13b.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -i -ins
```
## Documentation

### About GGUF
GGUF is a new format introduced by the llama.cpp team on August 21st, 2023. It replaces GGML, which is no longer supported by llama.cpp. GGUF has several advantages over GGML, including better tokenization, support for special tokens, metadata support, and extensibility.
Here is a list of clients and libraries known to support GGUF:
- llama.cpp: The source project for GGUF, offering a CLI and a server option.
- [text-generation-webui](https://github.com/oobabooga/text-generation-webui): A widely used web UI with many features and powerful extensions, supporting GPU acceleration.
- KoboldCpp: A fully-featured web UI with GPU acceleration across all platforms and GPU architectures, great for storytelling.
- LM Studio: An easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration.
- [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui): A web UI with many interesting and unique features, including a full model library for easy model selection.
- Faraday.dev: An attractive and easy-to-use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration.
- ctransformers: A Python library with GPU acceleration, LangChain support, and an OpenAI-compatible AI server.
- [llama-cpp-python](https://github.com/abetlen/llama-cpp-python): A Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
- candle: A Rust ML framework focusing on performance, including GPU support and ease of use.
### Repositories available

- [AWQ model(s) for GPU inference](https://huggingface.co/TheBloke/Mythalion-13B-AWQ)
- [GPTQ models for GPU inference, with multiple quantisation parameter options](https://huggingface.co/TheBloke/Mythalion-13B-GPTQ)
- [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/Mythalion-13B-GGUF)
- [PygmalionAI's original unquantised fp16 model in PyTorch format, for GPU inference and for further conversions](https://huggingface.co/PygmalionAI/mythalion-13b)
### Prompt template

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```
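Filling the template is plain string substitution. A small sketch with an illustrative instruction:

```python
# Sketch: build the full prompt string from the template above.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{prompt}\n\n### Response:"
)

full_prompt = PROMPT_TEMPLATE.format(prompt="Summarise the GGUF format in one paragraph.")
print(full_prompt)
```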
### Compatibility

These quantised GGUFv2 files are compatible with llama.cpp from August 27th 2023 onwards, as of commit d0cee0d36d5be95a0d9088b674dbb27354107221. They are also compatible with many third-party UIs and libraries; refer to the list at the top of this README.
### Explanation of quantisation methods
The new methods available are:

- GGML_TYPE_Q2_K: "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
- GGML_TYPE_Q3_K: "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
- GGML_TYPE_Q4_K: "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
- GGML_TYPE_Q5_K: "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
- GGML_TYPE_Q6_K: "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.
Refer to the Provided Files table below to see what files use which methods, and how.
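As a worked example of where these bits-per-weight figures come from, the arithmetic below reproduces the 4.5 bpw figure for GGML_TYPE_Q4_K, assuming (for illustration) that each block stores a 6-bit scale and a 6-bit min and that the super-block adds two fp16 factors:

```python
# Illustrative bpw arithmetic for GGML_TYPE_Q4_K, assuming 6-bit per-block
# scales and mins plus two fp16 super-block factors.
blocks_per_superblock = 8
weights_per_block = 32
weights = blocks_per_superblock * weights_per_block   # 256 weights per super-block

quant_bits = 4 * weights                              # 4-bit quants: 1024 bits
scale_min_bits = blocks_per_superblock * (6 + 6)      # per-block scale + min: 96 bits
superblock_bits = 2 * 16                              # fp16 super-block scale and min: 32 bits

bpw = (quant_bits + scale_min_bits + superblock_bits) / weights
print(bpw)  # 4.5, matching the Q4_K figure above
```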
### Provided files

| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ---- |
| [mythalion-13b.Q2_K.gguf](https://huggingface.co/TheBloke/Mythalion-13B-GGUF/blob/main/mythalion-13b.Q2_K.gguf) | Q2_K | 2 | 5.43 GB | 7.93 GB | smallest, significant quality loss - not recommended for most purposes |
| [mythalion-13b.Q3_K_S.gguf](https://huggingface.co/TheBloke/Mythalion-13B-GGUF/blob/main/mythalion-13b.Q3_K_S.gguf) | Q3_K_S | 3 | 5.66 GB | 8.16 GB | very small, high quality loss |
| [mythalion-13b.Q3_K_M.gguf](https://huggingface.co/TheBloke/Mythalion-13B-GGUF/blob/main/mythalion-13b.Q3_K_M.gguf) | Q3_K_M | 3 | 6.34 GB | 8.84 GB | very small, high quality loss |
| [mythalion-13b.Q3_K_L.gguf](https://huggingface.co/TheBloke/Mythalion-13B-GGUF/blob/main/mythalion-13b.Q3_K_L.gguf) | Q3_K_L | 3 | 6.93 GB | 9.43 GB | small, substantial quality loss |
| [mythalion-13b.Q4_0.gguf](https://huggingface.co/TheBloke/Mythalion-13B-GGUF/blob/main/mythalion-13b.Q4_0.gguf) | Q4_0 | 4 | 7.37 GB | 9.87 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
| [mythalion-13b.Q4_K_S.gguf](https://huggingface.co/TheBloke/Mythalion-13B-GGUF/blob/main/mythalion-13b.Q4_K_S.gguf) | Q4_K_S | 4 | 7.41 GB | 9.91 GB | small, greater quality loss |
| [mythalion-13b.Q4_K_M.gguf](https://huggingface.co/TheBloke/Mythalion-13B-GGUF/blob/main/mythalion-13b.Q4_K_M.gguf) | Q4_K_M | 4 | 7.87 GB | 10.37 GB | medium, balanced quality - recommended |
| [mythalion-13b.Q5_0.gguf](https://huggingface.co/TheBloke/Mythalion-13B-GGUF/blob/main/mythalion-13b.Q5_0.gguf) | Q5_0 | 5 | 8.97 GB | 11.47 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
| [mythalion-13b.Q5_K_S.gguf](https://huggingface.co/TheBloke/Mythalion-13B-GGUF/blob/main/mythalion-13b.Q5_K_S.gguf) | Q5_K_S | 5 | 8.97 GB | 11.47 GB | large, low quality loss - recommended |
| [mythalion-13b.Q5_K_M.gguf](https://huggingface.co/TheBloke/Mythalion-13B-GGUF/blob/main/mythalion-13b.Q5_K_M.gguf) | Q5_K_M | 5 | 9.23 GB | 11.73 GB | large, very low quality loss - recommended |
| [mythalion-13b.Q6_K.gguf](https://huggingface.co/TheBloke/Mythalion-13B-GGUF/blob/main/mythalion-13b.Q6_K.gguf) | Q6_K | 6 | 10.68 GB | 13.18 GB | very large, extremely low quality loss |
| [mythalion-13b.Q8_0.gguf](https://huggingface.co/TheBloke/Mythalion-13B-GGUF/blob/main/mythalion-13b.Q8_0.gguf) | Q8_0 | 8 | 13.83 GB | 16.33 GB | very large, extremely low quality loss - not recommended |
Note: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
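In the table above, each "Max RAM required" figure is the file size plus 2.50 GB of overhead. A trivial sketch of that rule of thumb, which only applies to these figures and assumes no GPU offload:

```python
# Rule of thumb read off the table above: max RAM = file size + 2.50 GB
# (no GPU offload). Purely illustrative and specific to the figures listed here.
def estimate_max_ram_gb(file_size_gb: float, overhead_gb: float = 2.50) -> float:
    return file_size_gb + overhead_gb

print(estimate_max_ram_gb(7.87))  # 10.37 GB, matching the Q4_K_M row
```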
## Technical Details

The GGUF format is designed to be more efficient and feature-rich than the deprecated GGML format. It allows for better tokenization, support for special tokens, and metadata storage. Different quantisation methods are used to balance model size against quality: lower-bit methods such as Q2_K produce smaller files but may incur significant quality loss, while higher-bit methods such as Q6_K and Q8_0 have extremely low quality loss at the cost of larger files.
## License

The model is licensed under the Llama 2 license.

