Llamacpp imatrix Quantizations of llama-3-cat-8b-instruct-v1
This project provides quantized versions of the llama-3-cat-8b-instruct-v1 model, produced with llama.cpp. A range of quant types is offered, trading file size against output quality, so you can pick the one that fits your hardware and needs.
Quick Start
Prerequisites
Ensure you have huggingface-cli installed. You can install it using the following command:
pip install -U "huggingface_hub[cli]"
Download a Specific File
To download a specific file, use the following command. For example, to download llama-3-cat-8b-instruct-v1-Q4_K_M.gguf:
huggingface-cli download bartowski/llama-3-cat-8b-instruct-v1-GGUF --include "llama-3-cat-8b-instruct-v1-Q4_K_M.gguf" --local-dir ./ --local-dir-use-symlinks False
Download Split Files
If the model is larger than 50GB and split into multiple files, you can download all of them to a local folder using the following command:
huggingface-cli download bartowski/llama-3-cat-8b-instruct-v1-GGUF --include "llama-3-cat-8b-instruct-v1-Q8_0.gguf/*" --local-dir llama-3-cat-8b-instruct-v1-Q8_0 --local-dir-use-symlinks False
Features
- Multiple Quantization Types: Offers a wide range of quantized models, including Q8_0, Q6_K, Q5_K_M, etc., to balance between quality and file size.
- Easy Download: Provides clear instructions on how to download files using huggingface-cli.
- Performance Guidance: Offers guidance on choosing the appropriate quantized file based on available RAM/VRAM and performance requirements.
Installation
The only installation step is to install huggingface-cli using the command:
pip install -U "huggingface_hub[cli]"
Usage Examples
Download a Specific File
huggingface-cli download bartowski/llama-3-cat-8b-instruct-v1-GGUF --include "llama-3-cat-8b-instruct-v1-Q4_K_M.gguf" --local-dir ./ --local-dir-use-symlinks False
Download Split Files
huggingface-cli download bartowski/llama-3-cat-8b-instruct-v1-GGUF --include "llama-3-cat-8b-instruct-v1-Q8_0.gguf/*" --local-dir llama-3-cat-8b-instruct-v1-Q8_0 --local-dir-use-symlinks False
Documentation
Prompt Format
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>
{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
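For illustration, here is one way a prompt wrapped in this template could be passed to llama.cpp's main example after downloading a quant. This is a minimal sketch: the binary name, model path, system prompt, and generation settings are assumptions about a local setup, not part of this repository.

```bash
# Sketch only: run a downloaded quant with llama.cpp's main example,
# passing a prompt already wrapped in the template above.
# -e tells main to process the \n escapes in the prompt string.
./main -m ./llama-3-cat-8b-instruct-v1-Q4_K_M.gguf -e -n 256 \
  -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\nYou are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\nHello, who are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
```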
File Download Table
| Filename | Quant type | File Size | Description |
| -------- | ---------- | --------- | ----------- |
| llama-3-cat-8b-instruct-v1-Q8_0.gguf | Q8_0 | 8.54GB | Extremely high quality, generally unneeded but max available quant. |
| llama-3-cat-8b-instruct-v1-Q6_K.gguf | Q6_K | 6.59GB | Very high quality, near perfect, recommended. |
| llama-3-cat-8b-instruct-v1-Q5_K_M.gguf | Q5_K_M | 5.73GB | High quality, recommended. |
| llama-3-cat-8b-instruct-v1-Q5_K_S.gguf | Q5_K_S | 5.59GB | High quality, recommended. |
| llama-3-cat-8b-instruct-v1-Q4_K_M.gguf | Q4_K_M | 4.92GB | Good quality, uses about 4.83 bits per weight, recommended. |
| llama-3-cat-8b-instruct-v1-Q4_K_S.gguf | Q4_K_S | 4.69GB | Slightly lower quality with more space savings, recommended. |
| llama-3-cat-8b-instruct-v1-IQ4_NL.gguf | IQ4_NL | 4.67GB | Decent quality, slightly smaller than Q4_K_S with similar performance, recommended. |
| llama-3-cat-8b-instruct-v1-IQ4_XS.gguf | IQ4_XS | 4.44GB | Decent quality, smaller than Q4_K_S with similar performance, recommended. |
| llama-3-cat-8b-instruct-v1-Q3_K_L.gguf | Q3_K_L | 4.32GB | Lower quality but usable, good for low RAM availability. |
| llama-3-cat-8b-instruct-v1-Q3_K_M.gguf | Q3_K_M | 4.01GB | Even lower quality. |
| llama-3-cat-8b-instruct-v1-IQ3_M.gguf | IQ3_M | 3.78GB | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| llama-3-cat-8b-instruct-v1-IQ3_S.gguf | IQ3_S | 3.68GB | Lower quality, new method with decent performance, recommended over Q3_K_S quant, same size with better performance. |
| llama-3-cat-8b-instruct-v1-Q3_K_S.gguf | Q3_K_S | 3.66GB | Low quality, not recommended. |
| llama-3-cat-8b-instruct-v1-IQ3_XS.gguf | IQ3_XS | 3.51GB | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| llama-3-cat-8b-instruct-v1-IQ3_XXS.gguf | IQ3_XXS | 3.27GB | Lower quality, new method with decent performance, comparable to Q3 quants. |
| llama-3-cat-8b-instruct-v1-Q2_K.gguf | Q2_K | 3.17GB | Very low quality but surprisingly usable. |
| llama-3-cat-8b-instruct-v1-IQ2_M.gguf | IQ2_M | 2.94GB | Very low quality, uses SOTA techniques to also be surprisingly usable. |
| llama-3-cat-8b-instruct-v1-IQ2_S.gguf | IQ2_S | 2.75GB | Very low quality, uses SOTA techniques to be usable. |
| llama-3-cat-8b-instruct-v1-IQ2_XS.gguf | IQ2_XS | 2.60GB | Very low quality, uses SOTA techniques to be usable. |
| llama-3-cat-8b-instruct-v1-IQ2_XXS.gguf | IQ2_XXS | 2.39GB | Lower quality, uses SOTA techniques to be usable. |
| llama-3-cat-8b-instruct-v1-IQ1_M.gguf | IQ1_M | 2.16GB | Extremely low quality, not recommended. |
| llama-3-cat-8b-instruct-v1-IQ1_S.gguf | IQ1_S | 2.01GB | Extremely low quality, not recommended. |
Which File to Choose
A great write-up with charts comparing the performance of the various quant types is provided by Artefact2 here.
- Determine Available Resources: First, figure out how much RAM and/or VRAM you have. If you want the model to run as fast as possible, choose a quant with a file size 1-2GB smaller than your GPU's total VRAM; for example, with 8GB of VRAM, Q6_K (6.59GB) leaves roughly 1.4GB of headroom, while Q5_K_M (5.73GB) leaves a bit more. If you want the maximum quality, add your system RAM and GPU's VRAM together and choose a quant 1-2GB smaller than that total. One way to check your total VRAM on Nvidia hardware is shown after this list.
- Choose between 'I-quant' and 'K-quant': If you don't want to think too much, choose a K-quant (e.g., Q5_K_M). If you're aiming for below Q4 and using cuBLAS (Nvidia) or rocBLAS (AMD), consider the I-quants (e.g., IQ3_M). Note that the I-quants are not compatible with Vulkan.
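On Nvidia hardware, one quick way to check the total VRAM figure used in the sizing rule above is nvidia-smi (this assumes the Nvidia driver utilities are installed; the command is not part of this repository):

```bash
# Print each GPU's name and total VRAM; subtract 1-2GB from the reported
# figure and pick a quant from the table whose file size fits underneath.
nvidia-smi --query-gpu=name,memory.total --format=csv
```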
Technical Details
- Quantization Tool: Quantized using llama.cpp release b2854.
- Original Model: https://huggingface.co/TheSkullery/llama-3-cat-8b-instruct-v1
- Quantization Option: All quants are made using the imatrix option with the dataset provided by Kalomaze here. A rough sketch of the workflow is shown below.
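As a rough sketch of how imatrix quants of this kind are typically produced with llama.cpp, the steps below show the general shape of the workflow. The file names and calibration text are placeholders, and the exact commands are an assumption rather than taken from this repository:

```bash
# Sketch only: typical llama.cpp imatrix quantization workflow (circa release b2854).
# Paths, file names, and the calibration text are placeholders.

# 1. Compute an importance matrix from calibration text against the fp16 model.
./imatrix -m llama-3-cat-8b-instruct-v1-f16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize using that importance matrix, e.g. to Q4_K_M.
./quantize --imatrix imatrix.dat \
  llama-3-cat-8b-instruct-v1-f16.gguf \
  llama-3-cat-8b-instruct-v1-Q4_K_M.gguf Q4_K_M
```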
License
The model uses the llama3 license.
Support the Author
If you want to support the author's work, you can visit their ko-fi page: https://ko-fi.com/bartowski