🚀 Llamacpp imatrix Quantizations of Phi-3.5-mini-instruct_Uncensored
This project provides llama.cpp imatrix quantizations of the Phi-3.5-mini-instruct_Uncensored model. It offers a range of quantization types that trade off model quality against resource usage.
🚀 Quick Start
Prerequisites
- Ensure you have `huggingface-cli` installed. You can install it using the following command:
pip install -U "huggingface_hub[cli]"
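To confirm the CLI is available after installing, you can run its built-in help (a quick sanity check, not part of the original instructions):

```bash
# Should list the available huggingface-cli commands if the install succeeded.
huggingface-cli --help
```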
Download a Model
You can download a specific model file using `huggingface-cli`. For example, to download the `Phi-3.5-mini-instruct_Uncensored-Q4_K_M.gguf` file:
huggingface-cli download bartowski/Phi-3.5-mini-instruct_Uncensored-GGUF --include "Phi-3.5-mini-instruct_Uncensored-Q4_K_M.gguf" --local-dir ./
If the model is split into multiple files (models larger than 50GB), you can download all the relevant files using:
huggingface-cli download bartowski/Phi-3.5-mini-instruct_Uncensored-GGUF --include "Phi-3.5-mini-instruct_Uncensored-Q8_0/*" --local-dir ./
Run the Model
You can run the quantized models in LM Studio.
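Because these are llama.cpp quantizations, you can also run them directly with a local llama.cpp build instead of LM Studio. A minimal sketch, assuming a recent llama.cpp release that provides the `llama-server` binary (the context size and port below are illustrative values, not taken from this card):

```bash
# Sketch: serve the downloaded GGUF locally with llama.cpp's OpenAI-compatible server.
# -m      path to the quantized model file
# -c      context size in tokens (illustrative value)
# --port  local port to listen on (illustrative value)
./llama-server -m ./Phi-3.5-mini-instruct_Uncensored-Q4_K_M.gguf -c 4096 --port 8080
```

Once the server is up, any OpenAI-compatible client pointed at `http://localhost:8080` can query the model.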
✨ Features
- Multiple Quantization Types: Offers a wide range of quantization types (f16, Q8_0, Q6_K_L, and more) to meet different resource and quality requirements.
- Embed/Output Weights Optimization: Some quantizations use Q8_0 for embeddings and output weights, potentially improving model quality.
📦 Installation
Install huggingface-cli
pip install -U "huggingface_hub[cli]"
💻 Usage Examples
Prompt Format
<s><|system|> {system_prompt}<|end|><|user|> {prompt}<|end|><|assistant|><|end|>
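As a minimal sketch of using this template from the command line, assuming a llama.cpp build that provides `llama-cli` (the system prompt, user prompt, and token limit below are illustrative; the prompt is left open after `<|assistant|>` so the model generates the assistant turn):

```bash
# Sketch: substitute {system_prompt} and {prompt} into the template and generate up to 256 tokens.
./llama-cli -m ./Phi-3.5-mini-instruct_Uncensored-Q4_K_M.gguf -n 256 \
  -p "<s><|system|> You are a helpful assistant.<|end|><|user|> Summarize what a GGUF file is.<|end|><|assistant|>"
```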
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Base Model | SicariusSicariiStuff/Phi-3.5-mini-instruct_Uncensored |
| License | apache-2.0 |
| Pipeline Tag | text-generation |
| Quantized By | bartowski |
Model Download Table
| Filename | Quant type | File Size | Split | Description |
|----------|------------|-----------|-------|-------------|
| Phi-3.5-mini-instruct_Uncensored-f16.gguf | f16 | 7.64GB | false | Full F16 weights. |
| Phi-3.5-mini-instruct_Uncensored-Q8_0.gguf | Q8_0 | 4.06GB | false | Extremely high quality, generally unneeded but max available quant. |
| Phi-3.5-mini-instruct_Uncensored-Q6_K_L.gguf | Q6_K_L | 3.18GB | false | Uses Q8_0 for embed and output weights. Very high quality, near perfect, recommended. |
| Phi-3.5-mini-instruct_Uncensored-Q6_K.gguf | Q6_K | 3.14GB | false | Very high quality, near perfect, recommended. |
| Phi-3.5-mini-instruct_Uncensored-Q5_K_L.gguf | Q5_K_L | 2.88GB | false | Uses Q8_0 for embed and output weights. High quality, recommended. |
| Phi-3.5-mini-instruct_Uncensored-Q5_K_M.gguf | Q5_K_M | 2.82GB | false | High quality, recommended. |
| Phi-3.5-mini-instruct_Uncensored-Q5_K_S.gguf | Q5_K_S | 2.64GB | false | High quality, recommended. |
| Phi-3.5-mini-instruct_Uncensored-Q4_K_L.gguf | Q4_K_L | 2.47GB | false | Uses Q8_0 for embed and output weights. Good quality, recommended. |
| Phi-3.5-mini-instruct_Uncensored-Q4_K_M.gguf | Q4_K_M | 2.39GB | false | Good quality, default size for most use cases, recommended. |
| Phi-3.5-mini-instruct_Uncensored-Q4_K_S.gguf | Q4_K_S | 2.19GB | false | Slightly lower quality with more space savings, recommended. |
| Phi-3.5-mini-instruct_Uncensored-Q3_K_XL.gguf | Q3_K_XL | 2.17GB | false | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
| Phi-3.5-mini-instruct_Uncensored-Q3_K_L.gguf | Q3_K_L | 2.09GB | false | Lower quality but usable, good for low RAM availability. |
| Phi-3.5-mini-instruct_Uncensored-IQ4_XS.gguf | IQ4_XS | 2.06GB | false | Decent quality, smaller than Q4_K_S with similar performance, recommended. |
| Phi-3.5-mini-instruct_Uncensored-Q3_K_M.gguf | Q3_K_M | 1.96GB | false | Low quality. |
| Phi-3.5-mini-instruct_Uncensored-IQ3_M.gguf | IQ3_M | 1.86GB | false | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| Phi-3.5-mini-instruct_Uncensored-Q3_K_S.gguf | Q3_K_S | 1.68GB | false | Low quality, not recommended. |
| Phi-3.5-mini-instruct_Uncensored-IQ3_XS.gguf | IQ3_XS | 1.63GB | false | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| Phi-3.5-mini-instruct_Uncensored-Q2_K_L.gguf | Q2_K_L | 1.51GB | false | Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable. |
| Phi-3.5-mini-instruct_Uncensored-Q2_K.gguf | Q2_K | 1.42GB | false | Very low quality but surprisingly usable. |
| Phi-3.5-mini-instruct_Uncensored-IQ2_M.gguf | IQ2_M | 1.32GB | false | Relatively low quality, uses SOTA techniques to be surprisingly usable. |
Embed/Output Weights
Some of these quants (Q3_K_XL, Q4_K_L, etc.) use the standard quantization method, but with the embedding and output weights quantized to Q8_0 instead of their usual default. Whether this actually improves quality is still debated; if you use these models, please comment with your findings.
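For reference, this kind of variant can be produced with llama.cpp's `llama-quantize` tool, which accepts per-tensor type overrides for the token-embedding and output tensors. A rough sketch, assuming a llama.cpp build that exposes the `--token-embedding-type` and `--output-tensor-type` options (file names here are illustrative):

```bash
# Sketch: quantize to Q4_K_M while keeping embeddings and output weights at Q8_0,
# roughly how the *_L / *_XL variants above differ from their base quants.
./llama-quantize --imatrix imatrix.dat \
  --token-embedding-type q8_0 --output-tensor-type q8_0 \
  Phi-3.5-mini-instruct_Uncensored-f16.gguf Phi-3.5-mini-instruct_Uncensored-Q4_K_L.gguf Q4_K_M
```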
Model Selection Guide
A great write-up with charts comparing the performance of the various quant types is provided by Artefact2 here.
To choose a model, first determine how much RAM and/or VRAM you have. If you want the model to run as fast as possible, choose a quant with a file size 1-2GB smaller than your GPU's total VRAM. If you want the maximum quality, add your system RAM and your GPU's VRAM together and choose a quant 1-2GB smaller than that total. For example, on a GPU with 4GB of VRAM, a file of roughly 2-3GB (such as Q5_K_M at 2.82GB or Q4_K_M at 2.39GB) fits comfortably on the GPU.
You also need to decide between 'I-quants' (e.g., IQ3_M) and 'K-quants' (e.g., Q5_K_M). If you don't want to think too much, choose a K-quant. If you're aiming for below Q4 and running cuBLAS (Nvidia) or rocBLAS (AMD), consider I-quants, which are newer and offer better performance for their size. Note that I-quants are not compatible with Vulkan.
🔧 Technical Details
- Quantization Method: Uses llama.cpp release b3600 for quantization.
- Calibration Dataset: All quants are made using the imatrix option with the dataset from here.
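A rough sketch of that workflow, assuming a llama.cpp build (around the cited b3600 release) that provides the `llama-imatrix` tool; the calibration file name below is illustrative:

```bash
# Sketch: compute an importance matrix from a calibration text file, then
# pass it to the quantizer so low-bit quants preserve the most important weights.
./llama-imatrix -m Phi-3.5-mini-instruct_Uncensored-f16.gguf -f calibration_data.txt -o imatrix.dat
./llama-quantize --imatrix imatrix.dat Phi-3.5-mini-instruct_Uncensored-f16.gguf Phi-3.5-mini-instruct_Uncensored-Q4_K_M.gguf Q4_K_M
```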
📄 License
This project is licensed under the apache-2.0 license.
👨‍💻 Credits
- Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset.
- Thank you ZeroWw for the inspiration to experiment with embed/output.
💡 Usage Tip
If you want to support the developer's work, visit the ko-fi page: https://ko-fi.com/bartowski