Llamacpp imatrix Quantizations of WizardLM-2-7B-abliterated
This project provides quantized versions of the WizardLM-2-7B-abliterated model using llama.cpp, offering various quantization types to meet different performance and quality requirements.
Quick Start
Prerequisites
First, make sure you have huggingface-cli installed:
pip install -U "huggingface_hub[cli]"
Download a Specific File
You can target the specific file you want:
huggingface-cli download bartowski/WizardLM-2-7B-abliterated-GGUF --include "WizardLM-2-7B-abliterated-Q4_K_M.gguf" --local-dir ./
Download Split Files
If the model is bigger than 50GB, it will have been split into multiple files. To download them all to a local folder, run:
huggingface-cli download bartowski/WizardLM-2-7B-abliterated-GGUF --include "WizardLM-2-7B-abliterated-Q8_0.gguf/*" --local-dir WizardLM-2-7B-abliterated-Q8_0
You can either specify a new local-dir (e.g., WizardLM-2-7B-abliterated-Q8_0) or download them all in place (./).
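None of the quants in this repo are large enough to be split, but for sharded downloads in general, llama.cpp can load the model when pointed at the first shard and will find the remaining parts on its own. The shard name below is only a placeholder, not a file in this repo:

```bash
./main -m ./WizardLM-2-7B-abliterated-Q8_0/WizardLM-2-7B-abliterated-Q8_0-00001-of-00002.gguf -p "Hello"
```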
Features
- Quantization: Performed with llama.cpp release b2965.
- Multiple Quantization Types: All quants are made using the imatrix option with a calibration dataset from here (a sketch of this workflow is shown below).
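The exact commands used for this repo are not reproduced here, but a rough sketch of the usual imatrix workflow with llama.cpp's bundled tools looks like the following; the fp16 input name and calibration.txt are placeholders:

```bash
# From a llama.cpp checkout built at release b2965:

# 1. Compute an importance matrix over a calibration text file
#    (calibration.txt stands in for the calibration dataset).
./imatrix -m WizardLM-2-7B-abliterated-f16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize the fp16 GGUF, weighting the quantization with that matrix.
./quantize --imatrix imatrix.dat \
  WizardLM-2-7B-abliterated-f16.gguf \
  WizardLM-2-7B-abliterated-Q4_K_M.gguf \
  Q4_K_M
```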
Documentation
Original Model
The original model can be found at: https://huggingface.co/fearlessdots/WizardLM-2-7B-abliterated
Prompt Format
{system_prompt} USER: {prompt} ASSISTANT: </s>
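As a minimal sketch of how to use this template (the file name, prompt, and flag values below are illustrative, not prescriptive), it can be passed straight to llama.cpp's main binary:

```bash
./main -m ./WizardLM-2-7B-abliterated-Q4_K_M.gguf \
  -p "You are a helpful assistant. USER: Explain what an imatrix quant is. ASSISTANT:" \
  -n 256 -ngl 99   # -n limits generated tokens, -ngl offloads layers to the GPU
```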
Download Options
You can download a file (not the whole branch) from the following table:
| Filename | Quant type | File Size | Description |
| -------- | ---------- | --------- | ----------- |
| WizardLM-2-7B-abliterated-Q8_0.gguf | Q8_0 | 7.69GB | Extremely high quality, generally unneeded but max available quant. |
| WizardLM-2-7B-abliterated-Q6_K.gguf | Q6_K | 5.94GB | Very high quality, near perfect, recommended. |
| WizardLM-2-7B-abliterated-Q5_K_M.gguf | Q5_K_M | 5.13GB | High quality, recommended. |
| WizardLM-2-7B-abliterated-Q5_K_S.gguf | Q5_K_S | 4.99GB | High quality, recommended. |
| WizardLM-2-7B-abliterated-Q4_K_M.gguf | Q4_K_M | 4.36GB | Good quality, uses about 4.83 bits per weight, recommended. |
| WizardLM-2-7B-abliterated-Q4_K_S.gguf | Q4_K_S | 4.14GB | Slightly lower quality with more space savings, recommended. |
| WizardLM-2-7B-abliterated-IQ4_NL.gguf | IQ4_NL | 4.12GB | Decent quality, slightly smaller than Q4_K_S with similar performance, recommended. |
| WizardLM-2-7B-abliterated-IQ4_XS.gguf | IQ4_XS | 3.90GB | Decent quality, smaller than Q4_K_S with similar performance, recommended. |
| WizardLM-2-7B-abliterated-Q3_K_L.gguf | Q3_K_L | 3.82GB | Lower quality but usable, good for low RAM availability. |
| WizardLM-2-7B-abliterated-Q3_K_M.gguf | Q3_K_M | 3.51GB | Even lower quality. |
| WizardLM-2-7B-abliterated-IQ3_M.gguf | IQ3_M | 3.28GB | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| WizardLM-2-7B-abliterated-IQ3_S.gguf | IQ3_S | 3.18GB | Lower quality, new method with decent performance, recommended over Q3_K_S quant, same size with better performance. |
| WizardLM-2-7B-abliterated-Q3_K_S.gguf | Q3_K_S | 3.16GB | Low quality, not recommended. |
| WizardLM-2-7B-abliterated-IQ3_XS.gguf | IQ3_XS | 3.01GB | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| WizardLM-2-7B-abliterated-IQ3_XXS.gguf | IQ3_XXS | 2.82GB | Lower quality, new method with decent performance, comparable to Q3 quants. |
| WizardLM-2-7B-abliterated-Q2_K.gguf | Q2_K | 2.71GB | Very low quality but surprisingly usable. |
| WizardLM-2-7B-abliterated-IQ2_M.gguf | IQ2_M | 2.50GB | Very low quality, uses SOTA techniques to also be surprisingly usable. |
| WizardLM-2-7B-abliterated-IQ2_S.gguf | IQ2_S | 2.31GB | Very low quality, uses SOTA techniques to be usable. |
| WizardLM-2-7B-abliterated-IQ2_XS.gguf | IQ2_XS | 2.19GB | Very low quality, uses SOTA techniques to be usable. |
| WizardLM-2-7B-abliterated-IQ2_XXS.gguf | IQ2_XXS | 1.99GB | Lower quality, uses SOTA techniques to be usable. |
| WizardLM-2-7B-abliterated-IQ1_M.gguf | IQ1_M | 1.75GB | Extremely low quality, not recommended. |
| WizardLM-2-7B-abliterated-IQ1_S.gguf | IQ1_S | 1.61GB | Extremely low quality, not recommended. |
Choosing the Right File
A great write-up with charts showing various performances is provided by Artefact2 here.
The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have.
If you want your model running as FAST as possible, you'll want to fit the whole thing in your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM.
If you want the absolute maximum quality, add both your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total.
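As a rough illustration of the sizing rule above: with 8GB of VRAM, Q6_K (5.94GB) or Q5_K_M (5.13GB) leave roughly 2-3GB of headroom for context and overhead, while Q8_0 (7.69GB) would only make sense if you're also willing to spill into system RAM.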
Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'.
If you don't want to think too much, grab one of the K-quants. These are in the format 'QX_K_X', like Q5_K_M.
If you want to get more into the weeds, you can check out this extremely useful feature chart:
llama.cpp feature matrix
But basically, if you're aiming for below Q4 and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look towards the I-quants. These are in the format IQX_X, like IQ3_M. These are newer and offer better performance for their size.
These I-quants can also be used on CPU and Apple Metal, but will be slower than their K-quant equivalents, so speed vs performance is a tradeoff you'll have to decide.
The I-quants are not compatible with Vulkan, which also targets AMD cards, so if you have an AMD card, double-check whether you're using the rocBLAS build or the Vulkan build. At the time of writing, LM Studio has a preview with ROCm support, and other inference engines have specific builds for ROCm.
License
This project is licensed under the Apache 2.0 license.