Llamacpp imatrix Quantizations of gemma-2-9b-it-abliterated
This project provides llama.cpp imatrix quantizations of the gemma-2-9b-it-abliterated model. It offers various quantization types to meet different performance and quality requirements, and provides guidance on downloading and using these quantized models.
Quick Start
Prerequisites
- Ensure you have huggingface-cli installed. You can install it using the following command:
pip install -U "huggingface_hub[cli]"
Downloading a Specific File
To download a specific quantized model file, use the following command. For example, to download gemma-2-9b-it-abliterated-Q4_K_M.gguf:
huggingface-cli download bartowski/gemma-2-9b-it-abliterated-GGUF --include "gemma-2-9b-it-abliterated-Q4_K_M.gguf" --local-dir ./
Downloading Split Files
If the model is split into multiple files (models larger than 50GB), you can download all the files to a local folder using the following command:
huggingface-cli download bartowski/gemma-2-9b-it-abliterated-GGUF --include "gemma-2-9b-it-abliterated-Q8_0/*" --local-dir ./
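Recent llama.cpp builds load the remaining shards automatically when you point them at the first one, so after downloading you only need to reference the file whose name ends in -00001-of-0000N.gguf (the exact shard names depend on how the repository was split).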
Running the Model
You can run these quantized models in LM Studio.
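You can also run the GGUF files directly with a llama.cpp build. A minimal sketch, assuming the Q4_K_M file downloaded above and a recent release where the chat binary is named llama-cli (binary names and flags vary between releases): -m selects the model file, -cnv starts an interactive chat that applies the model's chat template, -ngl 99 offloads all layers to the GPU if one is available, and -c 4096 sets the context length.
./llama-cli -m ./gemma-2-9b-it-abliterated-Q4_K_M.gguf -cnv -ngl 99 -c 4096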
Features
- Multiple Quantization Types: Offers a wide range of quantization types, including f32, Q8_0, Q6_K_L, Q5_K_L, etc., to balance between model quality and file size.
- Optimized for Different Hardware: Some quantization types are optimized for ARM chips, providing significant speed improvements.
- Embed/Output Weights Option: Some quantizations use Q8_0 for embed and output weights, which may improve model quality.
Installation
Installing huggingface-cli
pip install -U "huggingface_hub[cli]"
Downloading Specific Files
huggingface-cli download bartowski/gemma-2-9b-it-abliterated-GGUF --include "gemma-2-9b-it-abliterated-Q4_K_M.gguf" --local-dir ./
Downloading Split Files
huggingface-cli download bartowski/gemma-2-9b-it-abliterated-GGUF --include "gemma-2-9b-it-abliterated-Q8_0/*" --local-dir ./
Usage Examples
Prompt Format
<bos><start_of_turn>system
{system_prompt}<end_of_turn>
<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
<end_of_turn>
<start_of_turn>model
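If you drive llama.cpp manually with -p instead of its chat mode, you have to build a prompt string in this layout yourself. A minimal sketch, assuming a llama-cli binary; -e turns the \n escapes into real newlines, -n 256 caps the response length, and the system and user texts are placeholder examples:
./llama-cli -m ./gemma-2-9b-it-abliterated-Q4_K_M.gguf -e -n 256 -p "<bos><start_of_turn>system\nYou are a helpful assistant.<end_of_turn>\n<start_of_turn>user\nWrite a haiku about autumn.<end_of_turn>\n<start_of_turn>model\n"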
Documentation
Model Information
Property | Details |
---|---|
Base Model | IlyaGusev/gemma-2-9b-it-abliterated |
Language | en |
License | gemma |
Pipeline Tag | text-generation |
Quantized By | bartowski |
Downloadable Files
Filename | Quant type | File Size | Split | Description |
---|---|---|---|---|
gemma-2-9b-it-abliterated-f32.gguf | f32 | 36.97GB | false | Full F32 weights. |
gemma-2-9b-it-abliterated-Q8_0.gguf | Q8_0 | 9.83GB | false | Extremely high quality, generally unneeded but max available quant. |
gemma-2-9b-it-abliterated-Q6_K_L.gguf | Q6_K_L | 7.81GB | false | Uses Q8_0 for embed and output weights. Very high quality, near perfect, recommended. |
gemma-2-9b-it-abliterated-Q6_K.gguf | Q6_K | 7.59GB | false | Very high quality, near perfect, recommended. |
gemma-2-9b-it-abliterated-Q5_K_L.gguf | Q5_K_L | 6.87GB | false | Uses Q8_0 for embed and output weights. High quality, recommended. |
gemma-2-9b-it-abliterated-Q5_K_M.gguf | Q5_K_M | 6.65GB | false | High quality, recommended. |
gemma-2-9b-it-abliterated-Q5_K_S.gguf | Q5_K_S | 6.48GB | false | High quality, recommended. |
gemma-2-9b-it-abliterated-Q4_K_L.gguf | Q4_K_L | 5.98GB | false | Uses Q8_0 for embed and output weights. Good quality, recommended. |
gemma-2-9b-it-abliterated-Q4_K_M.gguf | Q4_K_M | 5.76GB | false | Good quality, default size for most use cases, recommended. |
gemma-2-9b-it-abliterated-Q4_K_S.gguf | Q4_K_S | 5.48GB | false | Slightly lower quality with more space savings, recommended. |
gemma-2-9b-it-abliterated-Q4_0.gguf | Q4_0 | 5.46GB | false | Legacy format, offers online repacking for ARM and AVX inference. |
gemma-2-9b-it-abliterated-Q4_0_8_8.gguf | Q4_0_8_8 | 5.44GB | false | Optimized for ARM inference. Requires 'sve' support (see link below). |
gemma-2-9b-it-abliterated-Q4_0_4_8.gguf | Q4_0_4_8 | 5.44GB | false | Optimized for ARM inference. Requires 'i8mm' support (see link below). |
gemma-2-9b-it-abliterated-Q4_0_4_4.gguf | Q4_0_4_4 | 5.44GB | false | Optimized for ARM inference. Should work well on all ARM chips, pick this if you're unsure. |
gemma-2-9b-it-abliterated-Q3_K_XL.gguf | Q3_K_XL | 5.35GB | false | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
gemma-2-9b-it-abliterated-IQ4_XS.gguf | IQ4_XS | 5.18GB | false | Decent quality, smaller than Q4_K_S with similar performance, recommended. |
gemma-2-9b-it-abliterated-Q3_K_L.gguf | Q3_K_L | 5.13GB | false | Lower quality but usable, good for low RAM availability. |
gemma-2-9b-it-abliterated-Q3_K_M.gguf | Q3_K_M | 4.76GB | false | Low quality. |
gemma-2-9b-it-abliterated-IQ3_M.gguf | IQ3_M | 4.49GB | false | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
gemma-2-9b-it-abliterated-Q3_K_S.gguf | Q3_K_S | 4.34GB | false | Low quality, not recommended. |
gemma-2-9b-it-abliterated-IQ3_XS.gguf | IQ3_XS | 4.14GB | false | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
gemma-2-9b-it-abliterated-Q2_K_L.gguf | Q2_K_L | 4.03GB | false | Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable. |
gemma-2-9b-it-abliterated-Q2_K.gguf | Q2_K | 3.81GB | false | Very low quality but surprisingly usable. |
gemma-2-9b-it-abliterated-IQ2_M.gguf | IQ2_M | 3.43GB | false | Relatively low quality, uses SOTA techniques to be surprisingly usable. |
Embed/Output Weights
Some of these quants (Q3_K_XL, Q4_K_L, etc.) use the standard quantization method with the embedding and output weights quantized to Q8_0 instead of the default. Some users claim this improves the quality, while others notice no difference. Please share your findings if you use these models.
Q4_0_X_X
These quantizations are optimized for ARM chips, not for Metal (Apple) offloading. They can provide a substantial speedup on ARM chips. Check the AArch64 SoC features to find the best option for your ARM chip.
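On Linux you can check which of these features your ARM CPU exposes before picking a file; a quick sketch that searches the Features line of /proc/cpuinfo for the relevant flags:
grep -o -E 'i8mm|sve|asimddp' /proc/cpuinfo | sort -u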
File Selection
A detailed analysis with performance charts is available here. Consider your available RAM and VRAM when choosing a model. For maximum speed, select a quantized model with a file size 1-2GB smaller than your GPU's VRAM. For maximum quality, combine your system RAM and GPU's VRAM and choose a model 1-2GB smaller than the total. You can also choose between 'K-quants' (e.g. Q5_K_M), which are the straightforward, widely supported option, and 'I-quants' (e.g. IQ3_M), which are newer and generally offer better quality for their size but can run slower on CPU.
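As a worked example using the sizes in the table above: with an 8GB GPU and speed as the priority, a file 1-2GB below 8GB such as Q5_K_M (6.65GB) or Q5_K_L (6.87GB) fits comfortably, while with 8GB of VRAM plus 16GB of system RAM and quality as the priority, even Q8_0 (9.83GB) sits well under the combined total.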
Technical Details
Quantization Method
The quantizations are performed using llama.cpp release b3878. All quants are made using the imatrix option with a dataset from here.
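For reference, imatrix quantization in llama.cpp is roughly a two-step process: compute an importance matrix from a calibration text file, then pass it to the quantizer. A minimal sketch, assuming the llama-imatrix and llama-quantize tools from a recent llama.cpp build; these commands and file names are illustrative, not the exact ones used for this repository:
./llama-imatrix -m ./gemma-2-9b-it-abliterated-f32.gguf -f calibration.txt -o imatrix.dat
./llama-quantize --imatrix imatrix.dat ./gemma-2-9b-it-abliterated-f32.gguf ./gemma-2-9b-it-abliterated-Q4_K_M.gguf Q4_K_M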
ARM Optimization
The Q4_0_X_X quants are optimized for ARM chips and can provide significant speed improvements. Check the original pull request for speed comparisons.
License
The project uses the gemma license.
Credits
- Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset.
- Thank you ZeroWw for the inspiration to experiment with embed/output.
If you want to support the developer's work, visit the ko-fi page.

