# Llamacpp imatrix Quantizations of gemma-2-2b-it-abliterated

This project provides llama.cpp imatrix quantizations of the gemma-2-2b-it-abliterated model. It offers various quantization types to meet different performance and quality requirements, allowing users to run the model efficiently on different hardware configurations.
## Quick Start

### Prerequisites

Make sure you have the `huggingface-cli` installed. You can install it using the following command:

```
pip install -U "huggingface_hub[cli]"
```
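
To confirm the CLI is available, and optionally to authenticate with a Hugging Face access token (only needed for gated or private repositories, not for this one), you can run:

```
# Check that the CLI is installed and on your PATH
huggingface-cli --help

# Optional: log in with a Hugging Face access token (gated/private repos only)
huggingface-cli login
```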
### Downloading a Specific File

To download a specific file, use the following command. For example, to download the `gemma-2-2b-it-abliterated-Q4_K_M.gguf` file:

```
huggingface-cli download bartowski/gemma-2-2b-it-abliterated-GGUF --include "gemma-2-2b-it-abliterated-Q4_K_M.gguf" --local-dir ./
```
### Downloading Split Files

If the model is split into multiple files (models larger than 50GB), you can download all the files of a specific split to a local folder using the following command:

```
huggingface-cli download bartowski/gemma-2-2b-it-abliterated-GGUF --include "gemma-2-2b-it-abliterated-Q8_0/*" --local-dir ./
```

You can either specify a new local directory or download them all into the current directory (`./`).
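
This 2B model does not actually need split downloads, but for genuinely large repositories you can optionally speed things up with the `hf_transfer` backend that ships alongside `huggingface_hub`. A minimal sketch (the extra package and environment variable belong to `huggingface_hub`, not to this model card):

```
# Optional: accelerated downloads via the hf_transfer backend
pip install -U hf_transfer
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download bartowski/gemma-2-2b-it-abliterated-GGUF --include "gemma-2-2b-it-abliterated-Q8_0/*" --local-dir ./
```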
### Running the Model

You can run the quantized models in LM Studio.
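
As an alternative to LM Studio, any llama.cpp-based runtime can load these GGUF files. The sketch below assumes you have built a recent llama.cpp and that the `llama-server` binary is on your PATH:

```
# Serve the Q4_K_M quant over llama.cpp's built-in HTTP server
# -c sets the context size, -ngl offloads layers to the GPU (use 0 for CPU-only)
llama-server -m ./gemma-2-2b-it-abliterated-Q4_K_M.gguf -c 4096 -ngl 99 --port 8080
```

Once the server is running, requests can be sent to `http://localhost:8080`.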
## Features

- Multiple Quantization Types: Offers a wide range of quantization types, including f32, Q8_0, Q6_K_L, etc., to balance model quality and file size.
- Embed/Output Weights Optimization: Some quantizations use Q8_0 for the embedding and output weights, which may improve model quality.
- Easy Download: Supports downloading specific files or split files using the `huggingface-cli`.
## Installation

The installation mainly involves installing the `huggingface-cli` and downloading the desired quantized model files. Refer to the "Quick Start" section for detailed steps.
## Usage Examples

### Prompt Format

The following is the prompt format for this model. Note that this model does not support a system prompt.

```
<bos><start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
<end_of_turn>
<start_of_turn>model
```
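
As an illustration (not part of the original card), here is that template filled in with a concrete question and passed to llama.cpp's `llama-cli`; the binary name and flags assume a recent llama.cpp build:

```
# Single-turn generation using the Gemma chat format (no system prompt)
llama-cli -m ./gemma-2-2b-it-abliterated-Q4_K_M.gguf -n 256 \
  -p $'<bos><start_of_turn>user\nWhy is the sky blue?<end_of_turn>\n<start_of_turn>model\n'
```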
## Documentation

### Model Information

| Property | Details |
|----------|---------|
| Base Model | IlyaGusev/gemma-2-2b-it-abliterated |
| Language | en |
| License | gemma |
| Pipeline Tag | text-generation |
| Quantized By | bartowski |
### Downloadable Files

| Filename | Quant type | File Size | Split | Description |
|----------|------------|-----------|-------|-------------|
| gemma-2-2b-it-abliterated-f32.gguf | f32 | 10.46GB | false | Full F32 weights. |
| gemma-2-2b-it-abliterated-Q8_0.gguf | Q8_0 | 2.78GB | false | Extremely high quality, generally unneeded but max available quant. |
| gemma-2-2b-it-abliterated-Q6_K_L.gguf | Q6_K_L | 2.29GB | false | Uses Q8_0 for embed and output weights. Very high quality, near perfect, recommended. |
| gemma-2-2b-it-abliterated-Q6_K.gguf | Q6_K | 2.15GB | false | Very high quality, near perfect, recommended. |
| gemma-2-2b-it-abliterated-Q5_K_L.gguf | Q5_K_L | 2.07GB | false | Uses Q8_0 for embed and output weights. High quality, recommended. |
| gemma-2-2b-it-abliterated-Q5_K_M.gguf | Q5_K_M | 1.92GB | false | High quality, recommended. |
| gemma-2-2b-it-abliterated-Q5_K_S.gguf | Q5_K_S | 1.88GB | false | High quality, recommended. |
| gemma-2-2b-it-abliterated-Q4_K_L.gguf | Q4_K_L | 1.85GB | false | Uses Q8_0 for embed and output weights. Good quality, recommended. |
| gemma-2-2b-it-abliterated-Q4_K_M.gguf | Q4_K_M | 1.71GB | false | Good quality, default size for most use cases, recommended. |
| gemma-2-2b-it-abliterated-Q3_K_XL.gguf | Q3_K_XL | 1.69GB | false | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
| gemma-2-2b-it-abliterated-Q4_K_S.gguf | Q4_K_S | 1.64GB | false | Slightly lower quality with more space savings, recommended. |
| gemma-2-2b-it-abliterated-IQ4_XS.gguf | IQ4_XS | 1.57GB | false | Decent quality, smaller than Q4_K_S with similar performance, recommended. |
| gemma-2-2b-it-abliterated-Q3_K_L.gguf | Q3_K_L | 1.55GB | false | Lower quality but usable, good for low RAM availability. |
| gemma-2-2b-it-abliterated-IQ3_M.gguf | IQ3_M | 1.39GB | false | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| gemma-2-2b-it-abliterated-Q2_K_L.gguf | Q2_K_L | 1.37GB | false | Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable. |
### Embed/Output Weights

Some of these quants (Q3_K_XL, Q4_K_L, etc.) use the standard quantization method, but with the embedding and output weights quantized to Q8_0 instead of the normal default. Some users claim that this improves quality, while others don't notice any difference. If you use these models, please comment with your findings.
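
If you want to verify which tensor types a particular file actually uses, the `gguf` Python package (from the llama.cpp project) includes a dump script; this is a sketch assuming that package and its `gguf-dump` entry point are installed:

```
pip install gguf
# Print per-tensor info and keep only the embedding/output rows
gguf-dump ./gemma-2-2b-it-abliterated-Q4_K_L.gguf | grep -iE "token_embd|output"
```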
## Technical Details

### Quantization Process

The quantization is performed using llama.cpp release b3496. All quants are made using the imatrix option with the dataset from here.
### File Selection

A great write-up with charts comparing the performance of the various quants is provided by Artefact2 here. To choose the appropriate file, first determine how much RAM and/or VRAM you have. If you want the model to run as fast as possible, aim to fit the whole model in your GPU's VRAM. If you want maximum quality, add your system RAM and your GPU's VRAM together, then choose a quant with a file size 1-2GB smaller than that total.
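
As a rough way to check those numbers on a Linux machine with an NVIDIA GPU (other platforms need different commands), you could run:

```
# Total and available system RAM
free -h
# Total VRAM per NVIDIA GPU
nvidia-smi --query-gpu=name,memory.total --format=csv
```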
## License

The model is licensed under the gemma license.
## Credits

- Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset.
- Thank you ZeroWw for the inspiration to experiment with embed/output weights.
## Usage Tip

If you want to support the developer's work, you can visit the ko-fi page: https://ko-fi.com/bartowski