Gemma 3 Quantized Models
This repository provides W4A16 quantized versions of Google's Gemma 3 instruction-tuned models, making them practical to deploy on consumer hardware with little loss in quality.
Quick Start
To use the models with vLLM, you can run the following command:
```bash
vllm serve abhishekchohan/gemma-3-{size}-it-quantized-W4A16 --chat-template templates/chat_template.jinja --enable-auto-tool-choice --tool-call-parser gemma --tool-parser-plugin tools/tool_parser.py
```
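Once the server is running, vLLM exposes an OpenAI-compatible HTTP API (by default at `http://localhost:8000/v1`). As a minimal sketch, here is how a chat-completion request body for the 4B model could be assembled; the port and endpoint path are vLLM defaults and may differ in your setup:

```python
import json

# Build an OpenAI-style chat-completion request body for the quantized model.
# The model name matches the repository; the server URL below is the vLLM default.
def build_chat_request(model: str, user_message: str, temperature: float = 0.7) -> str:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,
    }
    return json.dumps(payload)

body = build_chat_request(
    "abhishekchohan/gemma-3-4b-it-quantized-W4A16",
    "Explain W4A16 quantization in one sentence.",
)
# POST this body to http://localhost:8000/v1/chat/completions
# with the header "Content-Type: application/json".
```

The same payload works with the official `openai` Python client by pointing its `base_url` at the vLLM server.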
Features
- Quantized Models: W4A16 quantized versions of the Gemma 3 instruction-tuned models: abhishekchohan/gemma-3-27b-it-quantized-W4A16, abhishekchohan/gemma-3-12b-it-quantized-W4A16, and abhishekchohan/gemma-3-4b-it-quantized-W4A16.
- Optimized for Consumer Hardware: These models are more accessible for deployment on consumer hardware while maintaining good performance.
Installation
The models are served with vLLM (see Quick Start); if you don't already have it, install it with `pip install vllm`.
Usage Examples
Basic Usage
```bash
vllm serve abhishekchohan/gemma-3-{size}-it-quantized-W4A16 --chat-template templates/chat_template.jinja --enable-auto-tool-choice --tool-call-parser gemma --tool-parser-plugin tools/tool_parser.py
```
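The `--enable-auto-tool-choice` and `--tool-call-parser gemma` flags let the server handle OpenAI-style tool calling. As an illustrative sketch, a request that offers the model a tool might look like this; the `get_weather` function and its parameters are hypothetical examples, not part of the repository:

```python
import json

# A sample tool definition in the OpenAI "tools" format.
# The function name and schema here are illustrative assumptions.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Chat-completion request offering the tool; with --enable-auto-tool-choice,
# the server decides whether the model should call it.
request = {
    "model": "abhishekchohan/gemma-3-4b-it-quantized-W4A16",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",
}
print(json.dumps(request, indent=2))
```

If the model decides to call the tool, the response contains a `tool_calls` entry (parsed by `tools/tool_parser.py`) instead of plain text.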
Documentation
Models
- abhishekchohan/gemma-3-27b-it-quantized-W4A16
- abhishekchohan/gemma-3-12b-it-quantized-W4A16
- abhishekchohan/gemma-3-4b-it-quantized-W4A16
Repository Structure
```
gemma-3-{size}-it-quantized-W4A16/
├── README.md
├── templates/
│   └── chat_template.jinja
├── tools/
│   └── tool_parser.py
└── [model files]
```
Quantization Details
These models use W4A16 quantization via LLM Compressor:
- Weights quantized to 4-bit precision
- Activations use 16-bit precision
- Significantly reduced memory requirements
Technical Details
The models are quantized with LLM Compressor using the W4A16 scheme: weights are stored at 4-bit precision while activations remain at 16-bit precision. This significantly reduces memory requirements and makes the models suitable for deployment on consumer hardware.
License
These models are subject to the Gemma license. To access Gemma on Hugging Face, you must review and agree to Google's usage license: log in to Hugging Face and click the "Acknowledge license" button on the model page. Requests are processed immediately.
Citation
```bibtex
@article{gemma_2025,
  title={Gemma 3},
  url={https://goo.gle/Gemma3Report},
  publisher={Kaggle},
  author={Gemma Team},
  year={2025}
}
```