Gemma 3 Quantized Models
This repository provides W4A16 quantized versions of Google's Gemma 3 instruction-tuned models, making them practical to deploy on consumer hardware with little loss in quality.
Quick Start
To use the models with vLLM, you can run the following command:
```bash
vllm serve abhishekchohan/gemma-3-{size}-it-quantized-W4A16 --chat-template templates/chat_template.jinja --enable-auto-tool-choice --tool-call-parser gemma --tool-parser-plugin tools/tool_parser.py
```
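Once the server is running, vLLM exposes an OpenAI-compatible HTTP API (by default at `http://localhost:8000/v1`). As a minimal sketch, here is how a chat-completion request body for the 4B model could be assembled; the port and endpoint path are vLLM defaults and may differ in your setup:

```python
import json

# Build an OpenAI-style chat-completion request body for the quantized model.
# The model name matches the repository; the server URL below is the vLLM default.
def build_chat_request(model: str, user_message: str, temperature: float = 0.7) -> str:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,
    }
    return json.dumps(payload)

body = build_chat_request(
    "abhishekchohan/gemma-3-4b-it-quantized-W4A16",
    "Explain W4A16 quantization in one sentence.",
)
# POST this body to http://localhost:8000/v1/chat/completions
# with the header "Content-Type: application/json".
```

The same payload works with the official `openai` Python client by pointing its `base_url` at the vLLM server.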
Features
- Quantized Models: W4A16 quantized versions of the Gemma 3 instruction-tuned models: abhishekchohan/gemma-3-27b-it-quantized-W4A16, abhishekchohan/gemma-3-12b-it-quantized-W4A16, and abhishekchohan/gemma-3-4b-it-quantized-W4A16.
- Optimized for Consumer Hardware: These models are more accessible for deployment on consumer hardware while maintaining good performance.
Installation
The models are served with vLLM (see Quick Start); if you don't already have it, install it with `pip install vllm`.
Usage Examples
Basic Usage
```bash
vllm serve abhishekchohan/gemma-3-{size}-it-quantized-W4A16 --chat-template templates/chat_template.jinja --enable-auto-tool-choice --tool-call-parser gemma --tool-parser-plugin tools/tool_parser.py
```
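The `--enable-auto-tool-choice` and `--tool-call-parser gemma` flags let the server handle OpenAI-style tool calling. As an illustrative sketch, a request that offers the model a tool might look like this; the `get_weather` function and its parameters are hypothetical examples, not part of the repository:

```python
import json

# A sample tool definition in the OpenAI "tools" format.
# The function name and schema here are illustrative assumptions.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Chat-completion request offering the tool; with --enable-auto-tool-choice,
# the server decides whether the model should call it.
request = {
    "model": "abhishekchohan/gemma-3-4b-it-quantized-W4A16",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",
}
print(json.dumps(request, indent=2))
```

If the model decides to call the tool, the response contains a `tool_calls` entry (parsed by `tools/tool_parser.py`) instead of plain text.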
Documentation
Models
- abhishekchohan/gemma-3-27b-it-quantized-W4A16
- abhishekchohan/gemma-3-12b-it-quantized-W4A16
- abhishekchohan/gemma-3-4b-it-quantized-W4A16
Repository Structure
```
gemma-3-{size}-it-quantized-W4A16/
├── README.md
├── templates/
│   └── chat_template.jinja
├── tools/
│   └── tool_parser.py
└── [model files]
```
Quantization Details
These models use W4A16 quantization via LLM Compressor:
- Weights quantized to 4-bit precision
- Activations use 16-bit precision
- Significantly reduced memory requirements
Technical Details
The models are quantized with LLM Compressor using the W4A16 scheme: weights are stored at 4-bit precision while activations remain at 16-bit precision. This significantly reduces memory requirements and makes the models suitable for deployment on consumer hardware.
License
These models are subject to the Gemma license. To access Gemma on Hugging Face, you must review and agree to Google's usage license: log in to Hugging Face and click the "Acknowledge license" button on the model page. Requests are processed immediately.
Citation
```bibtex
@article{gemma_2025,
  title={Gemma 3},
  url={https://goo.gle/Gemma3Report},
  publisher={Kaggle},
  author={Gemma Team},
  year={2025}
}
```