🚀 NikolayKozloff/gemma-3-12b-it-Q5_K_M-GGUF
This repository provides a GGUF-format conversion of the original model for more convenient use with llama.cpp-based tooling.
🚀 Quick Start
This model was converted to GGUF format from google/gemma-3-12b-it using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
✨ Features
- Model Source: Converted from google/gemma-3-12b-it.
- Format: GGUF, for use with llama.cpp.
- Access Requirement: To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging Face and click the "Acknowledge license" button. Requests are processed immediately.
📦 Installation
Install llama.cpp
Install llama.cpp via Homebrew (works on macOS and Linux):
brew install llama.cpp
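To confirm the install worked, you can print the build info (a quick sanity check; the exact output depends on your llama.cpp build):

```bash
# Verify the llama.cpp binaries are on your PATH
llama-cli --version
```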
💻 Usage Examples
Use with llama.cpp
CLI:
llama-cli --hf-repo NikolayKozloff/gemma-3-12b-it-Q5_K_M-GGUF --hf-file gemma-3-12b-it-q5_k_m.gguf -p "The meaning to life and the universe is"
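The defaults are fine for a quick test; for longer generations or GPU offload you can add standard llama.cpp flags. A sketch with illustrative, untuned values:

```bash
# -c sets the context window, -n caps generated tokens,
# -ngl offloads layers to the GPU (only if built with GPU support)
llama-cli --hf-repo NikolayKozloff/gemma-3-12b-it-Q5_K_M-GGUF \
  --hf-file gemma-3-12b-it-q5_k_m.gguf \
  -p "The meaning to life and the universe is" \
  -c 4096 -n 256 -ngl 99
```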
Server:
llama-server --hf-repo NikolayKozloff/gemma-3-12b-it-Q5_K_M-GGUF --hf-file gemma-3-12b-it-q5_k_m.gguf -c 2048
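Once running, llama-server exposes an OpenAI-compatible HTTP API (on port 8080 by default). A minimal sketch of a chat request with curl; the prompt and max_tokens value are just examples:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Explain GGUF in one sentence."}
    ],
    "max_tokens": 128
  }'
```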
Alternative Usage Steps
Step 1: Clone llama.cpp from GitHub.
git clone https://github.com/ggerganov/llama.cpp
Step 2: Move into the llama.cpp folder and build it with the LLAMA_CURL=1 flag, along with any hardware-specific flags (e.g. LLAMA_CUDA=1 for Nvidia GPUs on Linux).
cd llama.cpp && LLAMA_CURL=1 make
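Note: recent llama.cpp versions have replaced the Makefile with CMake, so `make` may fail on a current checkout. A roughly equivalent CMake build (add hardware options to match your setup) is:

```bash
# -DLLAMA_CURL=ON enables the --hf-repo download support;
# add e.g. -DGGML_CUDA=ON for Nvidia GPUs on newer versions
cmake -B build -DLLAMA_CURL=ON
cmake --build build --config Release
# the binaries land in build/bin/
```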
Step 3: Run inference through the built binaries.
./llama-cli --hf-repo NikolayKozloff/gemma-3-12b-it-Q5_K_M-GGUF --hf-file gemma-3-12b-it-q5_k_m.gguf -p "The meaning to life and the universe is"
or
./llama-server --hf-repo NikolayKozloff/gemma-3-12b-it-Q5_K_M-GGUF --hf-file gemma-3-12b-it-q5_k_m.gguf -c 2048
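If you'd rather manage the weights yourself than rely on --hf-repo, a minimal sketch that downloads the file with huggingface-cli (from the huggingface_hub package) and loads it locally with -m:

```bash
pip install -U huggingface_hub
huggingface-cli download NikolayKozloff/gemma-3-12b-it-Q5_K_M-GGUF \
  gemma-3-12b-it-q5_k_m.gguf --local-dir .
./llama-cli -m gemma-3-12b-it-q5_k_m.gguf -p "The meaning to life and the universe is"
```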
📄 License
This model is distributed under the Gemma license.
📚 Documentation
| Property | Details |
|----------|---------|
| Base Model | google/gemma-3-12b-it |
| Library Name | transformers |
| License | gemma |
| Pipeline Tag | image-text-to-text |
| Tags | llama-cpp, gguf-my-repo |