Gemma 3 12B FornaxV.2 QAT CoT Q4_0 GGUF
This is an experimental small reasoning model with general inference capabilities, designed to run on 8GiB consumer-grade GPUs. Through supervised fine-tuning (SFT) on high-quality reasoning trajectories, the model generalizes its reasoning abilities across a wide range of tasks.
Model Overview
Gemma Fornax is a 12B-parameter model based on Gemma 3, focused on general reasoning that applies to coding, mathematics, and other tasks. Because it was trained from QAT (quantization-aware training) checkpoints, the model shows no quality degradation when run under Q4_0 quantization.
Model Features
General reasoning capability
Through supervised fine-tuning (SFT) on an extensive and diverse set of high-quality reasoning trajectories, the model generalizes its reasoning abilities to a wide range of tasks, not just programming and mathematics.
Low-resource requirement
The model is designed to run on 8GiB consumer-grade GPUs, requiring only about 6GiB of memory under Q4_0 quantization.
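A minimal sketch of what this looks like in practice, assuming the GGUF file is run with llama-cpp-python (one common runtime for GGUF models; the card does not prescribe a specific one). The filename and context length below are illustrative assumptions, not values taken from this card.

```python
# Sketch: loading the Q4_0 GGUF on an 8GiB consumer GPU with llama-cpp-python.
# The filename and context length are assumptions for illustration only.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-3-12b-fornax-v2-qat-q4_0.gguf",  # hypothetical local filename
    n_gpu_layers=-1,  # offload all layers to the GPU (~6GiB of VRAM at Q4_0)
    n_ctx=4096,       # modest context window to stay within an 8GiB budget
)
```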
Thinking mode switching
Similar to the Qwen 3 series, Gemma Fornax can enable or disable its thinking mode via a `/think` or `/no_think` instruction in the system prompt.
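As a rough sketch of how that switch can be driven, the helper below reuses the hypothetical `llm` object from the loading example and places `/think` or `/no_think` (the instructions named by the card) in the system prompt; the helper name and prompt text are made up for illustration.

```python
# Sketch: toggling thinking mode through the system prompt, as described above.
# Assumes the `llm` object created in the previous snippet.
def ask(question: str, think: bool = True) -> str:
    response = llm.create_chat_completion(
        messages=[
            # "/think" enables the reasoning trace, "/no_think" suppresses it.
            {"role": "system", "content": "/think" if think else "/no_think"},
            {"role": "user", "content": question},
        ],
        max_tokens=1024,
    )
    return response["choices"][0]["message"]["content"]

print(ask("Why is the sky blue?", think=False))  # direct answer, no reasoning trace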
QAT optimization
Training based on QAT checkpoints ensures no quality degradation when the model is used under Q4_0 quantization.
Model Capabilities
Text generation
General reasoning
Mathematical problem-solving
Programming assistance
Use Cases
Education and learning
Mathematical problem-solving
The model can solve complex mathematical problems and provide detailed reasoning processes.
Produces detailed, step-by-step solutions when thinking mode is enabled, as in the sketch below.
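One way this use case might look, reusing the hypothetical `ask` helper from the thinking-mode sketch above; the problem statement is invented purely for the example.

```python
# Sketch: requesting a worked solution with thinking mode enabled.
# Reuses the hypothetical `ask` helper defined earlier; the problem is illustrative.
solution = ask(
    "A train travels 180 km in 2.5 hours. What is its average speed in km/h?",
    think=True,
)
print(solution)  # expected to include the reasoning steps plus the answer (72 km/h)
```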
Programming development
Code generation and optimization
The model can generate code snippets or optimize existing code.
Produces efficient and executable code.
Creative writing
Story generation
The model can generate creative stories or character profiles.
Creates coherent and imaginative text content.