# Gemma-3-4b Reasoning R1 Model Card
Gemma-3-4b Reasoning is a transformer-based language model fine-tuned with GRPO for reasoning tasks, leveraging the DeepSeek-R1 methodology.
## 🚀 Quick Start
The model uses structured XML-style templates for dialogue and reasoning output. The example below shows basic usage:
### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "ericrisco/gemma-3-4b-reasoning"
prompt = "A cyclist travels 60 km in 3 hours at a constant speed. If he maintains the same speed, how many kilometers will he travel in 5 hours?"

# Load the tokenizer and the model in bfloat16; device_map="auto" lets
# accelerate place the weights on the available device(s).
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype=torch.bfloat16
)

# Format the conversation with the model's chat template.
messages = [{"role": "user", "content": prompt}]
input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# Use model.device rather than a hard-coded "cuda" so the snippet also
# works on CPU-only or multi-device setups.
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
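R1-style GRPO fine-tunes usually wrap their output in XML tags such as `<reasoning>` and `<answer>`. The exact tag names depend on the training template and are not documented in this card, so treat the following parser as a sketch under that assumption:

```python
import re

def parse_response(text: str) -> dict:
    """Split a response into reasoning and answer parts.

    Assumes the <reasoning>...</reasoning><answer>...</answer> tag
    convention common in R1/GRPO recipes; the tags used by this
    model may differ.
    """
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return {
        "reasoning": reasoning.group(1).strip() if reasoning else None,
        "answer": answer.group(1).strip() if answer else text.strip(),
    }

parsed = parse_response(response)
print(parsed["answer"])  # for the cyclist prompt, the expected answer is 100 km
```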
## ✨ Features
- Reasoning Focused: Gemma-3-4b Reasoning is a fine-tuned model designed to excel at structured, logical problem-solving and mathematical reasoning.
- Enhanced Reasoning Ability: Trained on the GSM8K dataset using GRPO, it reasons step by step and provides structured explanations (see the reward-function sketch after this list).
- Robust CoT Capabilities: The model exhibits robust internal Chain-of-Thought (CoT) capabilities, consistently demonstrating detailed explanations and structured problem-solving skills across reasoning tasks.
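GRPO scores groups of sampled completions with one or more reward functions and updates the policy toward higher-reward samples. The rewards used to train this model are not listed in the card; the snippet below is an illustrative format reward of the kind commonly used in GSM8K GRPO recipes (the tag names and plain-string completion format are assumptions):

```python
import re

def format_reward(completions, **kwargs):
    # Score 1.0 when a completion follows the expected
    # <reasoning>...</reasoning><answer>...</answer> layout, else 0.0.
    # Assumes completions arrive as plain strings.
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    return [1.0 if re.search(pattern, c, re.DOTALL) else 0.0 for c in completions]
```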
## 📦 Installation
No model-specific installation is required. The Quick Start example only needs the standard Hugging Face stack, e.g. `pip install transformers accelerate torch` (accelerate is required for `device_map="auto"`).
## 📚 Documentation
### Model Details
#### Description
Gemma-3-4b Reasoning is a reasoning-focused fine-tune designed to excel at structured, logical problem-solving and mathematical reasoning. Training was performed on the GSM8K dataset using GRPO, strengthening the model's ability to reason step by step and provide structured explanations.
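The card does not include the training script itself. Assuming the standard TRL `GRPOTrainer` setup, training roughly follows the pattern below; the hyperparameters shown are placeholders, not the values used for this model:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GRPOTrainer expects a "prompt" column, so map GSM8K's "question" onto it.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

# Placeholder hyperparameters; the real configuration is not documented here.
config = GRPOConfig(
    output_dir="gemma-3-4b-reasoning-grpo",
    num_generations=8,          # completions sampled per prompt
    max_completion_length=256,
)

trainer = GRPOTrainer(
    model="google/gemma-3-4b-it",   # base model per the table below
    reward_funcs=[format_reward],   # e.g. the format reward sketched above
    args=config,
    train_dataset=dataset,
)
trainer.train()
```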
#### Training Dataset
- GSM8K (English): Specialized dataset of grade-school math word problems for mathematical and logical reasoning; a sample is shown below.
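Each GSM8K entry pairs a word problem with a worked solution whose final line is `#### <number>`, which makes gold answers easy to extract:

```python
from datasets import load_dataset

gsm8k = load_dataset("openai/gsm8k", "main", split="test")
sample = gsm8k[0]
print(sample["question"])

# The gold answer follows the final "####" marker in the solution text.
gold = sample["answer"].split("####")[-1].strip()
print(gold)
```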
### Intended Use
#### Direct Use
The model is specifically designed for structured reasoning tasks, including:
- Mathematical and logical reasoning
- Multi-step problem solving
- Instruction-based reasoning
#### Out-of-scope Use
This model should not be used for unethical or malicious activities that breach legal and ethical standards.
### Performance
The Gemma-3-4b Reasoning model exhibits robust internal Chain-of-Thought (CoT) capabilities, consistently demonstrating detailed explanations and structured problem-solving skills across reasoning tasks.
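No quantitative benchmark results are reported in the card. One way to spot-check the claim is to compare extracted answers against GSM8K gold answers; this sketch reuses `parse_response` from Quick Start and the `gsm8k` split loaded above:

```python
def gsm8k_accuracy(model, tokenizer, dataset, n=50):
    """Rough spot-check: substring match between the model's extracted
    answer and the GSM8K gold answer. Not a rigorous benchmark."""
    correct = 0
    for sample in dataset.select(range(n)):
        messages = [{"role": "user", "content": sample["question"]}]
        text = tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=512)
        # Decode only the newly generated tokens.
        reply = tokenizer.decode(
            outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        gold = sample["answer"].split("####")[-1].strip()
        if gold in parse_response(reply)["answer"]:
            correct += 1
    return correct / n

print(gsm8k_accuracy(model, tokenizer, gsm8k, n=50))
```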
### Limitations
The model is primarily optimized for numeric and structured reasoning and might produce less accurate or unexpected results when applied to unrelated tasks.
### Citations
- Gemma Multimodal Reasoning Model by Google
- GRPO Implementation by TRL
### Author
Eric Risco
## 📄 License
The entire Gemma-3-4b Reasoning family is available under a permissive Apache 2.0 license. All training scripts and configurations used are publicly accessible.
## 📋 Model Information
| Property | Details |
|----------|---------|
| Model Type | Transformer-based language model fine-tuned with GRPO |
| Training Data | GSM8K (English) |
| Base Model | google/gemma-3-4b-it |
| License | Apache 2.0 |