🚀 AXCXEPT/EZO2.5-gemma-3-12b-it-Preview
This model integrates innovative training concepts to enhance the Japanese performance of its base model, offering a cost-effective alternative to conventional reinforcement learning.
🚀 Quick Start
This model runs on a single A40 GPU. Use the following commands to get started:
Using Bash
```bash
vllm serve AXCXEPT/EZO2.5-gemma-3-12b-it-Preview --max-model-len 32768 --enforce-eager
```
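Once the server is up, you can confirm that the OpenAI-compatible endpoint responds before running the full example below. This is a minimal check; the `api_key` value is an arbitrary placeholder unless you started `vllm serve` with `--api-key`:

```python
from openai import OpenAI

# Point the client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")

# List the models the server exposes; the served model ID should appear.
for model in client.models.list():
    print(model.id)
```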
Using Python
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",
)

# Raw string so that LaTeX escapes such as \frac are preserved verbatim.
prompt = r"""Every morning Aya goes for a $9$-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of $s$ kilometers per hour, the walk takes her 4 hours, including $t$ minutes spent in the coffee shop. When she walks $s+2$ kilometers per hour, the walk takes her 2 hours and 24 minutes, including $t$ minutes spent in the coffee shop. Suppose Aya walks at $s+\frac{1}{2}$ kilometers per hour. Find the number of minutes the walk takes her, including the $t$ minutes spent in the coffee shop."""

completion = client.chat.completions.create(
    model="AXCXEPT/EZO2.5-gemma-3-12b-it-Preview",
    messages=[
        {"role": "user", "content": prompt}
    ],
    # Settings used for the benchmark scores reported below.
    temperature=0.0,
    top_p=1.0,
    max_tokens=20480,
)

print(completion.choices[0].message)
```
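The generated text itself is available as `completion.choices[0].message.content`; printing the full message object, as above, also shows the role and any tool-call fields.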
⚠️ Important Note
The benchmark scores are based on inference results with `temperature=0.0`, `top_p=1.0`, and `max_tokens=20480`. Evaluations under other settings, such as Cons@64, have not been conducted.
✨ Features
- Innovative Training Method: Integrating the “GRPO” and “PPO” concepts into the proprietary “EZO” training method successfully improved the Japanese performance of the base model on Japanese MT Bench and Elyza Tasks100.
- Cost-Effective: Achieved this improvement with 3,000 training samples and two hours of training on 8 H200 GPUs, providing a low-budget alternative to complex reinforcement learning methods.
- Performance Improvement: Improved on the already high-performing base model google/gemma-3-12b-it in a short time, and in some respects approached the performance of 32B and 72B models.
📚 Documentation
Model Details
By integrating the recently introduced concepts of “GRPO” and “PPO”, which enable LLMs to autonomously improve their own capabilities, into our proprietary training method “EZO”, we successfully enhanced the Japanese performance of the base model on both Japanese MT Bench and Elyza Tasks100. This was achieved using only 3,000 training samples and two hours of training on 8 H200 GPUs.
While this training method is still in the research phase and requires further automation and ablation studies, we believe it represents a viable alternative to complex and time-consuming reinforcement learning approaches like GRPO/PPO, achievable even on a limited budget. A rough sketch of the group-relative idea follows.
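The EZO training code itself is not published; the sketch below is our own minimal, hypothetical PyTorch illustration of the group-relative concept borrowed from GRPO, not the actual method. Rewards for a group of responses sampled per prompt are normalized within the group, and a PPO-style clipped surrogate is applied to the resulting advantages:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Normalize rewards within each prompt's group of sampled responses.

    rewards: (num_prompts, group_size) scalar rewards.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

def clipped_policy_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate applied to group-relative advantages."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.minimum(unclipped, clipped).mean()

# Toy usage: 2 prompts, 4 sampled responses each (made-up numbers).
rewards = torch.tensor([[1.0, 0.0, 0.5, 0.2],
                        [0.9, 0.9, 0.1, 0.3]])
advantages = group_relative_advantages(rewards)
logp_old = torch.randn(2, 4)
logp_new = logp_old + 0.01 * torch.randn(2, 4)
print(clipped_policy_loss(logp_new, logp_old, advantages))
```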
Benchmark

Through short-term training, the model improved on google/gemma-3-12b-it, which already had very high Japanese performance, and in some respects approached the performance of 32B and 72B models, showing that specialized gains can be realized on top of improvements to the base model.
Going forward, since greater diversity in benchmarks is needed, we plan to run additional benchmarks in English and to conduct practical research on the training results.
📄 License
This model has been developed for research purposes. Please use it with the understanding that our company and the developers accept no responsibility for any damages resulting from its use.
👏 Special Thanks
We would like to express our sincere respect and appreciation to Google and its development team for creating the base model upon which this model is built.
📦 Model Information
| Property | Details |
|----------|---------|
| Library Name | transformers |
| Model Type | image-text-to-text |
| Base Model | google/gemma-3-12b-it |
| Tags | gemma-3, japanese, text-generation |
| License | gemma |