🚀 Gemma-2-Llama-Swallow
The Gemma-2-Llama-Swallow series enhances the Japanese language capabilities of the Gemma 2 models through continual pre-training while retaining their English proficiency.
🚀 Quick Start
The Gemma-2-Llama-Swallow series was built by continual pre-training on the Gemma 2 models. It enhances the Japanese language capabilities of the original Gemma 2 while retaining its English language capabilities. For continual pre-training, we used approximately 200 billion tokens sampled from a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia articles, and mathematical and coding content (see the Training Datasets section of the base model card). The instruction-tuned models (it) were built by supervised fine-tuning (SFT) on synthetic data built specifically for Japanese. See the Swallow Model Index section to find other model variants. Built with Gemma. Built with Llama.
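This card does not ship a runnable snippet here, so below is a minimal inference sketch, assuming the Hugging Face transformers library and the tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1 checkpoint listed in the performance tables below; the loading options (bfloat16, device_map) are illustrative choices, not requirements from this card.

```python
# Minimal inference sketch (assumption: transformers and torch are installed;
# the checkpoint name comes from the performance tables in this card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; pick a dtype your hardware supports
    device_map="auto",
)

# The Gemma 2 chat format is applied via the tokenizer's chat template.
messages = [{"role": "user", "content": "日本の四国地方について教えてください。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```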
✨ Features
- Continual pre-training on Gemma 2 models to enhance Japanese language capabilities while maintaining English proficiency.
- Instruction-tuned models built on synthetic data for Japanese, improving performance in Japanese tasks.
📚 Documentation
Release History
Swallow Model Index

The Swallow project website (https://swallow-llm.github.io/) provides an index of the large language models developed by the Swallow team.
Model Details
| Property | Details |
|---|---|
| Model Type | Please refer to the Gemma 2 paper for details on the model architecture. |
| Language(s) | Japanese, English |
| Library | maxtext |
| Tokenizer | Please refer to the Gemma 2 paper for details on the tokenizer. |
| Contact | swallow[at]nlp.c.titech.ac.jp |
Model Performance
MT-Bench JA
| Model | coding | extraction | humanities | math | reasoning | roleplay | stem | writing | JMT Avg |
|---|---|---|---|---|---|---|---|---|---|
| google/gemma-3-1b-it | 0.379 | 0.497 | 0.680 | 0.385 | 0.322 | 0.628 | 0.540 | 0.651 | 0.510 |
| Qwen/Qwen2.5-1.5B-Instruct | 0.408 | 0.513 | 0.456 | 0.527 | 0.352 | 0.473 | 0.406 | 0.469 | 0.450 |
| google/gemma-2-2b-it | 0.454 | 0.587 | 0.693 | 0.524 | 0.445 | 0.654 | 0.567 | 0.630 | 0.569 |
| rinna/gemma-2-baku-2b-it | 0.470 | 0.625 | 0.810 | 0.414 | 0.382 | 0.713 | 0.609 | 0.697 | 0.590 |
| google/gemma-2-2b-jpn-it | 0.467 | 0.488 | 0.741 | 0.379 | 0.406 | 0.660 | 0.589 | 0.672 | 0.550 |
| tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1 | 0.438 | 0.533 | 0.781 | 0.557 | 0.404 | 0.706 | 0.674 | 0.682 | 0.597 |
| Qwen/Qwen2.5-3B-Instruct | 0.567 | 0.647 | 0.597 | 0.665 | 0.457 | 0.649 | 0.526 | 0.637 | 0.593 |
| google/gemma-3-4b-it | 0.603 | 0.724 | 0.798 | 0.767 | 0.498 | 0.803 | 0.775 | 0.822 | 0.724 |
| Qwen/Qwen2.5-7B-Instruct | 0.599 | 0.741 | 0.719 | 0.637 | 0.541 | 0.744 | 0.624 | 0.713 | 0.665 |
| tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3 | 0.562 | 0.756 | 0.869 | 0.610 | 0.512 | 0.783 | 0.748 | 0.803 | 0.705 |
| google/gemma-2-9b-it | 0.652 | 0.765 | 0.857 | 0.614 | 0.673 | 0.811 | 0.713 | 0.800 | 0.736 |
| tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1 | 0.592 | 0.796 | 0.872 | 0.742 | 0.638 | 0.802 | 0.745 | 0.803 | 0.749 |
| google/gemma-3-12b-it | 0.807 | 0.814 | 0.871 | 0.886 | 0.623 | 0.847 | 0.858 | 0.863 | 0.821 |
| google/gemma-2-27b-it | 0.727 | 0.809 | 0.874 | 0.719 | 0.639 | 0.810 | 0.740 | 0.826 | 0.768 |
| tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1 | 0.618 | 0.839 | 0.873 | 0.741 | 0.608 | 0.814 | 0.739 | 0.836 | 0.759 |
| google/gemma-3-27b-it | 0.804 | 0.927 | 0.879 | 0.876 | 0.774 | 0.846 | 0.848 | 0.882 | 0.855 |
| Qwen/Qwen2.5-32B-Instruct | 0.724 | 0.885 | 0.816 | 0.918 | 0.726 | 0.834 | 0.763 | 0.808 | 0.809 |
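The JMT Avg column appears to be the unweighted mean of the eight category scores, and the Ja Avg and En Avg columns in the tables below behave the same way over their ten tasks. A quick sanity check on the google/gemma-3-1b-it row above:

```python
# Recompute JMT Avg for google/gemma-3-1b-it from its eight category scores.
categories = [0.379, 0.497, 0.680, 0.385, 0.322, 0.628, 0.540, 0.651]
print(round(sum(categories) / len(categories), 3))  # 0.510, matching the JMT Avg column
```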
Japanese tasks
| Model | JCom. (4-shot, EM acc) | JEMHopQA (4-shot, Char-F1) | NIILC (4-shot, Char-F1) | JSQuAD (4-shot, Char-F1) | XL-Sum (1-shot, ROUGE-2) | MGSM (4-shot, EM acc) | WMT20-en-ja (4-shot, BLEU) | WMT20-ja-en (4-shot, BLEU) | JMMLU (5-shot, EM acc) | JHumanEval (0-shot, pass@1) | Ja Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| google/gemma-3-1b-it | 0.526 | 0.330 | 0.237 | 0.700 | 0.113 | 0.088 | 0.166 | 0.115 | 0.332 | 0.245 | 0.285 |
| Qwen/Qwen2.5-1.5B-Instruct | 0.812 | 0.276 | 0.241 | 0.847 | 0.128 | 0.292 | 0.147 | 0.119 | 0.447 | 0.242 | 0.355 |
| google/gemma-2-2b-it | 0.862 | 0.348 | 0.315 | 0.879 | 0.117 | 0.252 | 0.207 | 0.183 | 0.437 | 0.321 | 0.392 |
| rinna/gemma-2-baku-2b-it | 0.855 | 0.228 | 0.390 | 0.877 | 0.115 | 0.172 | 0.255 | 0.190 | 0.415 | 0.165 | 0.366 |
| google/gemma-2-2b-jpn-it | 0.845 | 0.321 | 0.291 | 0.877 | 0.131 | 0.192 | 0.204 | 0.180 | 0.418 | 0.311 | 0.377 |
| tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1 | 0.862 | 0.367 | 0.483 | 0.881 | 0.145 | 0.288 | 0.258 | 0.200 | 0.485 | 0.267 | 0.424 |
| Qwen/Qwen2.5-3B-Instruct | 0.876 | 0.304 | 0.293 | 0.866 | 0.144 | 0.228 | 0.198 | 0.168 | 0.536 | 0.474 | 0.409 |
| google/gemma-3-4b-it | 0.818 | 0.444 | 0.404 | 0.801 | 0.134 | 0.332 | 0.217 | 0.169 | 0.477 | 0.365 | 0.416 |
| Qwen/Qwen2.5-7B-Instruct | 0.915 | 0.429 | 0.391 | 0.891 | 0.168 | 0.632 | 0.211 | 0.192 | 0.623 | 0.532 | 0.498 |
| tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3 | 0.924 | 0.528 | 0.583 | 0.896 | 0.191 | 0.532 | 0.281 | 0.229 | 0.544 | 0.394 | 0.510 |
| google/gemma-2-9b-it | 0.931 | 0.532 | 0.527 | 0.876 | 0.149 | 0.636 | 0.273 | 0.239 | 0.623 | 0.559 | 0.535 |
| tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1 | 0.946 | 0.606 | 0.643 | 0.852 | 0.170 | 0.624 | 0.296 | 0.238 | 0.639 | 0.446 | 0.546 |
| google/gemma-3-12b-it | 0.935 | 0.566 | 0.542 | 0.808 | 0.148 | 0.724 | 0.289 | 0.239 | 0.645 | 0.637 | 0.553 |
| google/gemma-2-27b-it | 0.956 | 0.541 | 0.576 | 0.883 | 0.166 | 0.704 | 0.290 | 0.249 | 0.670 | 0.638 | 0.567 |
| tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1 | 0.969 | 0.654 | 0.658 | 0.891 | 0.194 | 0.764 | 0.316 | 0.258 | 0.686 | 0.635 | 0.602 |
| google/gemma-3-27b-it | 0.946 | 0.592 | 0.584 | 0.867 | 0.142 | 0.764 | 0.307 | 0.253 | 0.716 | 0.736 | 0.591 |
| Qwen/Qwen2.5-32B-Instruct | 0.959 | 0.567 | 0.497 | 0.903 | 0.169 | 0.780 | 0.228 | 0.195 | 0.757 | 0.651 | 0.571 |
English tasks
| Model | OpenBookQA (4-shot, Acc) | TriviaQA (4-shot, EM acc) | HellaSWAG (4-shot, Acc) | SQuAD2.0 (4-shot, EM acc) | XWINO (4-shot, Acc) | MMLU (5-shot, Acc) | GSM8K (4-shot, EM acc) | MATH (4-shot, CoT EM Acc) | BBH (3-shot, CoT EM Acc) | HumanEval (0-shot, pass@1) | En Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| google/gemma-3-1b-it | 0.272 | 0.229 | 0.421 | 0.501 | 0.786 | 0.398 | 0.256 | 0.340 | 0.379 | 0.335 | 0.392 |
| Qwen/Qwen2.5-1.5B-Instruct | 0.334 | 0.378 | 0.503 | 0.501 | 0.844 | 0.604 | 0.257 | 0.272 | 0.272 | 0.277 | 0.424 |
| google/gemma-2-2b-it | 0.354 | 0.502 | 0.520 | 0.548 | 0.878 | 0.569 | 0.440 | 0.230 | 0.464 | 0.382 | 0.489 |
| rinna/gemma-2-baku-2b-it | 0.342 | 0.416 | 0.511 | 0.522 | 0.871 | 0.526 | 0.027 | 0.174 | 0.063 | 0.158 | 0.361 |
| google/gemma-2-2b-jpn-it | 0.370 | 0.503 | 0.532 | 0.539 | 0.879 | 0.557 | 0.351 | 0.132 | 0.451 | 0.392 | 0.471 |
| tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1 | 0.332 | 0.417 | 0.529 | 0.506 | 0.856 | 0.530 | 0.284 | 0.150 | 0.405 | 0.301 | 0.431 |
| Qwen/Qwen2.5-3B-Instruct | 0.364 | 0.446 | 0.562 | 0.504 | 0.869 | 0.664 | 0.096 | 0.612 | 0.128 | 0.471 | 0.472 |
| google/gemma-3-4b-it | 0.412 | 0.500 | 0.560 | 0.552 | 0.872 | 0.583 | 0.769 | 0.306 | 0.598 | 0.513 | 0.566 |
| Qwen/Qwen2.5-7B-Instruct | 0.428 | 0.519 | 0.624 | 0.569 | 0.877 | 0.742 | 0.739 | 0.688 | 0.217 | 0.636 | 0.604 |
| tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3 | 0.396 | 0.629 | 0.593 | 0.570 | 0.884 | 0.629 | 0.622 | 0.266 | 0.626 | 0.445 | 0.566 |
| google/gemma-2-9b-it | 0.432 | 0.658 | 0.605 | 0.659 | 0.904 | 0.723 | 0.779 | 0.394 | 0.719 | 0.613 | 0.649 |
| tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1 | 0.404 | 0.640 | 0.609 | 0.623 | 0.900 | 0.680 | 0.710 | 0.392 | 0.663 | 0.491 | 0.611 |
| google/gemma-3-12b-it | 0.422 | 0.665 | 0.639 | 0.649 | 0.901 | 0.721 | 0.867 | 0.796 | 0.844 | 0.736 | 0.705 |
| google/gemma-2-27b-it | 0.445 | 0.694 | 0.634 | 0.673 | 0.910 | 0.746 | 0.858 | 0.472 | 0.789 | 0.648 | 0.673 |
| tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1 | 0.416 | 0.673 | 0.631 | 0.647 | 0.907 | 0.717 | 0.832 | 0.454 | 0.753 | 0.602 | 0.645 |
| google/gemma-3-27b-it | 0.448 | 0.712 | 0.662 | 0.689 | 0.913 | 0.763 | 0.902 | 0.578 | 0.823 | 0.764 | 0.727 |
| Qwen/Qwen2.5-32B-Instruct | 0.439 | 0.701 | 0.651 | 0.677 | 0.909 | 0.752 | 0.884 | 0.556 | 0.801 | 0.736 | 0.713 |
📄 License
The models in this series are released under the Gemma and Llama 3.3 licenses. The training datasets used include tokyotech-llm/lmsys-chat-1m-synth, tokyotech-llm/swallow-magpie-ultra-v0.1, tokyotech-llm/swallow-gemma-magpie-v0.1, lmsys/lmsys-chat-1m, and argilla/magpie-ultra-v0.1.