# Gemma-2-Llama-Swallow-27b-it-v0.1
A Japanese-enhanced large language model based on the Gemma 2 architecture that significantly improves Japanese capabilities while retaining the original English proficiency.

- **Category**: Large Language Model
- **Tags**: Transformers, Supports Multiple Languages, Japanese Enhancement, Multi-turn Dialogue, Bilingual Processing
- **Downloads**: 27
- **Release Time**: 4/24/2025
## Model Overview
This model is one of a series built through continued pretraining of Google Gemma 2, specifically optimized for Japanese processing, and suited to Japanese-English bilingual text generation and comprehension tasks.
## Model Features
- **Enhanced Bilingual Capabilities**: Significantly improved Japanese processing while retaining the original Gemma 2 English capabilities.
- **Large-scale Pretraining**: Continued pretraining on approximately 200 billion tokens of mixed corpus, including specialized Japanese data.
- **Instruction Fine-tuning**: Supervised fine-tuning (SFT) on synthetic data specially constructed for Japanese.
## Model Capabilities
- Japanese text generation
- English text generation
- Japanese-English bilingual comprehension
- Multi-turn dialogue processing
- Code generation
## Use Cases
### Language Services
- **Japanese Chat Assistant**: Building fluent and natural Japanese dialogue systems; strong results on Japanese MT-Bench evaluations.
- **Japanese-English Translation**: High-quality bidirectional translation; competitive performance on the WMT20 benchmark.
### Education
- **Japanese Learning Assistance**: Helping non-native speakers learn Japanese.
## 🚀 Gemma-2-Llama-Swallow
*The Gemma-2-Llama-Swallow series enhances Japanese language capabilities through continual pre-training and instruction tuning.*
The Gemma-2-Llama-Swallow series was built by continual pre-training on the [gemma-2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315) models. Gemma 2 Swallow enhances the Japanese language capabilities of the original Gemma 2 while retaining its English language capabilities. For continual pre-training, we used approximately 200 billion tokens sampled from a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia articles, and mathematical and coding content (see the Training Datasets section of the base model). The instruction-tuned models (it) were built by supervised fine-tuning (SFT) on synthetic data specially constructed for Japanese. See the Swallow Model Index section to find other model variants. Built with Gemma. Built with Llama.
## 🚀 Quick Start
This section provides an overview of the Gemma-2-Llama-Swallow series. For specific usage, you can refer to the official documentation of the base model [Gemma 2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315).
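The snippet below is a minimal inference sketch, assuming the standard `transformers` chat-template workflow used by Gemma 2 instruction-tuned checkpoints; the prerequisites, dtype, and generation settings are illustrative, not official recommendations.

```python
# Minimal sketch (assumptions: standard Hugging Face transformers workflow;
# prerequisite: pip install -U transformers accelerate torch).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # illustrative; choose a dtype your hardware supports
    device_map="auto",            # shards the 27B weights across available devices
)

# Multi-turn chat is formatted with the model's own chat template.
messages = [
    {"role": "user", "content": "日本の四国地方にある県を教えてください。"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```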
## ✨ Features
- **Bilingual Capabilities**: Retains English language capabilities while enhancing Japanese language performance.
- **Continual Pre-training**: Uses a large amount of data for continual pre-training to improve model performance.
- **Instruction Tuning**: Instruction-tuned models are built on synthetic data for Japanese.
## 📦 Installation
No specific installation steps are provided in the original document.
## 📚 Documentation
### Release History
- **May 19, 2025**: Released [Gemma-2-Llama-Swallow-2b-pt-v0.1](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1), [Gemma-2-Llama-Swallow-9b-pt-v0.1](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1), [Gemma-2-Llama-Swallow-27b-pt-v0.1](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1), [Gemma-2-Llama-Swallow-2b-it-v0.1](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1), [Gemma-2-Llama-Swallow-9b-it-v0.1](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1), and [Gemma-2-Llama-Swallow-27b-it-v0.1](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1).
### Swallow Model Index
| Model | gemma-2-swallow v0.1 | gemma-2-swallow-it v0.1 |
| ----- | ---------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| 2B | [🤗 HuggingFace](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1) | [🤗 HuggingFace](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1) |
| 9B | [🤗 HuggingFace](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1) | [🤗 HuggingFace](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1) |
| 27B | [🤗 HuggingFace](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1) | [🤗 HuggingFace](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1) |

The Swallow team's website, [https://swallow-llm.github.io/](https://swallow-llm.github.io/index.en.html), lists the large language models they have developed.
### Model Details
| Property | Details |
|----------|---------|
| Model Type | Please refer to [Gemma 2 paper](https://arxiv.org/abs/2408.00118) for details on the model architecture. |
| Language(s) | Japanese, English |
| Library | [maxtext](https://github.com/AI-Hypercomputer/maxtext) |
| Tokenizer | Please refer to [Gemma 2 paper](https://arxiv.org/abs/2408.00118) for details on the tokenizer. |
| Contact | swallow[at]nlp.c.titech.ac.jp |
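Since the table above states that the tokenizer follows Gemma 2, a quick way to sanity-check Japanese segmentation is to load the tokenizer directly from the checkpoint. This is a small illustrative sketch, not an official example:

```python
# Sketch (assumption: the checkpoint ships a Gemma 2 tokenizer, per the table above).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1")
text = "東京工業大学で大規模言語モデルを研究しています。"
token_ids = tokenizer(text)["input_ids"]
print(len(token_ids))                               # number of tokens for the sentence
print(tokenizer.convert_ids_to_tokens(token_ids))   # inspect the subword segmentation
```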
### Model Performance
#### MT-Bench JA
| Model | coding | extraction | humanities | math | reasoning | roleplay | stem | writing | JMT Avg |
| --------------------------------------------------- | ------ | ---------- | ---------- | ----- | --------- | -------- | ----- | ------- | ------- |
| google/gemma-3-1b-it | 0.379 | 0.497 | 0.680 | 0.385 | 0.322 | 0.628 | 0.540 | 0.651 | 0.510 |
| Qwen/Qwen2.5-1.5B-Instruct | 0.408 | 0.513 | 0.456 | 0.527 | 0.352 | 0.473 | 0.406 | 0.469 | 0.450 |
| google/gemma-2-2b-it | 0.454 | 0.587 | 0.693 | 0.524 | 0.445 | 0.654 | 0.567 | 0.630 | 0.569 |
| rinna/gemma-2-baku-2b-it | 0.470 | 0.625 | 0.810 | 0.414 | 0.382 | 0.713 | 0.609 | 0.697 | 0.590 |
| google/gemma-2-2b-jpn-it | 0.467 | 0.488 | 0.741 | 0.379 | 0.406 | 0.660 | 0.589 | 0.672 | 0.550 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1** | 0.438 | 0.533 | 0.781 | 0.557 | 0.404 | 0.706 | 0.674 | 0.682 | 0.597 |
| Qwen/Qwen2.5-3B-Instruct | 0.567 | 0.647 | 0.597 | 0.665 | 0.457 | 0.649 | 0.526 | 0.637 | 0.593 |
| google/gemma-3-4b-it | 0.603 | 0.724 | 0.798 | 0.767 | 0.498 | 0.803 | 0.775 | 0.822 | 0.724 |
| Qwen/Qwen2.5-7B-Instruct | 0.599 | 0.741 | 0.719 | 0.637 | 0.541 | 0.744 | 0.624 | 0.713 | 0.665 |
| tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3 | 0.562 | 0.756 | 0.869 | 0.610 | 0.512 | 0.783 | 0.748 | 0.803 | 0.705 |
| google/gemma-2-9b-it | 0.652 | 0.765 | 0.857 | 0.614 | 0.673 | 0.811 | 0.713 | 0.800 | 0.736 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1** | 0.592 | 0.796 | 0.872 | 0.742 | 0.638 | 0.802 | 0.745 | 0.803 | 0.749 |
| google/gemma-3-12b-it | 0.807 | 0.814 | 0.871 | 0.886 | 0.623 | 0.847 | 0.858 | 0.863 | 0.821 |
| google/gemma-2-27b-it | 0.727 | 0.809 | 0.874 | 0.719 | 0.639 | 0.810 | 0.740 | 0.826 | 0.768 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1** | 0.618 | 0.839 | 0.873 | 0.741 | 0.608 | 0.814 | 0.739 | 0.836 | 0.759 |
| google/gemma-3-27b-it | 0.804 | 0.927 | 0.879 | 0.876 | 0.774 | 0.846 | 0.848 | 0.882 | 0.855 |
| Qwen/Qwen2.5-32B-Instruct | 0.724 | 0.885 | 0.816 | 0.918 | 0.726 | 0.834 | 0.763 | 0.808 | 0.809 |
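The JMT Avg column appears to be the unweighted mean of the eight per-category scores (the Ja Avg and En Avg columns in the tables below seem to follow the same convention). A quick check for the 27B instruct row, under that assumption:

```python
# Check (assumption: reported averages are unweighted means, rounded to three decimals).
mt_bench_ja_27b_it = {
    "coding": 0.618, "extraction": 0.839, "humanities": 0.873, "math": 0.741,
    "reasoning": 0.608, "roleplay": 0.814, "stem": 0.739, "writing": 0.836,
}
jmt_avg = sum(mt_bench_ja_27b_it.values()) / len(mt_bench_ja_27b_it)
print(jmt_avg)  # ≈ 0.7585, consistent with the reported 0.759
```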
#### Japanese tasks
| Model | JCom. | JEMHopQA | NIILC | JSQuAD | XL-Sum | MGSM | WMT20-en-ja | WMT20-ja-en | JMMLU | JHumanEval | Ja Avg |
| --------------------------------------------------- | ------ | -------- | ------- | ------- | ------- | ------ | ----------- | ----------- | ------ | ---------- | ------ |
| | 4-shot | 4-shot | 4-shot | 4-shot | 1-shot | 4-shot | 4-shot | 4-shot | 5-shot | 0-shot | |
| | EM acc | Char-F1 | Char-F1 | Char-F1 | ROUGE-2 | EM acc | BLEU | BLEU | EM acc | pass@1 | |
| google/gemma-3-1b-it | 0.526 | 0.330 | 0.237 | 0.700 | 0.113 | 0.088 | 0.166 | 0.115 | 0.332 | 0.245 | 0.285 |
| Qwen/Qwen2.5-1.5B-Instruct | 0.812 | 0.276 | 0.241 | 0.847 | 0.128 | 0.292 | 0.147 | 0.119 | 0.447 | 0.242 | 0.355 |
| google/gemma-2-2b-it | 0.862 | 0.348 | 0.315 | 0.879 | 0.117 | 0.252 | 0.207 | 0.183 | 0.437 | 0.321 | 0.392 |
| rinna/gemma-2-baku-2b-it | 0.855 | 0.228 | 0.390 | 0.877 | 0.115 | 0.172 | 0.255 | 0.190 | 0.415 | 0.165 | 0.366 |
| google/gemma-2-2b-jpn-it | 0.845 | 0.321 | 0.291 | 0.877 | 0.131 | 0.192 | 0.204 | 0.180 | 0.418 | 0.311 | 0.377 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1** | 0.862 | 0.367 | 0.483 | 0.881 | 0.145 | 0.288 | 0.258 | 0.200 | 0.485 | 0.267 | 0.424 |
| Qwen/Qwen2.5-3B-Instruct | 0.876 | 0.304 | 0.293 | 0.866 | 0.144 | 0.228 | 0.198 | 0.168 | 0.536 | 0.474 | 0.409 |
| google/gemma-3-4b-it | 0.818 | 0.444 | 0.404 | 0.801 | 0.134 | 0.332 | 0.217 | 0.169 | 0.477 | 0.365 | 0.416 |
| Qwen/Qwen2.5-7B-Instruct | 0.915 | 0.429 | 0.391 | 0.891 | 0.168 | 0.632 | 0.211 | 0.192 | 0.623 | 0.532 | 0.498 |
| tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3 | 0.924 | 0.528 | 0.583 | 0.896 | 0.191 | 0.532 | 0.281 | 0.229 | 0.544 | 0.394 | 0.510 |
| google/gemma-2-9b-it | 0.931 | 0.532 | 0.527 | 0.876 | 0.149 | 0.636 | 0.273 | 0.239 | 0.623 | 0.559 | 0.535 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1** | 0.946 | 0.606 | 0.643 | 0.852 | 0.170 | 0.624 | 0.296 | 0.238 | 0.639 | 0.446 | 0.546 |
| google/gemma-3-12b-it | 0.935 | 0.566 | 0.542 | 0.808 | 0.148 | 0.724 | 0.289 | 0.239 | 0.645 | 0.637 | 0.553 |
| google/gemma-2-27b-it | 0.956 | 0.541 | 0.576 | 0.883 | 0.166 | 0.704 | 0.290 | 0.249 | 0.670 | 0.638 | 0.567 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1** | 0.969 | 0.654 | 0.658 | 0.891 | 0.194 | 0.764 | 0.316 | 0.258 | 0.686 | 0.635 | 0.602 |
| google/gemma-3-27b-it | 0.946 | 0.592 | 0.584 | 0.867 | 0.142 | 0.764 | 0.307 | 0.253 | 0.716 | 0.736 | 0.591 |
| Qwen/Qwen2.5-32B-Instruct | 0.959 | 0.567 | 0.497 | 0.903 | 0.169 | 0.780 | 0.228 | 0.195 | 0.757 | 0.651 | 0.571 |
#### English tasks
| Model | OpenBookQA | TriviaQA | HellaSWAG | SQuAD2.0 | XWINO | MMLU | GSM8K | MATH | BBH | HumanEval | En Avg |
| --------------------------------------------------- | ---------- | -------- | --------- | -------- | ------ | ------ | ------ | ---------- | ---------- | --------- | ------ |
| | 4-shot | 4-shot | 4-shot | 4-shot | 4-shot | 5-shot | 4-shot | 4-shot | 3-shot | 0-shot | |
| | Acc | EM acc | Acc | EM acc | Acc | Acc | EM acc | CoT EM Acc | CoT EM Acc | pass@1 | |
| google/gemma-3-1b-it | 0.272 | 0.229 | 0.421 | 0.501 | 0.786 | 0.398 | 0.256 | 0.340 | 0.379 | 0.335 | 0.392 |
| Qwen/Qwen2.5-1.5B-Instruct | 0.334 | 0.378 | 0.503 | 0.501 | 0.844 | 0.604 | 0.257 | 0.272 | 0.272 | 0.277 | 0.424 |
| google/gemma-2-2b-it | 0.354 | 0.502 | 0.520 | 0.548 | 0.878 | 0.569 | 0.440 | 0.230 | 0.464 | 0.382 | 0.489 |
| rinna/gemma-2-baku-2b-it | 0.342 | 0.416 | 0.511 | 0.522 | 0.871 | 0.526 | 0.027 | 0.174 | 0.063 | 0.158 | 0.361 |
| google/gemma-2-2b-jpn-it | 0.370 | 0.503 | 0.532 | 0.539 | 0.879 | 0.557 | 0.351 | 0.132 | 0.451 | 0.392 | 0.471 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1** | 0.332 | 0.417 | 0.529 | 0.506 | 0.856 | 0.530 | 0.284 | 0.150 | 0.405 | 0.301 | 0.431 |
| Qwen/Qwen2.5-3B-Instruct | 0.364 | 0.446 | 0.562 | 0.504 | 0.869 | 0.664 | 0.096 | 0.612 | 0.128 | 0.471 | 0.472 |
| google/gemma-3-4b-it | 0.412 | 0.500 | 0.560 | 0.552 | 0.872 | 0.583 | 0.769 | 0.306 | 0.598 | 0.513 | 0.566 |
| Qwen/Qwen2.5-7B-Instruct | 0.428 | 0.519 | 0.624 | 0.569 | 0.877 | 0.742 | 0.739 | 0.688 | 0.217 | 0.636 | 0.604 |
| tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3 | 0.396 | 0.629 | 0.593 | 0.570 | 0.884 | 0.629 | 0.622 | 0.266 | 0.626 | 0.445 | 0.566 |
| google/gemma-2-9b-it | 0.432 | 0.658 | 0.605 | 0.659 | 0.904 | 0.723 | 0.779 | 0.394 | 0.719 | 0.613 | 0.649 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1** | 0.404 | 0.640 | 0.609 | 0.623 | 0.900 | 0.680 | 0.710 | 0.392 | 0.663 | 0.491 | 0.611 |
| google/gemma-3-12b-it | 0.422 | 0.665 | 0.639 | 0.649 | 0.901 | 0.721 | 0.867 | 0.796 | 0.808 | 0.736 | 0.677 |
| google/gemma-2-27b-it | 0.434 | 0.678 | 0.653 | 0.663 | 0.905 | 0.732 | 0.872 | 0.812 | 0.823 | 0.742 | 0.690 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1** | 0.406 | 0.656 | 0.647 | 0.634 | 0.902 | 0.696 | 0.804 | 0.788 | 0.793 | 0.687 | 0.662 |
| google/gemma-3-27b-it | 0.438 | 0.684 | 0.667 | 0.671 | 0.907 | 0.740 | 0.884 | 0.826 | 0.837 | 0.751 | 0.703 |
| Qwen/Qwen2.5-32B-Instruct | 0.442 | 0.692 | 0.675 | 0.679 | 0.909 | 0.748 | 0.890 | 0.834 | 0.845 | 0.759 | 0.711 |
## 📄 License
The models in this series are released under the [Gemma](https://example.com/gemma_license) and [Llama 3.3](https://example.com/llama3.3_license) licenses.