# Gemma-2-Llama-Swallow-27b-it-v0.1
A Japanese-enhanced large language model based on the Gemma 2 architecture that significantly improves Japanese capabilities while retaining the original English proficiency.

- **Category**: Large Language Model
- **Tags**: Transformers, Supports Multiple Languages, Japanese Enhancement, Multi-turn Dialogue, Bilingual Processing
- **Downloads**: 27
- **Release Time**: 4/24/2025
## Model Overview
This model is one of a series built through continued pretraining of Google Gemma 2, specifically optimized for Japanese processing, and suited to Japanese-English bilingual text generation and comprehension tasks.
## Model Features
- **Enhanced Bilingual Capabilities**: Significantly improved Japanese processing while retaining the original Gemma 2 English capabilities.
- **Large-scale Pretraining**: Continued pretraining on approximately 200 billion tokens of mixed corpus, including specialized Japanese data.
- **Instruction Fine-tuning**: Supervised fine-tuning (SFT) on synthetic data specially constructed for Japanese.
## Model Capabilities
- Japanese text generation
- English text generation
- Japanese-English bilingual comprehension
- Multi-turn dialogue processing
- Code generation
## Use Cases
### Language Services
- **Japanese Chat Assistant**: Building fluent and natural Japanese dialogue systems; strong results on Japanese MT-Bench evaluations.
- **Japanese-English Translation**: High-quality bidirectional translation; competitive performance on the WMT20 benchmark.
### Education
- **Japanese Learning Assistance**: Helping non-native speakers learn Japanese.
## 🚀 Gemma-2-Llama-Swallow
*The Gemma-2-Llama-Swallow series enhances Japanese language capabilities through continual pre-training and instruction tuning.*
The Gemma-2-Llama-Swallow series was built by continual pre-training on the [gemma-2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315) models. Gemma 2 Swallow enhances the Japanese language capabilities of the original Gemma 2 while retaining its English language capabilities. For continual pre-training, we used approximately 200 billion tokens sampled from a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia articles, and mathematical and coding content (see the Training Datasets section of the base model). The instruction-tuned models (it) were built by supervised fine-tuning (SFT) on synthetic data specially constructed for Japanese. See the Swallow Model Index section to find other model variants. Built with Gemma. Built with Llama.
## 🚀 Quick Start
This section provides an overview of the Gemma-2-Llama-Swallow series. For specific usage, you can refer to the official documentation of the base model [Gemma 2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315).
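The snippet below is a minimal inference sketch, assuming the standard `transformers` chat-template workflow used by Gemma 2 instruction-tuned checkpoints; the prerequisites, dtype, and generation settings are illustrative, not official recommendations.

```python
# Minimal sketch (assumptions: standard Hugging Face transformers workflow;
# prerequisite: pip install -U transformers accelerate torch).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # illustrative; choose a dtype your hardware supports
    device_map="auto",            # shards the 27B weights across available devices
)

# Multi-turn chat is formatted with the model's own chat template.
messages = [
    {"role": "user", "content": "日本の四国地方にある県を教えてください。"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```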
## ✨ Features
- **Bilingual Capabilities**: Retains English language capabilities while enhancing Japanese language performance.
- **Continual Pre-training**: Uses a large amount of data for continual pre-training to improve model performance.
- **Instruction Tuning**: Instruction-tuned models are built on synthetic data for Japanese.
## 📦 Installation
No specific installation steps are provided in the original document.
## 📚 Documentation
### Release History
- **May 19, 2025**: Released [Gemma-2-Llama-Swallow-2b-pt-v0.1](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1), [Gemma-2-Llama-Swallow-9b-pt-v0.1](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1), [Gemma-2-Llama-Swallow-27b-pt-v0.1](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1), [Gemma-2-Llama-Swallow-2b-it-v0.1](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1), [Gemma-2-Llama-Swallow-9b-it-v0.1](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1), and [Gemma-2-Llama-Swallow-27b-it-v0.1](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1).
### Swallow Model Index
| Model | gemma-2-swallow v0.1 | gemma-2-swallow-it v0.1 |
| ----- | ---------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| 2B | [🤗 HuggingFace](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1) | [🤗 HuggingFace](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1) |
| 9B | [🤗 HuggingFace](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1) | [🤗 HuggingFace](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1) |
| 27B | [🤗 HuggingFace](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1) | [🤗 HuggingFace](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1) |

The Swallow team's website, [https://swallow-llm.github.io/](https://swallow-llm.github.io/index.en.html), lists the large language models they have developed.
### Model Details
| Property | Details |
|----------|---------|
| Model Type | Please refer to [Gemma 2 paper](https://arxiv.org/abs/2408.00118) for details on the model architecture. |
| Language(s) | Japanese, English |
| Library | [maxtext](https://github.com/AI-Hypercomputer/maxtext) |
| Tokenizer | Please refer to [Gemma 2 paper](https://arxiv.org/abs/2408.00118) for details on the tokenizer. |
| Contact | swallow[at]nlp.c.titech.ac.jp |
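Since the table above states that the tokenizer follows Gemma 2, a quick way to sanity-check Japanese segmentation is to load the tokenizer directly from the checkpoint. This is a small illustrative sketch, not an official example:

```python
# Sketch (assumption: the checkpoint ships a Gemma 2 tokenizer, per the table above).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1")
text = "東京工業大学で大規模言語モデルを研究しています。"
token_ids = tokenizer(text)["input_ids"]
print(len(token_ids))                               # number of tokens for the sentence
print(tokenizer.convert_ids_to_tokens(token_ids))   # inspect the subword segmentation
```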
### Model Performance
#### MT-Bench JA
| Model | coding | extraction | humanities | math | reasoning | roleplay | stem | writing | JMT Avg |
| --------------------------------------------------- | ------ | ---------- | ---------- | ----- | --------- | -------- | ----- | ------- | ------- |
| google/gemma-3-1b-it | 0.379 | 0.497 | 0.680 | 0.385 | 0.322 | 0.628 | 0.540 | 0.651 | 0.510 |
| Qwen/Qwen2.5-1.5B-Instruct | 0.408 | 0.513 | 0.456 | 0.527 | 0.352 | 0.473 | 0.406 | 0.469 | 0.450 |
| google/gemma-2-2b-it | 0.454 | 0.587 | 0.693 | 0.524 | 0.445 | 0.654 | 0.567 | 0.630 | 0.569 |
| rinna/gemma-2-baku-2b-it | 0.470 | 0.625 | 0.810 | 0.414 | 0.382 | 0.713 | 0.609 | 0.697 | 0.590 |
| google/gemma-2-2b-jpn-it | 0.467 | 0.488 | 0.741 | 0.379 | 0.406 | 0.660 | 0.589 | 0.672 | 0.550 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1** | 0.438 | 0.533 | 0.781 | 0.557 | 0.404 | 0.706 | 0.674 | 0.682 | 0.597 |
| Qwen/Qwen2.5-3B-Instruct | 0.567 | 0.647 | 0.597 | 0.665 | 0.457 | 0.649 | 0.526 | 0.637 | 0.593 |
| google/gemma-3-4b-it | 0.603 | 0.724 | 0.798 | 0.767 | 0.498 | 0.803 | 0.775 | 0.822 | 0.724 |
| Qwen/Qwen2.5-7B-Instruct | 0.599 | 0.741 | 0.719 | 0.637 | 0.541 | 0.744 | 0.624 | 0.713 | 0.665 |
| tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3 | 0.562 | 0.756 | 0.869 | 0.610 | 0.512 | 0.783 | 0.748 | 0.803 | 0.705 |
| google/gemma-2-9b-it | 0.652 | 0.765 | 0.857 | 0.614 | 0.673 | 0.811 | 0.713 | 0.800 | 0.736 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1** | 0.592 | 0.796 | 0.872 | 0.742 | 0.638 | 0.802 | 0.745 | 0.803 | 0.749 |
| google/gemma-3-12b-it | 0.807 | 0.814 | 0.871 | 0.886 | 0.623 | 0.847 | 0.858 | 0.863 | 0.821 |
| google/gemma-2-27b-it | 0.727 | 0.809 | 0.874 | 0.719 | 0.639 | 0.810 | 0.740 | 0.826 | 0.768 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1** | 0.618 | 0.839 | 0.873 | 0.741 | 0.608 | 0.814 | 0.739 | 0.836 | 0.759 |
| google/gemma-3-27b-it | 0.804 | 0.927 | 0.879 | 0.876 | 0.774 | 0.846 | 0.848 | 0.882 | 0.855 |
| Qwen/Qwen2.5-32B-Instruct | 0.724 | 0.885 | 0.816 | 0.918 | 0.726 | 0.834 | 0.763 | 0.808 | 0.809 |
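The JMT Avg column appears to be the unweighted mean of the eight per-category scores (the Ja Avg and En Avg columns in the tables below seem to follow the same convention). A quick check for the 27B instruct row, under that assumption:

```python
# Check (assumption: reported averages are unweighted means, rounded to three decimals).
mt_bench_ja_27b_it = {
    "coding": 0.618, "extraction": 0.839, "humanities": 0.873, "math": 0.741,
    "reasoning": 0.608, "roleplay": 0.814, "stem": 0.739, "writing": 0.836,
}
jmt_avg = sum(mt_bench_ja_27b_it.values()) / len(mt_bench_ja_27b_it)
print(jmt_avg)  # ≈ 0.7585, consistent with the reported 0.759
```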
#### Japanese tasks
| Model | JCom. | JEMHopQA | NIILC | JSQuAD | XL-Sum | MGSM | WMT20-en-ja | WMT20-ja-en | JMMLU | JHumanEval | Ja Avg |
| --------------------------------------------------- | ------ | -------- | ------- | ------- | ------- | ------ | ----------- | ----------- | ------ | ---------- | ------ |
| | 4-shot | 4-shot | 4-shot | 4-shot | 1-shot | 4-shot | 4-shot | 4-shot | 5-shot | 0-shot | |
| | EM acc | Char-F1 | Char-F1 | Char-F1 | ROUGE-2 | EM acc | BLEU | BLEU | EM acc | pass@1 | |
| google/gemma-3-1b-it | 0.526 | 0.330 | 0.237 | 0.700 | 0.113 | 0.088 | 0.166 | 0.115 | 0.332 | 0.245 | 0.285 |
| Qwen/Qwen2.5-1.5B-Instruct | 0.812 | 0.276 | 0.241 | 0.847 | 0.128 | 0.292 | 0.147 | 0.119 | 0.447 | 0.242 | 0.355 |
| google/gemma-2-2b-it | 0.862 | 0.348 | 0.315 | 0.879 | 0.117 | 0.252 | 0.207 | 0.183 | 0.437 | 0.321 | 0.392 |
| rinna/gemma-2-baku-2b-it | 0.855 | 0.228 | 0.390 | 0.877 | 0.115 | 0.172 | 0.255 | 0.190 | 0.415 | 0.165 | 0.366 |
| google/gemma-2-2b-jpn-it | 0.845 | 0.321 | 0.291 | 0.877 | 0.131 | 0.192 | 0.204 | 0.180 | 0.418 | 0.311 | 0.377 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1** | 0.862 | 0.367 | 0.483 | 0.881 | 0.145 | 0.288 | 0.258 | 0.200 | 0.485 | 0.267 | 0.424 |
| Qwen/Qwen2.5-3B-Instruct | 0.876 | 0.304 | 0.293 | 0.866 | 0.144 | 0.228 | 0.198 | 0.168 | 0.536 | 0.474 | 0.409 |
| google/gemma-3-4b-it | 0.818 | 0.444 | 0.404 | 0.801 | 0.134 | 0.332 | 0.217 | 0.169 | 0.477 | 0.365 | 0.416 |
| Qwen/Qwen2.5-7B-Instruct | 0.915 | 0.429 | 0.391 | 0.891 | 0.168 | 0.632 | 0.211 | 0.192 | 0.623 | 0.532 | 0.498 |
| tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3 | 0.924 | 0.528 | 0.583 | 0.896 | 0.191 | 0.532 | 0.281 | 0.229 | 0.544 | 0.394 | 0.510 |
| google/gemma-2-9b-it | 0.931 | 0.532 | 0.527 | 0.876 | 0.149 | 0.636 | 0.273 | 0.239 | 0.623 | 0.559 | 0.535 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1** | 0.946 | 0.606 | 0.643 | 0.852 | 0.170 | 0.624 | 0.296 | 0.238 | 0.639 | 0.446 | 0.546 |
| google/gemma-3-12b-it | 0.935 | 0.566 | 0.542 | 0.808 | 0.148 | 0.724 | 0.289 | 0.239 | 0.645 | 0.637 | 0.553 |
| google/gemma-2-27b-it | 0.956 | 0.541 | 0.576 | 0.883 | 0.166 | 0.704 | 0.290 | 0.249 | 0.670 | 0.638 | 0.567 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1** | 0.969 | 0.654 | 0.658 | 0.891 | 0.194 | 0.764 | 0.316 | 0.258 | 0.686 | 0.635 | 0.602 |
| google/gemma-3-27b-it | 0.946 | 0.592 | 0.584 | 0.867 | 0.142 | 0.764 | 0.307 | 0.253 | 0.716 | 0.736 | 0.591 |
| Qwen/Qwen2.5-32B-Instruct | 0.959 | 0.567 | 0.497 | 0.903 | 0.169 | 0.780 | 0.228 | 0.195 | 0.757 | 0.651 | 0.571 |
#### English tasks
| Model | OpenBookQA | TriviaQA | HellaSWAG | SQuAD2.0 | XWINO | MMLU | GSM8K | MATH | BBH | HumanEval | En Avg |
| --------------------------------------------------- | ---------- | -------- | --------- | -------- | ------ | ------ | ------ | ---------- | ---------- | --------- | ------ |
| | 4-shot | 4-shot | 4-shot | 4-shot | 4-shot | 5-shot | 4-shot | 4-shot | 3-shot | 0-shot | |
| | Acc | EM acc | Acc | EM acc | Acc | Acc | EM acc | CoT EM Acc | CoT EM Acc | pass@1 | |
| google/gemma-3-1b-it | 0.272 | 0.229 | 0.421 | 0.501 | 0.786 | 0.398 | 0.256 | 0.340 | 0.379 | 0.335 | 0.392 |
| Qwen/Qwen2.5-1.5B-Instruct | 0.334 | 0.378 | 0.503 | 0.501 | 0.844 | 0.604 | 0.257 | 0.272 | 0.272 | 0.277 | 0.424 |
| google/gemma-2-2b-it | 0.354 | 0.502 | 0.520 | 0.548 | 0.878 | 0.569 | 0.440 | 0.230 | 0.464 | 0.382 | 0.489 |
| rinna/gemma-2-baku-2b-it | 0.342 | 0.416 | 0.511 | 0.522 | 0.871 | 0.526 | 0.027 | 0.174 | 0.063 | 0.158 | 0.361 |
| google/gemma-2-2b-jpn-it | 0.370 | 0.503 | 0.532 | 0.539 | 0.879 | 0.557 | 0.351 | 0.132 | 0.451 | 0.392 | 0.471 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1** | 0.332 | 0.417 | 0.529 | 0.506 | 0.856 | 0.530 | 0.284 | 0.150 | 0.405 | 0.301 | 0.431 |
| Qwen/Qwen2.5-3B-Instruct | 0.364 | 0.446 | 0.562 | 0.504 | 0.869 | 0.664 | 0.096 | 0.612 | 0.128 | 0.471 | 0.472 |
| google/gemma-3-4b-it | 0.412 | 0.500 | 0.560 | 0.552 | 0.872 | 0.583 | 0.769 | 0.306 | 0.598 | 0.513 | 0.566 |
| Qwen/Qwen2.5-7B-Instruct | 0.428 | 0.519 | 0.624 | 0.569 | 0.877 | 0.742 | 0.739 | 0.688 | 0.217 | 0.636 | 0.604 |
| tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3 | 0.396 | 0.629 | 0.593 | 0.570 | 0.884 | 0.629 | 0.622 | 0.266 | 0.626 | 0.445 | 0.566 |
| google/gemma-2-9b-it | 0.432 | 0.658 | 0.605 | 0.659 | 0.904 | 0.723 | 0.779 | 0.394 | 0.719 | 0.613 | 0.649 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1** | 0.404 | 0.640 | 0.609 | 0.623 | 0.900 | 0.680 | 0.710 | 0.392 | 0.663 | 0.491 | 0.611 |
| google/gemma-3-12b-it | 0.422 | 0.665 | 0.639 | 0.649 | 0.901 | 0.721 | 0.867 | 0.796 | 0.808 | 0.736 | 0.677 |
| google/gemma-2-27b-it | 0.434 | 0.678 | 0.653 | 0.663 | 0.905 | 0.732 | 0.872 | 0.812 | 0.823 | 0.742 | 0.690 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1** | 0.406 | 0.656 | 0.647 | 0.634 | 0.902 | 0.696 | 0.804 | 0.788 | 0.793 | 0.687 | 0.662 |
| google/gemma-3-27b-it | 0.438 | 0.684 | 0.667 | 0.671 | 0.907 | 0.740 | 0.884 | 0.826 | 0.837 | 0.751 | 0.703 |
| Qwen/Qwen2.5-32B-Instruct | 0.442 | 0.692 | 0.675 | 0.679 | 0.909 | 0.748 | 0.890 | 0.834 | 0.845 | 0.759 | 0.711 |
## 📄 License
The models in this series are released under the [Gemma](https://example.com/gemma_license) and [Llama 3.3](https://example.com/llama3.3_license) licenses.