
Gemma-2-Llama-Swallow-9b-it-v0.1

Developed by tokyotech-llm
The Gemma-2-Llama-Swallow series consists of multilingual large language models built by continual pre-training on top of Gemma 2, with a particular focus on strengthening Japanese ability.
Downloads: 2,491
Release date: April 23, 2025

Model Overview

This model retains the English capabilities of the base Gemma 2 model while substantially improving Japanese processing through continual pre-training on approximately 200 billion tokens, making it well suited to multilingual tasks and Japanese instruction following.
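The page gives no usage instructions; as a minimal inference sketch, assuming the model is published under the Hugging Face repository id tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1 and follows the standard Gemma 2 chat template, loading and prompting it with transformers would look roughly like this:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id assumed from the model name on this page; verify on the Hub.
model_id = "tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 9B parameters; bf16 keeps memory manageable
    device_map="auto",
)

# Gemma 2 instruction models expect a chat template (user/assistant roles only).
messages = [{"role": "user", "content": "日本の四季について簡単に説明してください。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))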

Model Features

Enhanced multilingual ability
Japanese processing is significantly improved while the original English ability is retained.
Large-scale continual pre-training
Continual pre-training on approximately 200 billion tokens, drawn from sources including Japanese web corpora and Wikipedia.
Instruction fine-tuning
Supervised fine-tuning on specially constructed Japanese synthetic data improves performance on instruction-following tasks.

Model Capabilities

Japanese text generation
English text generation
Multi-turn dialogue (see the sketch after this list)
Machine translation
Mathematical reasoning
Code generation
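For multi-turn dialogue, the usual transformers pattern is to keep the running message history and re-apply the chat template on every turn. The sketch below reuses the model and tokenizer from the previous example; the chat_turn helper is hypothetical, not part of any library:

def chat_turn(history, user_text, max_new_tokens=256):
    # Append the user message, render the full history, and generate a reply.
    history.append({"role": "user", "content": user_text})
    input_ids = tokenizer.apply_chat_template(
        history, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    reply = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
    history.append({"role": "assistant", "content": reply})
    return reply

history = []
print(chat_turn(history, "おすすめの東京観光スポットを教えてください。"))
print(chat_turn(history, "その中で、雨の日でも楽しめる場所はどこですか？"))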

Use Cases

Language processing
Japanese dialogue systems
Build a Japanese-language intelligent assistant
Scored 0.759 on Japanese MT-Bench
Multilingual content generation
Generate content in both Japanese and English
Education
Japanese learning assistance
Help learners practice Japanese