
Llama 3.3 Swallow - Built with Llama
Llama 3.3 Swallow is a large language model (70B) that enhances Japanese language capabilities while retaining English capabilities, built by continual pre-training on Meta Llama 3.3.
Quick Start
To get started, install the required library and run the following code:
```bash
pip install vllm
```
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4"
# See "Usage Examples" below for a complete generation example.
```
Features
- Continual pre-training on the Meta Llama 3.3 model.
- Enhanced Japanese language capabilities while maintaining English language proficiency.
- Instruction-tuned models built by supervised fine-tuning on synthetic Japanese data.
Installation
```bash
pip install vllm
```
Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4"
```
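The original usage example stops after defining the model name, so the block below is a minimal sketch of how generation with vLLM typically proceeds for this checkpoint. The tensor-parallel size, sampling parameters, and prompt contents are illustrative assumptions rather than values taken from the original README.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4"

# Load the tokenizer so the chat template can be rendered into a prompt string.
tokenizer = AutoTokenizer.from_pretrained(model_name)

# A 70B model does not fit on a single GPU; tensor_parallel_size=4 is an assumption,
# so adjust it to your hardware.
llm = LLM(model=model_name, tensor_parallel_size=4)

# Illustrative sampling settings, not taken from the original README.
sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=512)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Briefly introduce yourself in Japanese."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# vLLM returns one RequestOutput per prompt; each holds the generated completions.
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```

For batched inference, pass a list of several prompts to llm.generate and iterate over the returned outputs, which come back in the same order.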
Documentation
Release History
- March 10, 2025: Released Llama-3.3-Swallow-70B-Instruct-v0.4 and Llama-3.3-Swallow-70B-v0.4.
- December 30, 2024: Released Llama-3.1-Swallow-70B-Instruct-v0.3.
- December 23, 2024: Released Llama-3.1-Swallow-8B-Instruct-v0.3.
- November 11, 2024: Released Llama-3.1-Swallow-8B-v0.2 and Llama-3.1-Swallow-8B-Instruct-v0.2.
- October 08, 2024: Released Llama-3.1-Swallow-8B-v0.1, Llama-3.1-Swallow-8B-Instruct-v0.1, Llama-3.1-Swallow-70B-v0.1, and Llama-3.1-Swallow-70B-Instruct-v0.1.
Swallow Model Index
Model | Llama-3.1-Swallow v0.1 | Llama-3.1-Swallow-Instruct v0.1 | Llama-3.1-Swallow v0.2 | Llama-3.1-Swallow-Instruct v0.2 | Llama-3.1-Swallow-Instruct v0.3 | Llama-3.3-Swallow v0.4 | Llama-3.3-Swallow-Instruct v0.4 |
---|---|---|---|---|---|---|---|
8B | 🤗 HuggingFace | 🤗 HuggingFace | 🤗 HuggingFace | 🤗 HuggingFace | 🤗 HuggingFace | | |
70B | 🤗 HuggingFace | 🤗 HuggingFace | | | 🤗 HuggingFace | 🤗 HuggingFace | 🤗 HuggingFace |
The website https://swallow-llm.github.io/ provides large language models developed by the Swallow team.
Model Details
Property | Details |
---|---|
Model Type | Please refer to Llama 3.1 MODEL_CARD for details on the model architecture. |
Language(s) | Japanese, English |
Library | Megatron-LM |
Tokenizer | Please refer to Llama 3.1 blog for details on the tokenizer. |
Contact | swallow[at]nlp.c.titech.ac.jp |
Model Performance
MT-Bench JA
Model | coding | extraction | humanities | math | reasoning | roleplay | stem | writing | JMT Avg |
---|---|---|---|---|---|---|---|---|---|
Llama 3 70B Instruct | 0.588 | 0.884 | 0.715 | 0.637 | 0.487 | 0.594 | 0.598 | 0.619 | 0.640 |
Llama 3.1 70B Instruct | 0.691 | 0.848 | 0.730 | 0.669 | 0.618 | 0.699 | 0.699 | 0.694 | 0.706 |
Llama 3.3 70B Instruct | 0.707 | 0.865 | 0.757 | 0.720 | 0.635 | 0.773 | 0.706 | 0.733 | 0.737 |
Llama 3 Youko 70B Instruct | 0.607 | 0.894 | 0.834 | 0.609 | 0.673 | 0.790 | 0.764 | 0.829 | 0.750 |
Llama-3.1-70B-Japanese-Instruct-24070 | 0.683 | 0.827 | 0.824 | 0.749 | 0.643 | 0.818 | 0.715 | 0.751 | 0.751 |
Llama 3 heron brain 70B v0.3 | 0.510 | 0.870 | 0.776 | 0.680 | 0.513 | 0.727 | 0.692 | 0.693 | 0.683 |
Llama 3 Swallow 70B Instruct | 0.633 | 0.823 | 0.601 | 0.521 | 0.482 | 0.622 | 0.635 | 0.630 | 0.618 |
Llama 3.1 Swallow 70B Instruct v0.1 | 0.654 | 0.792 | 0.768 | 0.704 | 0.573 | 0.682 | 0.653 | 0.704 | 0.691 |
Llama 3.1 Swallow 70B Instruct v0.3 | 0.678 | 0.820 | 0.867 | 0.776 | 0.570 | 0.816 | 0.769 | 0.852 | 0.769 |
Llama 3.3 Swallow 70B Instruct v0.4 | 0.705 | 0.820 | 0.870 | 0.730 | 0.623 | 0.811 | 0.781 | 0.832 | 0.772 |
Qwen2-72B-Instruct | 0.632 | 0.800 | 0.842 | 0.688 | 0.616 | 0.824 | 0.797 | 0.846 | 0.756 |
Qwen2.5-72B-Instruct | 0.795 | 0.860 | 0.865 | 0.857 | 0.784 | 0.863 | 0.804 | 0.854 | 0.835 |
GPT-3.5 (gpt-3.5-turbo-0125) | 0.693 | 0.789 | 0.773 | 0.665 | 0.462 | 0.728 | 0.644 | 0.775 | 0.691 |
GPT-4o (gpt-4o-2024-08-06) | 0.855 | 0.926 | 0.880 | 0.872 | 0.706 | 0.862 | 0.838 | 0.849 | 0.848 |
GPT-4o-mini (gpt-4o-mini-2024-07-18) | 0.825 | 0.865 | 0.857 | 0.843 | 0.665 | 0.846 | 0.855 | 0.840 | 0.824 |
Japanese tasks
Model | JCom. | JEMHopQA | NIILC | JSQuAD | XL-Sum | MGSM | WMT20-en-ja | WMT20-ja-en | JMMLU | JHumanEval | Ja Avg |
---|---|---|---|---|---|---|---|---|---|---|---|
| | 4-shot | 4-shot | 4-shot | 4-shot | 1-shot | 4-shot | 4-shot | 4-shot | 5-shot | 0-shot | |
| | EM acc | Char-F1 | Char-F1 | Char-F1 | ROUGE-2 | EM acc | BLEU | BLEU | EM acc | pass@1 | |
Llama 3 70B Instruct | 0.940 | 0.615 | 0.557 | 0.913 | 0.191 | 0.716 | 0.269 | 0.234 | 0.680 | 0.662 | 0.578 |
Llama 3.1 70B Instruct | 0.950 | 0.635 | 0.579 | 0.921 | 0.178 | 0.732 | 0.279 | 0.247 | 0.733 | 0.696 | 0.595 |
Llama 3.3 70B Instruct | 0.941 | 0.640 | 0.570 | 0.893 | 0.179 | 0.784 | 0.278 | 0.243 | 0.735 | 0.744 | 0.601 |
Llama 3 Youko 70B Instruct | 0.952 | 0.625 | 0.584 | 0.921 | 0.198 | 0.720 | 0.263 | 0.226 | 0.718 | 0.610 | 0.582 |
Llama-3.1-70B-Japanese-Instruct-24070 | 0.956 | 0.647 | 0.660 | 0.919 | 0.156 | 0.748 | 0.290 | 0.241 | 0.723 | 0.627 | 0.597 |
Llama 3 heron brain 70B v0.3 | 0.965 | 0.652 | 0.679 | 0.922 | 0.261 | 0.772 | 0.309 | 0.258 | 0.707 | 0.623 | 0.615 |
Llama 3 Swallow 70B Instruct | 0.963 | 0.627 | 0.598 | 0.921 | 0.139 | 0.672 | 0.272 | 0.255 | 0.657 | 0.608 | 0.571 |
Llama 3.1 Swallow 70B Instruct v0.1 | 0.962 | 0.621 | 0.660 | 0.924 | 0.192 | 0.776 | 0.312 | 0.259 | 0.711 | 0.468 | 0.588 |
Llama 3.1 Swallow 70B Instruct v0.3 | 0.964 | 0.632 | 0.654 | 0.911 | 0.196 | 0.772 | 0.305 | 0.257 | 0.690 | 0.596 | 0.598 |
Llama 3.3 Swallow 70B Instruct v0.4 | 0.981 | 0.618 | 0.662 | 0.907 | 0.162 | 0.812 | 0.319 | 0.261 | 0.707 | 0.700 | 0.613 |
Qwen2-72B-Instruct | 0.963 | 0.628 | 0.557 | 0.920 | 0.166 | 0.780 | 0.260 | 0.232 | 0.771 | 0.701 | 0.598 |
Qwen2.5-72B-Instruct | 0.970 | 0.569 | 0.582 | 0.738 | 0.170 | 0.840 | 0.227 | 0.218 | 0.789 | 0.634 | 0.574 |
GPT-3.5 (gpt-3.5-turbo-0125) | 0.922 | 0.456 | 0.447 | 0.893 | 0.215 | 0.572 | 0.287 | 0.243 | 0.499 | 0.616 | 0.515 |
GPT-4o (gpt-4o-2024-08-06) | 0.982 | 0.731 | 0.709 | 0.889 | 0.170 | 0.864 | 0.314 | 0.254 | 0.797 | 0.752 | 0.646 |
GPT-4o-mini (gpt-4o-mini-2024-07-18) | 0.961 | 0.464 | 0.591 | 0.902 | 0.160 | 0.832 | 0.299 | 0.241 | 0.679 | 0.675 | 0.580 |
English tasks
Model | OpenBookQA | TriviaQA | HellaSWAG | SQuAD2.0 | XWINO | MMLU | GSM8K | MATH | BBH | HumanEval | En Avg |
---|---|---|---|---|---|---|---|---|---|---|---|
| | 4-shot | 4-shot | 4-shot | 4-shot | 4-shot | 5-shot | 4-shot | 4-shot | 3-shot | 0-shot | |
| | Acc | EM acc | Acc | EM acc | Acc | Acc | EM acc | CoT EM Acc | CoT EM Acc | pass@1 | |
Llama 3 70B Instruct | 0.438 | 0.800 | 0.655 | 0.696 | 0.914 | 0.800 | 0.909 | 0.474 | 0.833 | 0.774 | 0.729 |
Llama 3.1 70B Instruct | 0.426 | 0.821 | 0.662 | 0.660 | 0.917 | 0.822 | 0.876 | 0.560 | 0.842 | 0.794 | 0.738 |
Llama 3.3 70B Instruct | 0.426 | 0.817 | 0.667 | 0.684 | 0.917 | 0.824 | 0.890 | 0.706 | 0.853 | 0.834 | 0.762 |
Llama 3 Youko 70B Instruct | 0.454 | 0.797 | 0.686 | 0.659 | 0.915 | 0.805 | 0.892 | 0.434 | 0.780 | 0.662 | 0.708 |
Llama-3.1-70B-Japanese-Instruct-24070 | 0.422 | 0.810 | 0.647 | 0.663 | 0.917 | 0.807 | 0.889 | 0.528 | 0.823 | 0.746 | 0.725 |
Llama 3 heron brain 70B v0.3 | 0.446 | 0.811 | 0.668 | 0.706 | 0.919 | 0.790 | 0.877 | 0.508 | 0.759 | 0.668 | 0.715 |
Llama 3 Swallow 70B Instruct | 0.446 | 0.818 | 0.676 | 0.681 | 0.923 | 0.789 | 0.868 | 0.460 | 0.816 | 0.680 | 0.716 |
Llama 3.1 Swallow 70B Instruct v0.1 | 0.446 | 0.815 | 0.683 | 0.681 | 0.917 | 0.787 | 0.884 | 0.474 | 0.848 | 0.568 | 0.710 |
Llama 3.1 Swallow 70B Instruct v0.3 | 0.454 | 0.825 | 0.692 | 0.647 | 0.919 | 0.777 | 0.872 | 0.458 | 0.816 | 0.643 | 0.710 |
Llama 3.3 Swallow 70B Instruct v0.4 | 0.448 | 0.817 | 0.686 | 0.654 | 0.912 | 0.803 | 0.908 | 0.566 | 0.812 | 0.750 | 0.736 |
Qwen2-72B-Instruct | 0.444 | 0.759 | 0.685 | 0.685 | 0.911 | 0.839 | 0.848 | 0.634 | 0.193 | 0.688 | 0.669 |
Qwen2.5-72B-Instruct | 0.454 | 0.676 | 0.706 | 0.677 | 0.889 | 0.848 | 0.904 | 0.770 | 0.375 | 0.614 | 0.691 |
Evaluation Benchmarks
MT-Bench JA
We used Japanese MT-Bench to assess multi-turn dialogue capabilities with the following settings:
- Implementation: FastChat [Zheng+, 2023] (commit #e86e70d0)
- Question: Nejumi LLM-Leaderboard NEO, mtbench_ja_question_v4
- Reference Answer: A revised version of Nejumi LLM-Leaderboard NEO, mtbench_ja_referenceanswer_v2, in which we verified and corrected incorrect answers. This revised version has been released alongside swallow-evaluation Ver. 202411.
- Prompt for Judge: Nejumi LLM-Leaderboard NEO, mtbench_ja_prompt_v1
- Judge: gpt-4o-2024-08-06
- Scoring: Absolute scale normalized to a 0-1 range, averaged over five runs (see the sketch after this list).
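As a concrete illustration of the scoring above, the snippet below assumes the judge returns scores on a 1-10 scale and that normalization is a plain division by 10; the raw scores shown are hypothetical.

```python
from statistics import mean

def normalize(judge_score: float) -> float:
    """Map a raw judge score (assumed to be on a 1-10 scale) onto the 0-1 scale."""
    return judge_score / 10.0

# Hypothetical raw judge scores for a single category across the five runs.
runs = [7.2, 6.8, 7.0, 7.4, 6.9]

category_score = mean(normalize(score) for score in runs)
print(f"category score: {category_score:.3f}")  # 0.706 for these example scores
```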
Japanese evaluation benchmarks
We used llm-jp-eval (v1.3.0), the JP Language Model Evaluation Harness (commit #9b42d41), and the Code Generation LM Evaluation Harness (commit #0261c52). The details are as follows:
- Multiple-choice question answering (JCommonsenseQA [Kurihara et al., 2022])
- Open-ended question answering (JEMHopQA [Ishii et al., 2024])
- Open-ended question answering (NIILC [Sekine, 2003])
- Machine reading comprehension (JSQuAD [Kurihara et al., 2022])
- Automatic summarization (XL-Sum [Hasan et al., 2021])
- Machine translation (WMT2020 ja-en [Barrault et al., 2020])
- Machine translation (WMT2020 en-ja [Barrault et al., 2020])
- Mathematical reasoning (MGSM [Shi et al., 2023])
- Academic exams (JMMLU [Yin et al., 2024])
- Code generation (JHumanEval [Sato et al., 2024])
English evaluation benchmarks
We used the Language Model Evaluation Harness (v0.4.2) and the Code Generation LM Evaluation Harness (commit #0261c52). The details are as follows:
- Multiple-choice question answering (OpenBookQA [Mihaylov et al., 2018])
- Open-ended question answering (TriviaQA [Joshi et al., 2017])
- Machine reading comprehension (SQuAD2 [Rajpurkar et al., 2018])
- Commonsense reasoning (XWINO [Tikhonov and Ryabinin, 2021])
- Natural language inference (HellaSwag [Zellers et al., 2019])
- Mathematical reasoning (GSM8K [Cobbe et al., 2021])
- Mathematical reasoning (MATH [Hendrycks et al., 2022][Lightman et al., 2024])
- Reasoning (BBH (BIG-Bench-Hard) [Suzgun et al., 2023])
- Academic exams (MMLU [Hendrycks et al., 2021])
- Code generation (HumanEval [Chen et al., 2021])
License
The model is released under the Llama 3.3 and Gemma licenses.

