
Llama 3.3 Swallow - Built with Llama
Llama 3.3 Swallow is a large language model (70B) that enhances Japanese language capabilities while retaining English capabilities, built by continual pre-training on Meta Llama 3.3.
Quick Start
To get started, install the required library and run the following code:
```bash
pip install vllm
```
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4"
# See "Usage Examples" below for a complete generation example.
```
Features
- Continual pre-training on the Meta Llama 3.3 model.
- Enhanced Japanese language capabilities while maintaining English language proficiency.
- Instruction-tuned models built by supervised fine-tuning on synthetic Japanese data.
Installation
```bash
pip install vllm
```
Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4"
```
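The original usage example stops after defining the model name, so the block below is a minimal sketch of how generation with vLLM typically proceeds for this checkpoint. The tensor-parallel size, sampling parameters, and prompt contents are illustrative assumptions rather than values taken from the original README.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4"

# Load the tokenizer so the chat template can be rendered into a prompt string.
tokenizer = AutoTokenizer.from_pretrained(model_name)

# A 70B model does not fit on a single GPU; tensor_parallel_size=4 is an assumption,
# so adjust it to your hardware.
llm = LLM(model=model_name, tensor_parallel_size=4)

# Illustrative sampling settings, not taken from the original README.
sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=512)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Briefly introduce yourself in Japanese."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# vLLM returns one RequestOutput per prompt; each holds the generated completions.
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```

For batched inference, pass a list of several prompts to llm.generate and iterate over the returned outputs, which come back in the same order.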
Documentation
Release History
- March 10, 2025: Released Llama-3.3-Swallow-70B-Instruct-v0.4 and Llama-3.3-Swallow-70B-v0.4.
- December 30, 2024: Released Llama-3.1-Swallow-70B-Instruct-v0.3.
- December 23, 2024: Released Llama-3.1-Swallow-8B-Instruct-v0.3.
- November 11, 2024: Released Llama-3.1-Swallow-8B-v0.2 and Llama-3.1-Swallow-8B-Instruct-v0.2.
- October 08, 2024: Released Llama-3.1-Swallow-8B-v0.1, Llama-3.1-Swallow-8B-Instruct-v0.1, Llama-3.1-Swallow-70B-v0.1, and Llama-3.1-Swallow-70B-Instruct-v0.1.
Swallow Model Index
Model | Llama-3.1-Swallow v0.1 | Llama-3.1-Swallow-Instruct v0.1 | Llama-3.1-Swallow v0.2 | Llama-3.1-Swallow-Instruct v0.2 | Llama-3.1-Swallow-Instruct v0.3 | Llama-3.3-Swallow v0.4 | Llama-3.3-Swallow-Instruct v0.4 |
---|---|---|---|---|---|---|---|
8B | 🤗 HuggingFace | 🤗 HuggingFace | 🤗 HuggingFace | 🤗 HuggingFace | 🤗 HuggingFace | | |
70B | 🤗 HuggingFace | 🤗 HuggingFace | | | 🤗 HuggingFace | 🤗 HuggingFace | 🤗 HuggingFace |
The website https://swallow-llm.github.io/ provides large language models developed by the Swallow team.
Model Details
Property | Details |
---|---|
Model Type | Please refer to Llama 3.1 MODEL_CARD for details on the model architecture. |
Language(s) | Japanese, English |
Library | Megatron-LM |
Tokenizer | Please refer to Llama 3.1 blog for details on the tokenizer. |
Contact | swallow[at]nlp.c.titech.ac.jp |
Model Performance
MT-Bench JA
Model | coding | extraction | humanities | math | reasoning | roleplay | stem | writing | JMT Avg |
---|---|---|---|---|---|---|---|---|---|
Llama 3 70B Instruct | 0.588 | 0.884 | 0.715 | 0.637 | 0.487 | 0.594 | 0.598 | 0.619 | 0.640 |
Llama 3.1 70B Instruct | 0.691 | 0.848 | 0.730 | 0.669 | 0.618 | 0.699 | 0.699 | 0.694 | 0.706 |
Llama 3.3 70B Instruct | 0.707 | 0.865 | 0.757 | 0.720 | 0.635 | 0.773 | 0.706 | 0.733 | 0.737 |
Llama 3 Youko 70B Instruct | 0.607 | 0.894 | 0.834 | 0.609 | 0.673 | 0.790 | 0.764 | 0.829 | 0.750 |
Llama-3.1-70B-Japanese-Instruct-24070 | 0.683 | 0.827 | 0.824 | 0.749 | 0.643 | 0.818 | 0.715 | 0.751 | 0.751 |
Llama 3 heron brain 70B v0.3 | 0.510 | 0.870 | 0.776 | 0.680 | 0.513 | 0.727 | 0.692 | 0.693 | 0.683 |
Llama 3 Swallow 70B Instruct | 0.633 | 0.823 | 0.601 | 0.521 | 0.482 | 0.622 | 0.635 | 0.630 | 0.618 |
Llama 3.1 Swallow 70B Instruct v0.1 | 0.654 | 0.792 | 0.768 | 0.704 | 0.573 | 0.682 | 0.653 | 0.704 | 0.691 |
Llama 3.1 Swallow 70B Instruct v0.3 | 0.678 | 0.820 | 0.867 | 0.776 | 0.570 | 0.816 | 0.769 | 0.852 | 0.769 |
Llama 3.3 Swallow 70B Instruct v0.4 | 0.705 | 0.820 | 0.870 | 0.730 | 0.623 | 0.811 | 0.781 | 0.832 | 0.772 |
Qwen2-72B-Instruct | 0.632 | 0.800 | 0.842 | 0.688 | 0.616 | 0.824 | 0.797 | 0.846 | 0.756 |
Qwen2.5-72B-Instruct | 0.795 | 0.860 | 0.865 | 0.857 | 0.784 | 0.863 | 0.804 | 0.854 | 0.835 |
GPT-3.5 (gpt-3.5-turbo-0125) | 0.693 | 0.789 | 0.773 | 0.665 | 0.462 | 0.728 | 0.644 | 0.775 | 0.691 |
GPT-4o (gpt-4o-2024-08-06) | 0.855 | 0.926 | 0.880 | 0.872 | 0.706 | 0.862 | 0.838 | 0.849 | 0.848 |
GPT-4o-mini (gpt-4o-mini-2024-07-18) | 0.825 | 0.865 | 0.857 | 0.843 | 0.665 | 0.846 | 0.855 | 0.840 | 0.824 |
Japanese tasks
Model | JCom. | JEMHopQA | NIILC | JSQuAD | XL-Sum | MGSM | WMT20-en-ja | WMT20-ja-en | JMMLU | JHumanEval | Ja Avg |
---|---|---|---|---|---|---|---|---|---|---|---|
| | 4-shot | 4-shot | 4-shot | 4-shot | 1-shot | 4-shot | 4-shot | 4-shot | 5-shot | 0-shot | |
| | EM acc | Char-F1 | Char-F1 | Char-F1 | ROUGE-2 | EM acc | BLEU | BLEU | EM acc | pass@1 | |
Llama 3 70B Instruct | 0.940 | 0.615 | 0.557 | 0.913 | 0.191 | 0.716 | 0.269 | 0.234 | 0.680 | 0.662 | 0.578 |
Llama 3.1 70B Instruct | 0.950 | 0.635 | 0.579 | 0.921 | 0.178 | 0.732 | 0.279 | 0.247 | 0.733 | 0.696 | 0.595 |
Llama 3.3 70B Instruct | 0.941 | 0.640 | 0.570 | 0.893 | 0.179 | 0.784 | 0.278 | 0.243 | 0.735 | 0.744 | 0.601 |
Llama 3 Youko 70B Instruct | 0.952 | 0.625 | 0.584 | 0.921 | 0.198 | 0.720 | 0.263 | 0.226 | 0.718 | 0.610 | 0.582 |
Llama-3.1-70B-Japanese-Instruct-24070 | 0.956 | 0.647 | 0.660 | 0.919 | 0.156 | 0.748 | 0.290 | 0.241 | 0.723 | 0.627 | 0.597 |
Llama 3 heron brain 70B v0.3 | 0.965 | 0.652 | 0.679 | 0.922 | 0.261 | 0.772 | 0.309 | 0.258 | 0.707 | 0.623 | 0.615 |
Llama 3 Swallow 70B Instruct | 0.963 | 0.627 | 0.598 | 0.921 | 0.139 | 0.672 | 0.272 | 0.255 | 0.657 | 0.608 | 0.571 |
Llama 3.1 Swallow 70B Instruct v0.1 | 0.962 | 0.621 | 0.660 | 0.924 | 0.192 | 0.776 | 0.312 | 0.259 | 0.711 | 0.468 | 0.588 |
Llama 3.1 Swallow 70B Instruct v0.3 | 0.964 | 0.632 | 0.654 | 0.911 | 0.196 | 0.772 | 0.305 | 0.257 | 0.690 | 0.596 | 0.598 |
Llama 3.3 Swallow 70B Instruct v0.4 | 0.981 | 0.618 | 0.662 | 0.907 | 0.162 | 0.812 | 0.319 | 0.261 | 0.707 | 0.700 | 0.613 |
Qwen2-72B-Instruct | 0.963 | 0.628 | 0.557 | 0.920 | 0.166 | 0.780 | 0.260 | 0.232 | 0.771 | 0.701 | 0.598 |
Qwen2.5-72B-Instruct | 0.970 | 0.569 | 0.582 | 0.738 | 0.170 | 0.840 | 0.227 | 0.218 | 0.789 | 0.634 | 0.574 |
GPT-3.5 (gpt-3.5-turbo-0125) | 0.922 | 0.456 | 0.447 | 0.893 | 0.215 | 0.572 | 0.287 | 0.243 | 0.499 | 0.616 | 0.515 |
GPT-4o (gpt-4o-2024-08-06) | 0.982 | 0.731 | 0.709 | 0.889 | 0.170 | 0.864 | 0.314 | 0.254 | 0.797 | 0.752 | 0.646 |
GPT-4o-mini (gpt-4o-mini-2024-07-18) | 0.961 | 0.464 | 0.591 | 0.902 | 0.160 | 0.832 | 0.299 | 0.241 | 0.679 | 0.675 | 0.580 |
English tasks
Model | OpenBookQA | TriviaQA | HellaSWAG | SQuAD2.0 | XWINO | MMLU | GSM8K | MATH | BBH | HumanEval | En Avg |
---|---|---|---|---|---|---|---|---|---|---|---|
| | 4-shot | 4-shot | 4-shot | 4-shot | 4-shot | 5-shot | 4-shot | 4-shot | 3-shot | 0-shot | |
| | Acc | EM acc | Acc | EM acc | Acc | Acc | EM acc | CoT EM Acc | CoT EM Acc | pass@1 | |
Llama 3 70B Instruct | 0.438 | 0.800 | 0.655 | 0.696 | 0.914 | 0.800 | 0.909 | 0.474 | 0.833 | 0.774 | 0.729 |
Llama 3.1 70B Instruct | 0.426 | 0.821 | 0.662 | 0.660 | 0.917 | 0.822 | 0.876 | 0.560 | 0.842 | 0.794 | 0.738 |
Llama 3.3 70B Instruct | 0.426 | 0.817 | 0.667 | 0.684 | 0.917 | 0.824 | 0.890 | 0.706 | 0.853 | 0.834 | 0.762 |
Llama 3 Youko 70B Instruct | 0.454 | 0.797 | 0.686 | 0.659 | 0.915 | 0.805 | 0.892 | 0.434 | 0.780 | 0.662 | 0.708 |
Llama-3.1-70B-Japanese-Instruct-24070 | 0.422 | 0.810 | 0.647 | 0.663 | 0.917 | 0.807 | 0.889 | 0.528 | 0.823 | 0.746 | 0.725 |
Llama 3 heron brain 70B v0.3 | 0.446 | 0.811 | 0.668 | 0.706 | 0.919 | 0.790 | 0.877 | 0.508 | 0.759 | 0.668 | 0.715 |
Llama 3 Swallow 70B Instruct | 0.446 | 0.818 | 0.676 | 0.681 | 0.923 | 0.789 | 0.868 | 0.460 | 0.816 | 0.680 | 0.716 |
Llama 3.1 Swallow 70B Instruct v0.1 | 0.446 | 0.815 | 0.683 | 0.681 | 0.917 | 0.787 | 0.884 | 0.474 | 0.848 | 0.568 | 0.710 |
Llama 3.1 Swallow 70B Instruct v0.3 | 0.454 | 0.825 | 0.692 | 0.647 | 0.919 | 0.777 | 0.872 | 0.458 | 0.816 | 0.643 | 0.710 |
Llama 3.3 Swallow 70B Instruct v0.4 | 0.448 | 0.817 | 0.686 | 0.654 | 0.912 | 0.803 | 0.908 | 0.566 | 0.812 | 0.750 | 0.736 |
Qwen2-72B-Instruct | 0.444 | 0.759 | 0.685 | 0.685 | 0.911 | 0.839 | 0.848 | 0.634 | 0.193 | 0.688 | 0.669 |
Qwen2.5-72B-Instruct | 0.454 | 0.676 | 0.706 | 0.677 | 0.889 | 0.848 | 0.904 | 0.770 | 0.375 | 0.614 | 0.691 |
Evaluation Benchmarks
MT-Bench JA
We used Japanese MT-Bench to assess multi-turn dialogue capabilities with the following settings:
- Implementation: FastChat [Zheng+, 2023] (commit #e86e70d0)
- Question: Nejumi LLM-Leaderboard NEO, mtbench_ja_question_v4
- Reference Answer: A revised version of Nejumi LLM-Leaderboard NEO, mtbench_ja_referenceanswer_v2, in which we verified and corrected incorrect answers. This revised version has been released alongside swallow-evaluation Ver. 202411.
- Prompt for Judge: Nejumi LLM-Leaderboard NEO, mtbench_ja_prompt_v1
- Judge: gpt-4o-2024-08-06
- Scoring: Absolute scale normalized to a 0-1 range, averaged over five runs (see the sketch after this list).
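As a concrete illustration of the scoring above, the snippet below assumes the judge returns scores on a 1-10 scale and that normalization is a plain division by 10; the raw scores shown are hypothetical.

```python
from statistics import mean

def normalize(judge_score: float) -> float:
    """Map a raw judge score (assumed to be on a 1-10 scale) onto the 0-1 scale."""
    return judge_score / 10.0

# Hypothetical raw judge scores for a single category across the five runs.
runs = [7.2, 6.8, 7.0, 7.4, 6.9]

category_score = mean(normalize(score) for score in runs)
print(f"category score: {category_score:.3f}")  # 0.706 for these example scores
```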
Japanese evaluation benchmarks
We used llm-jp-eval (v1.3.0), the JP Language Model Evaluation Harness (commit #9b42d41), and the Code Generation LM Evaluation Harness (commit #0261c52). The details are as follows:
- Multiple-choice question answering (JCommonsenseQA [Kurihara et al., 2022])
- Open-ended question answering (JEMHopQA [Ishii et al., 2024])
- Open-ended question answering (NIILC [Sekine, 2003])
- Machine reading comprehension (JSQuAD [Kurihara et al., 2022])
- Automatic summarization (XL-Sum [Hasan et al., 2021])
- Machine translation (WMT2020 ja-en [Barrault et al., 2020])
- Machine translation (WMT2020 en-ja [Barrault et al., 2020])
- Mathematical reasoning (MGSM [Shi et al., 2023])
- Academic exams (JMMLU [Yin et al., 2024])
- Code generation (JHumanEval [Sato et al., 2024])
English evaluation benchmarks
We used the Language Model Evaluation Harness (v0.4.2) and the Code Generation LM Evaluation Harness (commit #0261c52). The details are as follows:
- Multiple-choice question answering (OpenBookQA [Mihaylov et al., 2018])
- Open-ended question answering (TriviaQA [Joshi et al., 2017])
- Machine reading comprehension (SQuAD2 [Rajpurkar et al., 2018])
- Commonsense reasoning (XWINO [Tikhonov and Ryabinin, 2021])
- Natural language inference (HellaSwag [Zellers et al., 2019])
- Mathematical reasoning (GSM8K [Cobbe et al., 2021])
- Mathematical reasoning (MATH [Hendrycks et al., 2022][Lightman et al., 2024])
- Reasoning (BBH (BIG-Bench-Hard) [Suzgun et al., 2023])
- Academic exams (MMLU [Hendrycks et al., 2021])
- Code generation (HumanEval [Chen et al., 2021])
License
The model is released under the Llama 3.3 and Gemma licenses.

