
Llama 3.3 Swallow 70B Instruct V0.4

Developed by tokyotech-llm
Llama 3.3 Swallow is a 70B-parameter large language model built by continuous pre-training of Meta's Llama 3.3, enhancing its Japanese capabilities while retaining the original English proficiency.
Downloads 874
Release Time : 4/25/2025

Model Overview

A Japanese-enhanced large language model built through continuous pre-training of the Llama 3.3 model, suitable for bilingual text generation tasks.

Model Features

Enhanced bilingual capabilities
Significantly improved Japanese processing while retaining Llama 3.3's original English capabilities
Large-scale continuous pre-training
Continuous pre-training using approximately 315 billion tokens of Japanese and English data
Instruction tuning optimization
Improved instruction-following capabilities through supervised fine-tuning (SFT) on Japanese synthetic data

Model Capabilities

Japanese text generation
English text generation
Bilingual translation
Instruction following
Code generation
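Because the model inherits the Llama 3 instruction format from its base model, a chat prompt can be sketched as below. This is a minimal illustration only: `build_llama3_prompt` is a hypothetical helper, and in practice you would load the tokenizer for `tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4` from Hugging Face and call `tokenizer.apply_chat_template` instead.

```python
# Sketch (assumption-labeled): manually rendering the Llama 3 chat format
# that Llama 3.3 Swallow inherits. Real use should go through the Hugging
# Face tokenizer's apply_chat_template, which encodes the same structure.

def build_llama3_prompt(messages: list[dict[str, str]]) -> str:
    """Render a list of {"role", "content"} messages into a Llama 3 prompt."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # End with an open assistant header so the model generates the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。"},
    {"role": "user", "content": "東京工業大学について教えてください。"},
]
prompt = build_llama3_prompt(messages)
```

The rendered string would then be tokenized and passed to the model for generation; the same format works for Japanese, English, and translation prompts.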

Use Cases

Language processing
Japanese content creation
Generate high-quality Japanese articles, reports, etc.
Achieved an average score of 0.772 on the Japanese MT-Bench evaluation
English-Japanese bilingual translation
Translate between English and Japanese in both directions
Performs well on the WMT20 English-Japanese translation tasks
Education
Japanese learning assistance
Provide grammar explanations and exercise generation for Japanese learners
© 2025 AIbase