ja-cascaded-s2t-translation Open-Source Model: Japanese Speech to Multilingual Text Translation

Ja Cascaded S2t Translation

Developed by japanese-asr

A cascaded pipeline that translates Japanese speech into text in any target language. It chains an automatic speech recognition (ASR) component with a text translation component.

Speech Recognition

Transformers

Open Source License: Apache-2.0

#Japanese Speech-to-Text Translation #Multilingual Translation #Cascaded Model

Downloads 60

Release Time: 9/25/2024

Model Overview

The pipeline uses kotoba-tech/kotoba-whisper-v2.0 for Japanese speech recognition (Japanese speech -> Japanese text) and facebook/nllb-200-3.3B for text translation (Japanese text -> target-language text). The input must be Japanese speech; the output can be in any language that NLLB supports.
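The two stages described above can be sketched with the generic Hugging Face `pipeline` API. This is a minimal sketch under stated assumptions, not the packaged pipeline's own implementation: the helper names `cascade` and `load_stages` are hypothetical, and `jpn_Jpan` / `eng_Latn` are the NLLB language codes for Japanese and English. Model loading is kept in its own function because both checkpoints are large.

```python
from typing import Any, Callable


def cascade(audio: Any, asr: Callable, translate: Callable) -> str:
    """Run ASR first, then translate the resulting Japanese transcript."""
    ja_text = asr(audio)["text"]           # ASR pipelines return {"text": ...}
    result = translate(ja_text)            # translation pipelines return a list of dicts
    return result[0]["translation_text"]


def load_stages(tgt_lang: str = "eng_Latn"):
    """Load both stages (hypothetical helper; downloads large weights on first use)."""
    from transformers import pipeline  # imported lazily because it is a heavy dependency

    asr = pipeline(
        "automatic-speech-recognition",
        model="kotoba-tech/kotoba-whisper-v2.0",
    )
    translate = pipeline(
        "translation",
        model="facebook/nllb-200-3.3B",
        src_lang="jpn_Jpan",   # the transcript is always Japanese
        tgt_lang=tgt_lang,
    )
    return asr, translate
```

With the models available locally, `asr, translate = load_stages("fra_Latn")` followed by `cascade("sample.flac", asr, translate)` would return French text; `sample.flac` is a placeholder path.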

Model Features

High Accuracy

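Achieves a lower Word Error Rate (WER) than the OpenAI Whisper models on Japanese speech to English text translation. With facebook/nllb-200-3.3B it scores 64.3 on CoVoST2 and 67.1 on Fleurs, against 66.4 and 78.8 for the best Whisper model (openai/whisper-large-v2).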

Multilingual Support

Supports translation of Japanese speech into any target language supported by the NLLB model.
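NLLB identifies languages with FLORES-200 codes such as `eng_Latn`. The lookup below is an illustrative sketch covering only a handful of codes; the helper name `target_code` and the choice of languages are assumptions made for this example, not part of the model.

```python
# A small, illustrative subset of the FLORES-200 codes that NLLB uses.
NLLB_CODES = {
    "english": "eng_Latn",
    "french": "fra_Latn",
    "german": "deu_Latn",
    "spanish": "spa_Latn",
    "korean": "kor_Hang",
    "chinese (simplified)": "zho_Hans",
    "chinese (traditional)": "zho_Hant",
}

# The source side of this pipeline is always Japanese.
SOURCE_LANG = "jpn_Jpan"


def target_code(language: str) -> str:
    """Map a language name to its NLLB code (hypothetical helper)."""
    key = language.strip().lower()
    if key not in NLLB_CODES:
        raise ValueError(
            f"no code listed here for {language!r}; check the NLLB language list"
        )
    return NLLB_CODES[key]
```

For example, `target_code("French")` returns `"fra_Latn"`, which can then be passed as the target language of the translation stage.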

Modular Design

Adopts a cascaded approach, allowing flexible replacement of ASR or translation modules.
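One way to picture the modular design is a small configuration object in which each stage is named separately. This is only a sketch: `CascadeConfig` is a hypothetical class invented for illustration, and the model names are the ones listed on this page.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CascadeConfig:
    """Hypothetical description of the two stages of the cascade."""
    asr_model: str = "kotoba-tech/kotoba-whisper-v2.0"
    translation_model: str = "facebook/nllb-200-3.3B"
    tgt_lang: str = "eng_Latn"


default = CascadeConfig()

# Swap only the translation stage; the ASR stage is left untouched.
light = replace(default, translation_model="facebook/nllb-200-distilled-600M")

print(default)
print(light)
```

Because the stages are independent, choosing a smaller translation model trades accuracy for speed without touching speech recognition.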

Efficient Inference

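Stays close to Whisper's speed on longer audio. On a 300-second input, the pipeline with facebook/nllb-200-3.3B takes 1.772 seconds against 1.804 seconds for openai/whisper-large-v3, although it is slower on short inputs (0.173 against 0.061 seconds at 10 seconds of audio).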

Model Capabilities

Japanese Speech Recognition

Multilingual Text Translation

Audio Processing

Use Cases

Speech Translation

Japanese Meeting Minutes Translation

Translates recorded Japanese meetings into text in English or other languages.

Achieves a WER of 64.3 on the CoVoST2 (Ja->En) dataset with facebook/nllb-200-3.3B.

Japanese Language Education

Helps learners of Japanese convert Japanese speech into text in their native language.

Multilingual Content Creation

Podcast Multilingual Subtitle Generation

Automatically translates Japanese podcast content into multilingual subtitles.
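For subtitle generation, translated text has to be paired with timestamps and written in a subtitle format. The sketch below formats segments as SRT. It assumes the segments are already available as `(start_seconds, end_seconds, translated_text)` tuples; how those are obtained from the pipeline is not described on this page, and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Blocks are separated by a blank line, as SRT players expect.
    return "\n".join(blocks)
```

Running `to_srt` once per target language over the same timestamps would give one subtitle file per language.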

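Benchmark

Word Error Rate (WER, lower is better) on Japanese speech to English text translation.

model	WER on CoVoST2 (Ja->En)	WER on Fleurs (Ja->En)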
japanese-asr/ja-cascaded-s2t-translation (facebook/nllb-200-3.3B)	64.3	67.1
japanese-asr/ja-cascaded-s2t-translation (facebook/nllb-200-1.3B)	65.4	68.9
japanese-asr/ja-cascaded-s2t-translation (facebook/nllb-200-distilled-1.3B)	65.6	67.4
japanese-asr/ja-cascaded-s2t-translation (facebook/nllb-200-distilled-600M)	68.2	72.2
openai/whisper-large-v3	71	86.1
openai/whisper-large-v2	66.4	78.8
openai/whisper-large	66.5	86.1
openai/whisper-medium	70.3	97.2
openai/whisper-small	97.3	132.2
openai/whisper-base	186.2	349.6
openai/whisper-tiny	377.2	474
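Several scores in the table exceed 100 (for example 377.2 for openai/whisper-tiny on CoVoST2). That is possible because WER divides the number of word-level edits by the length of the reference, so a hypothesis with many extra words can accumulate more errors than the reference has words. The function below is a minimal sketch of the standard WER definition. Any text normalization applied in the benchmark is not described here, so none is assumed.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate in percent: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

As an illustration, a one-word reference compared with a hypothesis that repeats that word four times has three insertions against a reference length of one, giving a WER of 300.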

Inference Speed

Mean inference time in seconds for audio samples of different durations.

model	10 s	30 s	60 s	300 s
japanese-asr/ja-cascaded-s2t-translation (facebook/nllb-200-3.3B)	0.173	0.247	0.352	1.772
japanese-asr/ja-cascaded-s2t-translation (facebook/nllb-200-1.3B)	0.173	0.24	0.348	1.515
japanese-asr/ja-cascaded-s2t-translation (facebook/nllb-200-distilled-1.3B)	0.17	0.245	0.348	1.882
japanese-asr/ja-cascaded-s2t-translation (facebook/nllb-200-distilled-600M)	0.108	0.179	0.283	1.33
openai/whisper-large-v3	0.061	0.184	0.372	1.804
openai/whisper-large-v2	0.062	0.199	0.415	1.854
openai/whisper-large	0.062	0.183	0.363	1.899
openai/whisper-medium	0.045	0.132	0.266	1.368
openai/whisper-small	0.135	0.376	0.631	3.495
openai/whisper-base	0.054	0.108	0.231	1.019
openai/whisper-tiny	0.045	0.124	0.208	0.838
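A convenient way to read these numbers is the real-time factor, the processing time divided by the audio duration. The sketch below computes it for two rows of the table, assuming the column headers are audio durations in seconds and the cells are inference times in seconds; the row labels are shortened for readability.

```python
# Audio durations (assumed to be seconds) taken from the table header.
DURATIONS_S = (10, 30, 60, 300)

# Inference times (assumed to be seconds) copied from two rows of the table.
TIMES_S = {
    "ja-cascaded-s2t-translation (nllb-200-3.3B)": (0.173, 0.247, 0.352, 1.772),
    "openai/whisper-large-v3": (0.061, 0.184, 0.372, 1.804),
}


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time per second of audio; below 1 means faster than real time."""
    return processing_s / audio_s


for name, times in TIMES_S.items():
    factors = [real_time_factor(t, d) for t, d in zip(times, DURATIONS_S)]
    print(name, [f"{f:.4f}" for f in factors])
```

For both models the factor shrinks as the audio gets longer, and the gap between the cascaded pipeline and Whisper narrows with it.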
