Ja Cascaded S2t Translation
A cascaded pipeline that translates Japanese speech into text in any supported target language. It chains an automatic speech recognition (ASR) component with a text translation component.
Downloads: 60
Release date: 9/25/2024
Model Overview
The pipeline uses kotoba-tech/kotoba-whisper-v2.0 for Japanese speech recognition (Japanese speech -> Japanese text) and facebook/nllb-200-3.3B for text translation (Japanese text -> target-language text). The input must be Japanese speech; the output can be in any language supported by NLLB.
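The two-stage flow described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual implementation: the names `cascaded_s2t` and `load_hf_stages` are hypothetical, and loading the two models through the generic Hugging Face `transformers` `pipeline` helper is an assumption about how the stages could be wired together.

```python
from typing import Callable

ASR_MODEL = "kotoba-tech/kotoba-whisper-v2.0"
MT_MODEL = "facebook/nllb-200-3.3B"


def cascaded_s2t(audio,
                 asr: Callable[[object], str],
                 translate: Callable[[str], str]) -> dict:
    """Run stage 1 (speech -> Japanese text), then stage 2 (text -> target text)."""
    transcript = asr(audio)
    return {"transcript": transcript, "translation": translate(transcript)}


def load_hf_stages(tgt_lang: str = "eng_Latn"):
    """Hypothetical loader: builds both stages with transformers.

    Calling this downloads several gigabytes of model weights.
    """
    # Imported lazily so the rest of the sketch works without transformers.
    from transformers import pipeline

    asr_pipe = pipeline("automatic-speech-recognition", model=ASR_MODEL)
    mt_pipe = pipeline("translation", model=MT_MODEL,
                       src_lang="jpn_Jpan", tgt_lang=tgt_lang)

    def asr(audio) -> str:
        # The ASR pipeline returns a dict with the transcript under "text".
        return asr_pipe(audio)["text"]

    def translate(text: str) -> str:
        # The translation pipeline returns a list of dicts.
        return mt_pipe(text)[0]["translation_text"]

    return asr, translate
```

With the models available, usage would look like `asr, translate = load_hf_stages("fra_Latn")` followed by `cascaded_s2t("meeting.wav", asr, translate)`, where `meeting.wav` is a placeholder file name. Because each stage is passed in as a plain callable, either one can be swapped without touching the other.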
Model Features
High Accuracy
Achieves a lower word error rate (WER) than the OpenAI Whisper model on Japanese speech-to-English text translation.
Multilingual Support
Translates Japanese speech into any target language supported by the NLLB model.
Modular Design
Uses a cascaded design, so the ASR module or the translation module can be replaced independently.
Efficient Inference
Maintains fast inference speeds even with longer audio inputs.
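For the multilingual support listed above, it helps to know that NLLB identifies languages by FLORES-200 style codes (an ISO 639-3 language tag plus a script tag) rather than by plain names. The lookup below is a small illustrative sketch: the `NLLB_CODES` table and the `nllb_code` helper are hypothetical conveniences, and the table covers only a handful of the languages NLLB supports.

```python
# A few FLORES-200 codes as used by NLLB; the full list is much longer.
NLLB_CODES = {
    "japanese": "jpn_Jpan",
    "english": "eng_Latn",
    "french": "fra_Latn",
    "german": "deu_Latn",
    "spanish": "spa_Latn",
    "korean": "kor_Hang",
    "chinese (simplified)": "zho_Hans",
}


def nllb_code(language: str) -> str:
    """Map a human-readable language name to its NLLB code."""
    key = language.strip().lower()
    if key not in NLLB_CODES:
        raise ValueError(f"No NLLB code registered for {language!r}")
    return NLLB_CODES[key]


# The source side of this pipeline is always Japanese.
SOURCE_LANG = nllb_code("Japanese")
```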
Model Capabilities
Japanese Speech Recognition
Multilingual Text Translation
Audio Processing
Use Cases
Speech Translation
Japanese Meeting Minutes Translation
Real-time translation of Japanese meeting recordings into text in English or other languages.
Achieves a WER of 64.3 on the CoVoST2 dataset.
Japanese Language Education
Helps learners of Japanese convert Japanese speech into text in their native language.
Multilingual Content Creation
Podcast Multilingual Subtitle Generation
Automatically translates Japanese podcast content into multilingual subtitles.
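The WER figure quoted for CoVoST2 above is easier to interpret with the metric spelled out. WER is the word-level edit distance between the reference and the system output (substitutions, deletions, and insertions), divided by the number of reference words. The sketch below is a generic implementation under simple assumptions (whitespace tokenization, no text normalization, result reported as a percentage); it is not the evaluation script behind the reported score.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * (S + D + I) / len(reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on a mat")` counts one substitution over six reference words. Because insertions are counted, WER can exceed 100.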