Kotoba-Whisper-Bilingual (v1.0)
Kotoba-Whisper-Bilingual is a collection of distilled Whisper models designed for Japanese and English automatic speech recognition and speech-to-text translation between the two languages.
faster-whisper weights, whisper.cpp weights
Features
Kotoba-Whisper-Bilingual is a collection of distilled Whisper models trained for:
- Japanese ASR
- English ASR
- Speech-to-text translation (Japanese -> English)
- Speech-to-text translation (English -> Japanese)
It was developed through a collaboration between Asahi Ushio and Kotoba Technologies. Following the original work of distil-whisper (Robust Knowledge Distillation via Large-Scale Pseudo Labelling), we employ OpenAI's Whisper large-v3 as the teacher model for Japanese and English ASR. For speech-to-text translation, we translate the transcriptions into English and Japanese with an external LLM to obtain the training dataset.
We use ReazonSpeech for Japanese ASR and Japanese speech to English text translation, and Multilingual LibriSpeech for English ASR and English speech to Japanese text translation.
Kotoba-Whisper-Bilingual's loss objective consists of cross-entropy on both the ASR and translation tasks, while the KL divergence loss is applied only to the ASR task. The student model consists of the full encoder of the teacher large-v3 model and a two-layer decoder initialized from the first and last layers of the large-v3 model.
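Spelled out, the objective described above takes roughly the following form; the relative weighting λ is an assumption for illustration, as this card does not specify the coefficients:

```latex
% Sketch of the distillation objective described above.
% CE = cross-entropy against the (pseudo-)labels, KL = student/teacher KL divergence.
% \lambda is a hypothetical weighting factor, not specified in this card.
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{CE}}^{\mathrm{ASR}}
          \;+\; \mathcal{L}_{\mathrm{CE}}^{\mathrm{translation}}
          \;+\; \lambda \, \mathcal{L}_{\mathrm{KL}}^{\mathrm{ASR}}
```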
As Kotoba-Whisper-Bilingual uses the same architecture as distil-whisper/distil-large-v3, it inherits the benefit of improved latency compared to openai/whisper-large-v3 (6.3x faster than large-v3, as reported in the distil-whisper/distil-large-v3 model card).
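To make the four tasks listed above concrete, here is a minimal inference sketch using the Hugging Face transformers pipeline. It assumes the model is published under the repo id kotoba-tech/kotoba-whisper-bilingual-v1.0 and that tasks are selected through Whisper's standard language/task generation arguments; check the repository for the exact generate_kwargs recommended for each direction.

```python
# Minimal sketch, assuming the repo id below and standard Whisper
# language/task generation arguments; adjust generate_kwargs per task.
import torch
from transformers import pipeline

model_id = "kotoba-tech/kotoba-whisper-bilingual-v1.0"  # assumed repo id
use_cuda = torch.cuda.is_available()

pipe = pipeline(
    "automatic-speech-recognition",
    model=model_id,
    torch_dtype=torch.float16 if use_cuda else torch.float32,
    device="cuda:0" if use_cuda else "cpu",
)

# Japanese ASR: transcribe Japanese audio into Japanese text.
result = pipe("sample_ja.wav", generate_kwargs={"language": "ja", "task": "transcribe"})
print(result["text"])

# The other tasks (English ASR, Japanese->English and English->Japanese translation)
# are selected by changing the language/task pair in generate_kwargs; see the model card.
```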
Documentation
Evaluation
We compare kotoba-whisper-bilingual with OpenAI Whisper models, kotoba-whisper models, and cascaded models for translation. It is worth noting that kotoba-whisper-bilingual is the only model that can perform Japanese and English ASR as well as speech-to-text translation between Japanese and English: OpenAI Whisper is not trained for English-to-Japanese speech-to-text translation, and the other models are task-specific (e.g., kotoba-whisper covers only Japanese ASR and distil-whisper covers only English ASR).
Evaluation metrics:
- Speech-to-text translation (Japanese -> English): WER (smaller is better)
- Speech-to-text translation (English -> Japanese): CER (smaller is better)
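As an illustration of how these metrics can be computed (not necessarily the exact evaluation script used here), the Hugging Face evaluate library provides standard WER and CER implementations:

```python
# Illustrative metric computation with the `evaluate` library; the actual
# evaluation setup for this card may differ.
import evaluate

wer_metric = evaluate.load("wer")  # word error rate, used for English outputs
cer_metric = evaluate.load("cer")  # character error rate, used for Japanese outputs

predictions = ["this is a sample translation"]
references = ["this is a sample translation"]

print(wer_metric.compute(predictions=predictions, references=references))
print(cer_metric.compute(predictions=predictions, references=references))
```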
Technical Details
The model uses the following configuration and datasets:

| Property | Details |
|:---|:---|
| Model Type | Distilled Whisper models |
| Training Data | japanese-asr/en_asr.mls, japanese-asr/ja_asr.reazon_speech_all |
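For reference, the corpora listed in the table above are hosted on the Hugging Face Hub and can be pulled with the datasets library. The snippet below is a sketch assuming those repo ids are public datasets; split and configuration names are assumptions, so check each dataset card.

```python
# Sketch: load the corpora listed above with the Hugging Face `datasets` library.
# Repo ids are taken from the table; the "test" split is an assumption, check
# each dataset card for the actual split/configuration names.
from datasets import load_dataset

mls = load_dataset("japanese-asr/en_asr.mls", split="test")                   # English speech
reazon = load_dataset("japanese-asr/ja_asr.reazon_speech_all", split="test")  # Japanese speech

print(mls)
print(reazon)
```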
License
This project is licensed under the Apache-2.0 license.