Wav2vec2 Large Xlsr 53 Japanese

Developed by Ivydata
Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53. It expects audio input sampled at 16 kHz.
Downloads 19
Release Time: 5/11/2023

Model Overview

This speech recognition model was fine-tuned from the XLSR-53 large model on Japanese datasets including Common Voice, JVS, and JSUT, and is intended for Japanese speech-to-text tasks.
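As a rough illustration of how a fine-tuned wav2vec2 checkpoint like this one is typically used, the sketch below loads a model and processor with the Hugging Face transformers library and transcribes one audio file. The Hub identifier in MODEL_ID is an assumption inferred from the developer and model name on this page, and the choice of librosa for audio loading is also an assumption. Check the model's own page before relying on either.

```python
# Minimal sketch, not the author's reference code.
# ASSUMPTION: the Hub ID below is inferred from the page title and developer name.
MODEL_ID = "Ivydata/wav2vec2-large-xlsr-53-japanese"
TARGET_SAMPLE_RATE = 16_000  # the model expects 16 kHz input


def transcribe(audio_path, model_id=MODEL_ID):
    """Return a greedy (no language model) transcript for one audio file."""
    # Heavy dependencies are imported lazily so the module can be imported without them.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # librosa resamples to the requested rate while loading (mono by default).
    speech, _ = librosa.load(audio_path, sr=TARGET_SAMPLE_RATE)

    inputs = processor(
        speech,
        sampling_rate=TARGET_SAMPLE_RATE,
        return_tensors="pt",
        padding=True,
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    # Pick the most likely token per frame; the processor collapses CTC repeats and blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

A call would look like transcribe("meeting.wav"), where meeting.wav is a hypothetical file name. The first call downloads the checkpoint, so it needs network access and enough disk space for a large model.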

Model Features

Multi-dataset fine-tuning
Fine-tuned using three Japanese datasets (Common Voice, JVS, and JSUT) to enhance the model's Japanese speech recognition capability
No language model required
Can be used directly without additional language model support
High performance
Achieves a character error rate (CER) of 27.87% on the TEDxJP-10K dataset, reported as outperforming other Japanese speech recognition models
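The "No language model required" feature generally means the transcript comes from greedy CTC decoding: take the highest-scoring token in each frame, merge consecutive repeats, and drop the blank token. The sketch below shows that procedure in plain Python on a toy example. The three-entry vocabulary and the choice of index 0 as the blank token are illustrative assumptions, not values read from this model.

```python
def ctc_greedy_decode(frame_scores, vocab, blank_id=0):
    """Greedy CTC decoding: per-frame argmax, collapse repeats, remove blanks.

    frame_scores: list of per-frame score lists, one score per vocabulary entry.
    vocab: list mapping token index to its character.
    blank_id: index of the CTC blank token (ASSUMED to be 0 here).
    """
    # Index of the best-scoring token in each frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]

    chars = []
    previous = None
    for idx in best:
        # Emit only when the token changes and is not the blank.
        if idx != previous and idx != blank_id:
            chars.append(vocab[idx])
        previous = idx
    return "".join(chars)


# Toy example with a hypothetical vocabulary.
toy_vocab = ["<pad>", "あ", "い"]
toy_scores = [
    [0.1, 0.8, 0.1],  # あ
    [0.1, 0.8, 0.1],  # あ (repeat, merged with the previous frame)
    [0.9, 0.05, 0.05],  # blank, separates the two あ tokens
    [0.1, 0.8, 0.1],  # あ
    [0.1, 0.1, 0.8],  # い
]
decoded = ctc_greedy_decode(toy_scores, toy_vocab)
```

The blank frame between the second and fourth frames is what lets the same character appear twice in a row in the output.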

Model Capabilities

Japanese speech recognition
16kHz audio processing
Real-time speech-to-text
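Because the model expects 16 kHz input, audio recorded at another rate has to be resampled first. The sketch below shows the idea with a dependency-free linear interpolation. It is a simplified stand-in chosen for illustration (it applies no anti-aliasing filter), and the function name is hypothetical. For real use, a dedicated resampler such as the ones in librosa or torchaudio is likely a better choice.

```python
def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample a mono signal by linear interpolation (illustrative only)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return [float(s) for s in samples]

    n_out = int(round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # distance between output samples, in input samples
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)
        hi = min(lo + 1, last)  # clamp at the end of the signal
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# Upsampling an 8 kHz ramp to 16 kHz doubles the number of samples.
upsampled = resample_linear([0.0, 1.0, 2.0, 3.0], src_rate=8_000)
```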

Use Cases

Speech transcription
Japanese meeting minutes
Automatically convert Japanese meeting recordings into text transcripts
Approximately 72.13% character-level accuracy (100% minus the 27.87% CER on TEDxJP-10K)
Japanese subtitle generation
Automatically generate subtitles for Japanese video content
Voice assistant
Japanese voice command recognition
Used for voice command recognition in Japanese voice assistants or smart home devices
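The CER and accuracy figures quoted above can be related with a short calculation. CER is the character-level edit distance between a reference transcript and the model output, divided by the reference length. The sketch below is a generic implementation for checking transcripts of your own. It is not the evaluation script behind the reported numbers, and the example strings are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (free if characters match)
            ))
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# One substituted character out of five.
example_cer = cer("こんにちは", "こんにちわ")

# The accuracy figure on this page is 100% minus the reported CER.
approx_accuracy = round(100 - 27.87, 2)
```

Note that CER can exceed 100% when the output contains many insertions, so "100% minus CER" is only a rough accuracy figure.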