The open-source wav2vec2-large-xlsr-korean model - Precise implementation of Korean Automatic Speech Recognition

Wav2vec2 Large Xlsr Korean

Developed by kresnik

Korean automatic speech recognition (ASR) model based on the Wav2Vec2 XLSR architecture, reaching 4.74% WER and 1.78% CER on the Zeroth Korean test set

Speech Recognition

Transformers

Korean

Open Source License: Apache-2.0

#Korean Speech Recognition #Low Word Error Rate #High Accuracy ASR

Downloads 1.7M

Release Time: 3/2/2022

Model Overview

This model is designed for Korean speech recognition: it converts Korean speech to text, with a reported word error rate of 4.74% on the Zeroth Korean test set

Model Features

High Accuracy

Achieves a word error rate (WER) of 4.74% and a character error rate (CER) of 1.78% on the Zeroth Korean test set

Large Model Architecture

Based on the large-scale Wav2Vec2 XLSR architecture, suitable for Korean speech recognition tasks

Pre-trained Model

Provides pre-trained model weights ready for inference or fine-tuning

Model Capabilities

Korean Speech Recognition

Audio to Text

Automatic Speech Transcription

Use Cases

Speech Transcription

Korean Meeting Minutes

Automatically converts Korean meeting recordings into text transcripts

Word accuracy of about 95.26% (100% minus the 4.74% WER measured on the Zeroth Korean test set)

Voice Assistant

Speech recognition module for Korean voice assistant applications

Education

Korean Learning App

Helps Korean learners check pronunciation accuracy
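For long recordings such as meetings, one common approach (not described in the original model card) is to split the waveform into fixed-length, slightly overlapping chunks and transcribe each chunk separately, since the model's memory use grows with input length. The sketch below shows only the chunking step; `chunk_waveform`, the 20-second chunk length, and the 1-second overlap are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

SAMPLE_RATE = 16_000  # the usage example in this document feeds the model 16 kHz audio


def chunk_waveform(
    samples: Sequence[float],
    chunk_seconds: float = 20.0,   # hypothetical default
    overlap_seconds: float = 1.0,  # hypothetical default
    sample_rate: int = SAMPLE_RATE,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping chunks covering `samples`."""
    chunk_len = int(chunk_seconds * sample_rate)
    overlap = int(overlap_seconds * sample_rate)
    if chunk_len <= 0 or not 0 <= overlap < chunk_len:
        raise ValueError("need chunk length > overlap >= 0")

    step = chunk_len - overlap
    spans: List[Tuple[int, int]] = []
    start = 0
    while start < len(samples):
        end = min(start + chunk_len, len(samples))
        spans.append((start, end))
        if end == len(samples):
            break  # the last chunk reached the end of the recording
        start += step
    return spans
```

Each `(start, end)` pair can be used to slice the waveform, and each slice can then be passed to the processor in the same way as a single utterance. Merging the text at chunk overlaps is left out of this sketch.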

🚀 Wav2Vec2 XLSR Korean

This is a speech model for automatic speech recognition in Korean, built on the Wav2Vec2 XLSR architecture.

🚀 Quick Start

You can evaluate the model on the Zeroth-Korean ASR corpus with the following script.

💻 Usage Examples

Basic Usage

from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
from datasets import load_dataset
import soundfile as sf
import torch
from jiwer import wer

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = Wav2Vec2Processor.from_pretrained("kresnik/wav2vec2-large-xlsr-korean")

model = Wav2Vec2ForCTC.from_pretrained("kresnik/wav2vec2-large-xlsr-korean").to(device)

ds = load_dataset("kresnik/zeroth_korean", "clean")

test_ds = ds['test']

# Read each audio file into a raw waveform array.
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch

test_ds = test_ds.map(map_to_array)

# Transcribe one batch: pad to the longest clip, run the model,
# then pick the most likely token per frame and decode to text.
def map_to_pred(batch):
    inputs = processor(batch["speech"], sampling_rate=16000, return_tensors="pt", padding="longest")
    input_values = inputs.input_values.to(device)

    with torch.no_grad():
        logits = model(input_values).logits

    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = processor.batch_decode(predicted_ids)
    batch["transcription"] = transcription
    return batch

result = test_ds.map(map_to_pred, batched=True, batch_size=16, remove_columns=["speech"])

# jiwer expects the reference texts first, then the predictions.
print("WER:", wer(result["text"], result["transcription"]))
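The `torch.argmax` and `processor.batch_decode` steps above amount to greedy CTC decoding: take the most likely token for each frame, collapse consecutive repeats, then drop the blank token. A minimal pure-Python sketch of that idea follows. The toy vocabulary and the blank id of 0 are made up for illustration and are not the model's real vocabulary, and the real tokenizer also handles details such as word delimiters.

```python
from typing import Dict, List


def greedy_ctc_decode(frame_ids: List[int], id_to_token: Dict[int, str], blank_id: int = 0) -> str:
    """Collapse repeated frame predictions, then remove CTC blanks."""
    tokens = []
    previous = None
    for idx in frame_ids:
        # Keep a token only when it differs from the previous frame and is not the blank.
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    return "".join(tokens)


# Toy vocabulary (hypothetical): 0 is the blank, the rest are Korean syllables.
toy_vocab = {0: "<blank>", 1: "안", 2: "녕"}

# Per-frame argmax ids: "안" twice, a blank, then "녕" twice.
print(greedy_ctc_decode([1, 1, 0, 2, 2], toy_vocab))  # 안녕
```

A blank between two identical ids keeps both tokens, which is how CTC represents a genuinely repeated character.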

Expected Metrics

Expected WER: 4.74%
Expected CER: 1.78%
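
The evaluation script prints only WER. Both metrics are edit-distance ratios: the number of substitutions, deletions, and insertions needed to turn the prediction into the reference, divided by the reference length, counted over words for WER and over characters for CER. The following is a dependency-free sketch of that definition, not the evaluation code behind the reported numbers. The helper names are hypothetical, and libraries such as jiwer may normalize text differently (this sketch counts spaces as characters).

```python
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (single-row dynamic programming)."""
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        diagonal, row[0] = row[0], i
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            diagonal, row[j] = row[j], min(
                row[j] + 1,       # deletion
                row[j - 1] + 1,   # insertion
                diagonal + cost,  # substitution or match
            )
    return row[-1]


def error_rate(refs: List[Sequence], hyps: List[Sequence]) -> float:
    """Total edit errors over the corpus divided by total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


def word_error_rate(references: List[str], hypotheses: List[str]) -> float:
    return error_rate([r.split() for r in references], [h.split() for h in hypotheses])


def char_error_rate(references: List[str], hypotheses: List[str]) -> float:
    return error_rate(list(references), list(hypotheses))


# One wrong particle: one of three words differs, one of nine characters differs.
refs = ["오늘 날씨가 좋다"]
hyps = ["오늘 날씨는 좋다"]
print("WER:", word_error_rate(refs, hyps))
print("CER:", char_error_rate(refs, hyps))
```

With the evaluation script, the same functions could be called on `result["text"]` and `result["transcription"]`.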

📚 Documentation

Evaluation on Zeroth-Korean ASR corpus

You can find a Google Colab notebook (in Korean) for this evaluation here.

📄 License

This project is licensed under the Apache-2.0 license.

📦 Model Information

Property	Details
Model Name	Wav2Vec2 XLSR Korean
Datasets	kresnik/zeroth_korean
Tags	speech, audio, automatic-speech-recognition
License	Apache-2.0

📊 Model Results

Task	Dataset	Metrics
Automatic Speech Recognition	Zeroth Korean (kresnik/zeroth_korean, clean)	Test WER: 4.74%, Test CER: 1.78%
