whisper-small-korean-pronunciation-scorer Open source model - Freely evaluate Korean pronunciation and score it from 1 to 5

Developed by tdns03

A Korean pronunciation quality evaluation model fine-tuned from Whisper-small that scores Korean pronunciation on a 1-5 scale.

Speech Recognition

Transformers

Korean

Open Source License: Apache-2.0

#Korean pronunciation evaluation #Speech feature analysis #AI-Hub dataset

Downloads 39

Release Time: 7/23/2024

Model Overview

This model evaluates the pronunciation quality of non-native Korean speakers. It is based on the Whisper architecture and fine-tuned for pronunciation scoring.

Model Features

Precise pronunciation evaluation

Adopts a 1-5 point scale for detailed pronunciation accuracy assessment

Advantages of Whisper architecture

Utilizes Whisper's powerful speech feature extraction capabilities

Professional data training

Fine-tuned on the pronunciation evaluation dataset from Korea's AI-Hub

Model Capabilities

Korean pronunciation evaluation

Speech quality scoring

Pronunciation error detection

Use Cases

Language education

Korean learning assistance

Helps Korean learners evaluate and improve pronunciation

Provides quantitative scoring feedback

Online language testing

Used for pronunciation evaluation in online Korean proficiency tests

🚀 Whisper Fine-tuned Pronunciation Scorer

This model assesses the pronunciation quality of Korean speech. It is built on the openai/whisper-small model and fine-tuned with the Korea AI-Hub dataset.

✨ Features

Pronunciation Assessment: This model assesses the pronunciation quality of Korean speech, providing a score from 1 to 5.
Encoder-Decoder Architecture: It uses the encoder-decoder architecture of the Whisper model to extract speech features and an additional linear layer to predict the pronunciation score.

📦 Installation

To use this model, install the libraries that the usage example imports: torch, torchaudio, and transformers. With pip, for example:

pip install torch torchaudio transformers
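Before loading the model, it can help to confirm that the three packages are importable in the active environment. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of the model card.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Packages imported by the usage example in this card.
REQUIRED = ["torch", "torchaudio", "transformers"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

`find_spec` only checks that a package can be located; it does not import it or verify version compatibility.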

💻 Usage Examples

Basic Usage

import torch
import torchaudio
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch.nn as nn

class WhisperPronunciationScorer(nn.Module):
    def __init__(self, pretrained_model):
        super().__init__()
        self.whisper = pretrained_model
        self.score_head = nn.Linear(self.whisper.config.d_model, 1)

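    def forward(self, input_features, labels=None):
        # Passing labels lets Whisper build its decoder inputs from the transcript tokens
        outputs = self.whisper(input_features, labels=labels, output_hidden_states=True)
        # Final decoder layer, shape (batch, target_length, d_model)
        last_hidden_state = outputs.decoder_hidden_states[-1]
        # Mean-pool over tokens, then project to one score per example
        scores = self.score_head(last_hidden_state.mean(dim=1)).squeeze(-1)
        return scores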

def load_model(model_path, device):
    model_name = "openai/whisper-small"
    processor = WhisperProcessor.from_pretrained(model_name)
    pretrained_model = WhisperForConditionalGeneration.from_pretrained(model_name)
    model = WhisperPronunciationScorer(pretrained_model).to(device)
    model.load_state_dict(torch.load(model_path, map_location=device))
    model.eval()
    return model, processor

def predict_pronunciation_score(model, processor, audio_path, transcript, device):
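    # Load audio as a (channels, samples) tensor
    audio, sr = torchaudio.load(audio_path)
    # Downmix to mono so multi-channel files yield a 1-D waveform
    audio = audio.mean(dim=0)
    # Whisper's feature extractor expects 16 kHz audio
    if sr != 16000:
        audio = torchaudio.functional.resample(audio, sr, 16000)
    input_features = processor(audio.numpy(), sampling_rate=16000, return_tensors="pt").input_features.to(device)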
    
    # Prepare transcript
    labels = processor(text=transcript, return_tensors="pt").input_ids.to(device)
    
    # Predict score
    with torch.no_grad():
        score = model(input_features, labels)
    return score.item()

# Load model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_path = "path/to/your/model.pth"
model, processor = load_model(model_path, device)

# Run prediction
audio_path = "path/to/your/audio.wav"
transcript = "안녕하세요"
score = predict_pronunciation_score(model, processor, audio_path, transcript, device)
print(f"Predicted pronunciation score: {score:.2f}")
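The score head in the example above is a plain linear layer with no output activation, so the raw prediction is a float that is not guaranteed to fall inside 1 to 5. If an integer score on the 1-5 scale is needed, one option is to round and clamp the raw value. This is a sketch under that assumption; the helper `to_rubric_score` is hypothetical, and the model card does not say how raw outputs should be post-processed.

```python
import math


def to_rubric_score(raw_score: float, low: int = 1, high: int = 5) -> int:
    """Round a raw regression output half-up, then clamp it to [low, high]."""
    rounded = math.floor(raw_score + 0.5)  # half-up rounding, unlike built-in round()
    return max(low, min(high, rounded))


# Illustrative raw values, not real model outputs.
for raw in (0.2, 3.49, 3.5, 5.8):
    print(f"raw={raw:.2f} -> score={to_rubric_score(raw)}")
```

Keeping the raw float alongside the rounded value preserves finer distinctions between recordings that land in the same band.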

📚 Documentation

Model Description

The Pronunciation Scorer takes audio input along with its corresponding text transcript and provides a Korean pronunciation score on a scale of 1 to 5. It uses the encoder-decoder architecture of the Whisper model to extract speech features and an additional linear layer to predict the pronunciation score.
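To make the final scoring step concrete, here is a pure-Python sketch of mean pooling followed by a linear layer, which is the arithmetic that `score_head(last_hidden_state.mean(dim=1))` performs in the usage example. The toy dimensions, weights, and bias are invented for illustration; whisper-small uses a hidden size of 768, and the real weights come from the fine-tuned checkpoint.

```python
from typing import List, Sequence


def mean_pool(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """Average a (time_steps x d_model) matrix over the time axis."""
    n_steps = len(hidden_states)
    d_model = len(hidden_states[0])
    return [sum(step[j] for step in hidden_states) / n_steps for j in range(d_model)]


def linear_head(pooled: Sequence[float], weight: Sequence[float], bias: float) -> float:
    """Project a pooled vector to a single scalar: dot(pooled, weight) + bias."""
    return sum(p * w for p, w in zip(pooled, weight)) + bias


# Toy decoder output: 2 token positions, hidden size 3 (made-up values).
hidden = [[1.0, 2.0, 3.0],
          [3.0, 4.0, 5.0]]

pooled = mean_pool(hidden)  # [2.0, 3.0, 4.0]
score = linear_head(pooled, weight=[0.5, 0.25, 0.25], bias=0.5)
print(pooled, score)  # [2.0, 3.0, 4.0] 3.25
```

Mean pooling makes the score independent of transcript length in the sense that every token position contributes equally to the pooled vector.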

How to Use

To use this model, follow these steps:

1. Install the required libraries
2. Load the model and processor
3. Prepare your audio file and text transcript
4. Predict the pronunciation score
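When several recordings need scoring, the last two steps can be wrapped in a small loop. The sketch below assumes a scoring callable that takes an audio path and a transcript and returns a float; `score_batch` and `summarize` are hypothetical helpers, and the stub scorer exists only so that the example runs without the model.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, Tuple


def score_batch(
    items: Iterable[Tuple[str, str]],
    score_fn: Callable[[str, str], float],
) -> Dict[str, float]:
    """Score each (audio_path, transcript) pair and key the results by audio path."""
    return {audio_path: score_fn(audio_path, transcript) for audio_path, transcript in items}


def summarize(scores: Dict[str, float]) -> Dict[str, float]:
    """Return count, mean, min, and max of the scores (count only if empty)."""
    if not scores:
        return {"count": 0}
    values = list(scores.values())
    return {"count": len(values), "mean": mean(values), "min": min(values), "max": max(values)}


# Stub scorer for demonstration only: it ignores the audio entirely.
def stub_scorer(audio_path: str, transcript: str) -> float:
    return float(len(transcript))


results = score_batch([("a.wav", "안녕"), ("b.wav", "감사합니다")], stub_scorer)
print(results)
print(summarize(results))
```

With the real model loaded, the stub could be replaced by a wrapper such as `lambda path, text: predict_pronunciation_score(model, processor, path, text, device)`, using the function from the usage example.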

📄 License

This model is released under the Apache 2.0 license.

Additional Information

Property	Details
Model Type	Whisper Fine-tuned Pronunciation Scorer
Training Data	Korea AI-Hub (https://www.aihub.or.kr/) foreigner Korean pronunciation evaluation dataset
Metrics	Pronunciation score on a 1-5 scale
Pipeline Tag	audio-classification
