teochew-whisper-medium: an open-source Teochew speech recognition model that recognizes the Teochew dialect of the Southern Min (Min Nan) family

Teochew Whisper Medium

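Developed by efficient-nlp

A speech recognition model for Teochew (also called Chaozhou or Chaoshan dialect), fine-tuned from the Whisper medium model. It targets the Teochew variety of Southern Min spoken in southern China.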

Speech recognition

Transformers

License: MIT #TeochewRecognition #DialectSpeechTranscription #ShortAudioProcessing

Downloads: 194

Released: 1/26/2024

Model overview

This model is an automatic speech recognition (ASR) system optimized for Teochew and is intended for Teochew speech-to-text tasks.

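Model features

Dialect-specific tuning

Fine-tuned specifically on Teochew, so it performs better on this dialect than general-purpose speech models

Medium size

Built on the Whisper medium model, balancing accuracy against compute requirements

Limited audio length

Works best on short audio clips of 10 seconds or less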
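
Because the model works best on clips of 10 seconds or less, longer recordings need to be split before transcription. The following is a minimal sketch of one way to do that, not the authors' method: the function name `split_into_chunks` and the fixed-length splitting strategy are assumptions made for illustration. It works on any sliceable sequence of samples (a Python list, a NumPy array, or a 1-D torch tensor).

```python
from typing import List, Sequence


def split_into_chunks(
    samples: Sequence[float],
    sample_rate: int = 16000,
    max_seconds: float = 10.0,
) -> List[Sequence[float]]:
    """Split a mono waveform into consecutive chunks of at most max_seconds.

    Hypothetical helper: the 10-second limit comes from the model card,
    but fixed-length splitting is an assumption, not the authors' pipeline.
    """
    if sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")
    chunk_len = int(sample_rate * max_seconds)
    if chunk_len == 0:
        raise ValueError("max_seconds is too small for this sample rate")
    # Slice the waveform into back-to-back windows; the last one may be shorter.
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


# Example: 25 seconds of silence at 16 kHz becomes chunks of 10 s, 10 s and 5 s.
chunks = split_into_chunks([0.0] * (25 * 16000))
chunk_lengths = [len(c) for c in chunks]
```

Fixed-length splitting can cut a word in half at a chunk boundary. Splitting at silences instead would likely give cleaner transcripts, at the cost of a more involved implementation.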

Model capabilities

Teochew speech recognition

Speech-to-text

Dialect processing

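Use cases

Media processing

Film and TV subtitle generation

Automatically generate subtitles for Teochew films and TV shows

WER of 0.31 on clearly articulated speech

Comedy show transcription

Transcribe the content of Teochew comedy shows

WER of 0.68 on everyday conversational speech

Language research

Dialect speech archiving

Convert spoken Teochew material into text for archiving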
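
The WER figures quoted above are error rates: the number of insertions, deletions, and substitutions needed to turn the model output into the reference transcript, divided by the reference length. The page does not say how the text was tokenized, so the sketch below is an illustration under assumptions rather than a reproduction of those numbers. The function names are hypothetical, and scoring per character (common for Chinese-script text) is an assumed default.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def tokenize(text: str, char_level: bool) -> List[str]:
    """Split into characters (ignoring whitespace) or whitespace-separated words."""
    if char_level:
        return [ch for ch in text if not ch.isspace()]
    return text.split()


def error_rate(reference: str, hypothesis: str, char_level: bool = True) -> float:
    """Edit distance divided by reference length (assumed scoring scheme)."""
    ref = tokenize(reference, char_level)
    hyp = tokenize(hypothesis, char_level)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# One substituted character out of five reference characters.
char_score = error_rate("我愛潮汕話", "我愛潮州話")
# One dropped word out of four reference words.
word_score = error_rate("a b c d", "a b d", char_level=False)
```

An error rate can exceed 1.0 when the hypothesis contains many more tokens than the reference, so it is not a percentage of correct words.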

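🚀 Teochew Whisper Medium Model

This model is a fine-tuned version of the Whisper medium model for recognizing Teochew (Chaozhou dialect), a Southern Min language spoken in southern China. It is intended for Teochew speech recognition and related speech-processing applications.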

🚀 Quick start

Example code

The following script downloads the model and launches a Gradio demo that runs it:

import torch
import torchaudio
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import gradio as gr

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
WHISPER_SAMPLE_RATE = 16000

processor = WhisperProcessor.from_pretrained("openai/whisper-medium")
model = WhisperForConditionalGeneration.from_pretrained(
    "efficient-nlp/teochew-whisper-medium"
).to(DEVICE)


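def preprocess_audio(audio_path: str) -> torch.Tensor:
    """Load an audio file and return a mono 1-D waveform at 16 kHz."""
    # torchaudio.load returns a (channels, samples) tensor and the file's sample rate
    audio, sample_rate = torchaudio.load(audio_path)
    # Resample if necessary
    if sample_rate != WHISPER_SAMPLE_RATE:
        resampler = torchaudio.transforms.Resample(
            orig_freq=sample_rate, new_freq=WHISPER_SAMPLE_RATE
        )
        audio = resampler(audio)
    # Convert to mono by averaging the channels
    if audio.shape[0] > 1:
        audio = torch.mean(audio, dim=0)
    # Drop the channel dimension for single-channel input
    return audio.squeeze()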


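def transcribe(audio_path: str) -> str:
    """Transcribe a single audio file and return the decoded text."""
    audio_input = preprocess_audio(audio_path)
    # Convert the waveform to log-mel input features for Whisper
    input_features = processor(
        audio_input,
        sampling_rate=WHISPER_SAMPLE_RATE,
        return_tensors="pt",
        language="Chinese",
    ).input_features.to(DEVICE)

    # Inference only, so skip gradient tracking to save memory
    with torch.no_grad():
        predicted_ids = model.generate(input_features)
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
    return transcription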


iface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(type="filepath"),
    outputs="text",
    title="Teochew Speech Recognition",
)
iface.launch()