マレーシアン・ホイスパー・ベース（Malaysian-Whisper-Base）オープンソース音声認識モデル

ホーム

Malaysian Whisper Base

mesoliticaによって開発

マレーシアデータセットでファインチューンされたWhisperベースモデル、マレー語と英語の音声認識をサポート

音声認識

Transformers

複数言語対応#マレー語音声認識 #多方言サポート #英語-マレー語バイリンガル

ダウンロード数 143

リリース時間 : 1/1/2024

モデル概要

このモデルはWhisperアーキテクチャに基づく音声認識モデルで、特にマレーシア地域のマレー語と英語に特化してファインチューンされており、マレーシアのアクセントや方言の音声テキスト化タスクに適しています。

モデル特徴

マレーシア言語最適化

特にマレーシア地域のマレー語と英語のアクセントに最適化されており、標準マレー語と方言を含む

多様なトレーニングデータ

IMDA音声テキスト化データセット、マレーシアYouTube動画の疑似アノテーションデータセットなど、様々なデータソースを使用してトレーニング

バイリンガルサポート

マレー語と英語の音声認識を同時にサポート、マレーシア式英語も含む

タイムスタンプサポート

タイムスタンプ付きの転記結果を生成可能

モデル能力

マレー語音声認識

英語音声認識

タイムスタンプ付き転記

マレーシアアクセント認識

使用事例

音声転記

会議議事録

マレーシア地域の会議録音を自動的にテキストに転記

マレーシアアクセントのマレー語と英語を正確に認識

メディアコンテンツ字幕生成

マレーシアのYouTube動画に自動的に字幕を生成

方言や現地のアクセントの認識をサポート

音声分析

音声データ分析

マレーシア地域の音声データを分析してインサイトを取得

マレーシア特有の言語バリエーションを処理可能

🚀 マレーシア語ファインチューニング済みWhisper Base

マレーシア語のデータセットでWhisper Baseをファインチューニングしました。使用したデータセットは以下の通りです。

IMDA STT, https://huggingface.co/datasets/mesolitica/IMDA-STT
疑似ラベル付きマレーシア語YouTube動画, https://huggingface.co/datasets/mesolitica/pseudolabel-malaysian-youtube-whisper-large-v3
マレー語会話音声コーパス, https://huggingface.co/datasets/malaysia-ai/malay-conversational-speech-corpus
Haqkiem TTSデータセット（これは非公開ですが、https://www.linkedin.com/in/haqkiem-daim/ からアクセスをリクエストできます）
疑似ラベル付きヌサンタラオーディオブック, https://huggingface.co/datasets/mesolitica/nusantara-audiobook

スクリプトはこちらにあります: https://github.com/mesolitica/malaya-speech/tree/malaysian-speech/session/whisper

Wandbはこちら: https://wandb.ai/huseinzol05/malaysian-whisper-base?workspace=user-huseinzol05

Wandbレポートはこちら: https://wandb.ai/huseinzol05/malaysian-whisper-base/reports/Finetune-Whisper--Vmlldzo2Mzg2NDgx

📚 ファインチューニングに使用した言語

ms：マレー語（標準マレー語と現地のマレー語を含む）
en：英語（標準英語とマングリッシュを含む）

🚀 クイックスタート

基本的な使用法

from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq, pipeline
from datasets import Audio
import requests

sr = 16000
audio = Audio(sampling_rate=sr)

processor = AutoProcessor.from_pretrained("mesolitica/malaysian-whisper-base")
model = AutoModelForSpeechSeq2Seq.from_pretrained("mesolitica/malaysian-whisper-base")

r = requests.get('https://huggingface.co/datasets/huseinzol05/malaya-speech-stt-test-set/resolve/main/test.mp3')
y = audio.decode_example(audio.encode_example(r.content))['array']
inputs = processor([y], return_tensors = 'pt')
r = model.generate(inputs['input_features'], language='ms', return_timestamps=True)
processor.tokenizer.decode(r[0])

実行結果の例:

'<|startoftranscript|><|ms|><|transcribe|> Zamily On Aging di Vener Australia, Australia yang telah diadakan pada tahun 1982 dan berasaskan unjuran tersebut maka jabatan perangkaan Malaysia menganggarkan menjelang tahun 2005 sejumlah 15% penduduk kita adalah daripada kalangan warga emas. Untuk makluman Tuan Yang Pertua dan juga Alian Bohon, pembangunan sistem pendafiran warga emas ataupun kita sebutkan event adalah usaha kerajaan ke arah merealisasikan objektif yang telah digangkatkan<|endoftext|>'

高度な使用法

英語での予測例:

r = model.generate(inputs['input_features'], language='en', return_timestamps=True)
processor.tokenizer.decode(r[0])

実行結果の例:

<|startoftranscript|><|en|><|transcribe|> Assembly on Aging, Divina Australia, Australia, which has been provided in 1982 and the operation of the transportation of Malaysia's implementation to prevent the tourism of the 25th, 15% of our population is from the market. For the information of the President and also the respected, the development of the market system or we have made an event.<|endoftext|>