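🚀 Persian Speech Recognition Model - whisper-persian-turbooo
This project is an automatic speech recognition (ASR) model fine-tuned from openai/whisper-large-v3-turbo. It transcribes Persian speech and is intended for domains such as medicine.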
🚀 Quick Start
Model Environment
- Dataset: mozilla-foundation/common_voice_11_0
- Evaluation metric: wer (word error rate)
- Base model: openai/whisper-large-v3-turbo
- Library: transformers
- Tags: medical
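Since the model is evaluated with word error rate, a minimal sketch of how WER is commonly computed may help when reading the metric: the word-level edit distance (substitutions, deletions, insertions) between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` and the sample strings are illustrative assumptions, not part of this project, and the project's actual evaluation script may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref_words)


# Placeholder strings: one substituted word and one dropped word out of four.
print(word_error_rate("a b c d", "a x c"))
```

A lower value is better. WER can exceed 1.0 when the hypothesis contains many inserted words.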
Training Information
| Property | Details |
| --- | --- |
| Training loss | 0.013100 |
| Validation loss | 0.043175 |
| Epochs | 1 |
License
This project is released under the MIT License.
📦 Installation
To use the model in Colab, install the required packages:
!pip install torch torchaudio transformers pydub google-colab
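After installation, a quick sanity check can confirm that the packages are importable. The sketch below also looks for an `ffmpeg` binary, because pydub relies on ffmpeg (or libav) to decode formats other than WAV. The helper names `missing_modules` and `has_ffmpeg` are hypothetical and not part of this project.

```python
import importlib.util
import shutil


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_ffmpeg():
    """Report whether an ffmpeg executable is available on PATH."""
    return shutil.which("ffmpeg") is not None


required = ["torch", "torchaudio", "transformers", "pydub"]
print("Missing modules:", missing_modules(required) or "none")
print("ffmpeg on PATH:", has_ffmpeg())
```

The `google.colab` module is left out of the list on purpose: it is only available inside a Colab runtime.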
💻 Usage Examples
Basic Usage
The following complete script transcribes Persian speech with the model in Colab:
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from pydub import AudioSegment
import os
from google.colab import files

# Load the fine-tuned model and its processor; use the GPU when available.
model_id = "hackergeek98/whisper-persian-turbooo"
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device)
processor = AutoProcessor.from_pretrained(model_id)
whisper_pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device=0 if torch.cuda.is_available() else -1,
)


def convert_to_wav(audio_path):
    """Convert any audio format pydub can read into a WAV file."""
    audio = AudioSegment.from_file(audio_path)
    wav_path = "converted_audio.wav"
    audio.export(wav_path, format="wav")
    return wav_path


def split_audio(audio_path, chunk_length_ms=30000):
    """Split a WAV file into consecutive chunks of at most chunk_length_ms."""
    audio = AudioSegment.from_wav(audio_path)
    chunks = [audio[i:i + chunk_length_ms] for i in range(0, len(audio), chunk_length_ms)]
    chunk_paths = []
    for i, chunk in enumerate(chunks):
        chunk_path = f"chunk_{i}.wav"
        chunk.export(chunk_path, format="wav")
        chunk_paths.append(chunk_path)
    return chunk_paths


def transcribe_long_audio(audio_path):
    """Transcribe a long recording chunk by chunk and save the text to a file."""
    wav_path = convert_to_wav(audio_path)
    chunk_paths = split_audio(wav_path)
    transcription = ""
    for chunk in chunk_paths:
        result = whisper_pipe(chunk)
        transcription += result["text"] + "\n"
        os.remove(chunk)  # delete each temporary chunk once it is transcribed
    os.remove(wav_path)
    text_path = "transcription.txt"
    # Write as UTF-8 so the Persian text is stored correctly on any platform.
    with open(text_path, "w", encoding="utf-8") as f:
        f.write(transcription)
    return text_path


# Upload an audio file in Colab, transcribe it, and download the transcript.
uploaded = files.upload()
audio_file = list(uploaded.keys())[0]
transcription_file = transcribe_long_audio(audio_file)
files.download(transcription_file)
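Code Notes
The code above uploads an audio file in Colab, converts it to WAV, splits long audio into chunks, transcribes each chunk with the model, and downloads the resulting transcript. Adjust parameters such as the chunk length (chunk_length_ms, 30 seconds by default) to suit your needs.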
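To see how the chunk length affects the split, the sketch below reproduces only the boundary arithmetic of `split_audio` in plain Python, without loading any audio. The helper name `chunk_bounds` is hypothetical. It assumes, as pydub slicing does, that the final chunk is simply shorter when the duration is not a multiple of the chunk length.

```python
def chunk_bounds(total_ms: int, chunk_length_ms: int = 30000):
    """Return (start_ms, end_ms) pairs covering a recording of total_ms milliseconds."""
    if chunk_length_ms <= 0:
        raise ValueError("chunk_length_ms must be positive")
    return [
        (start, min(start + chunk_length_ms, total_ms))
        for start in range(0, total_ms, chunk_length_ms)
    ]


# A 70-second recording with the default 30-second chunks.
for start, end in chunk_bounds(70_000):
    print(f"{start / 1000:.0f}s to {end / 1000:.0f}s")
```

The 30-second default matches the input window Whisper models process at once. Because the cuts fall at fixed positions, a word that straddles a boundary may be split between two chunks, which is worth keeping in mind when choosing a different length.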