whisper-large-v3-Telugu-Romanized open-source model - free speech recognition for romanized Telugu

Whisper Large V3 Telugu Romanized

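Developed by jayasuryajsk

A romanized Telugu speech recognition model fine-tuned from openai/whisper-large-v3

Speech recognition

Transformers

License (other open-source): Apache-2.0 #romanized Telugu transcription #conversational speech recognition #multilingual speech transcription

Downloads: 18

Released: 5/6/2024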

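Model overview

This model is built to transcribe everyday Telugu conversation into romanized script. It is fine-tuned from the Whisper Large V3 architecture.

Model features

Romanized Telugu support

Handles romanized Telugu, that is, Telugu written with the English alphabet

Based on Whisper Large V3

Fine-tuned on top of the Whisper Large V3 architecture

Optimized for everyday conversation

Tuned for everyday spoken conversation

Model capabilities

Telugu speech recognition

Romanized text output

Long-form audio processing

Use cases

Speech transcription

Everyday conversation transcription

Converts everyday spoken Telugu conversation into romanized text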

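🚀 Whisper Large V3 - Romanized Telugu Spoken-Language Recognition Model

This model is a version of openai/whisper-large-v3 fine-tuned on the Telugu Romanized 1.0 dataset. It is intended for recognizing spoken Telugu and transcribing it into romanized text.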

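🚀 Quick Start

On the evaluation set, the fine-tuned model achieves the following results:

Evaluation loss (eval_loss): 1.5009
Evaluation word error rate (eval_wer): 68.1275
Evaluation runtime (eval_runtime): 591.6137
Evaluation samples per second (eval_samples_per_second): 0.798
Evaluation steps per second (eval_steps_per_second): 0.1
Training epochs (epoch): 8.6207
Training steps (step): 1000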
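To make the eval_wer figure easier to interpret, here is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The function name word_error_rate and the sample sentences are illustrative assumptions, not part of the model card, and the card does not say which tool or text normalization produced its score. The sketch returns a percentage, on the assumption that the reported value is on a 0-100 scale.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate as a percentage of reference words.

    Illustrative helper (not from the model card): counts the minimum
    number of word substitutions, insertions, and deletions needed to
    turn the hypothesis into the reference.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Made-up romanized sentences: one spelling variant out of three words.
print(round(word_error_rate("nenu intiki veltunnanu",
                            "nenu intiki velthunnanu"), 2))
```

Note that under this definition a spelling variant of the same spoken word counts as a full substitution, which matters for romanized text where one word can be written in several ways.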

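✨ Key Features

Targeted training: the model is trained specifically to transcribe Telugu conversation into romanized script, the written form most people use in everyday life.

📦 Installation

The original documentation does not list installation steps, so none are given here. The usage example imports transformers and torch, so both packages must be installed.

💻 Usage Examples

Basic usage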

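from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
import torch

# Use the GPU with half precision when available, otherwise CPU with float32.
device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "jayasuryajsk/whisper-large-v3-Telugu-Romanized"

# Load the fine-tuned Whisper weights and move them to the chosen device.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype
)
model.to(device)

# The processor bundles the tokenizer and the audio feature extractor.
processor = AutoProcessor.from_pretrained(model_id)

# Long audio is split into 30-second chunks that are transcribed in batches.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)
# "recording.mp3" is a placeholder path to your own audio file.
# language="english" is kept as in the source example; presumably it is
# set this way because the output is Latin-script (romanized) text.
result = pipe("recording.mp3", generate_kwargs={"language": "english"})
print(result["text"])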
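Because the pipeline is created with return_timestamps=True, the returned dictionary also carries a "chunks" list, where each entry has a "timestamp" pair (start, end) in seconds and a "text" segment. The sketch below shows one way to print those segments line by line. The helper names format_timestamp and format_chunks and the sample dictionary are hypothetical, written only to mirror the shape of the pipeline output. The end time of the last chunk can be None, so the code handles that case.

```python
from typing import List, Optional


def format_timestamp(seconds: Optional[float]) -> str:
    """Render seconds as MM:SS.ss, or '??:??' when the value is missing."""
    if seconds is None:
        return "??:??"
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:05.2f}"


def format_chunks(result: dict) -> List[str]:
    """Turn the pipeline's "chunks" entries into '[start - end] text' lines."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(
            f"[{format_timestamp(start)} - {format_timestamp(end)}] {text}"
        )
    return lines


# Hypothetical data shaped like the pipeline output; not a real transcript.
sample = {
    "text": " nenu intiki veltunnanu",
    "chunks": [
        {"timestamp": (0.0, 2.5), "text": " nenu intiki"},
        {"timestamp": (2.5, None), "text": " veltunnanu"},
    ],
}

for line in format_chunks(sample):
    print(line)
```

With a real run, pass the result returned by pipe(...) to format_chunks in place of sample.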