whisper-large-v3-turbo-russian Open-Source Model - Accurate Russian Automatic Speech Recognition

Whisper Large V3 Turbo Russian

Developed by dvislobokov

A Russian automatic speech recognition (ASR) model based on OpenAI Whisper Large V3 Turbo and fine-tuned on the Russian portion of the Mozilla Common Voice 17 dataset

Speech Recognition

Transformers

License: MIT #Russian speech recognition #High-accuracy transcription #Real-time audio processing

Downloads 1,022

Released: 12/17/2024

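Model Overview

The model is optimized specifically for Russian speech recognition. It converts Russian speech to text and suits scenarios such as transcribing call recordings.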

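Model Features

Efficient Russian recognition

A speech recognition model tuned specifically for Russian, with strong performance on Russian recognition tasks

Large-scale training data

Trained on 118,000 Russian samples from the Mozilla Common Voice 17 dataset

High-performance hardware support

Supports GPU acceleration; training used two A100 40GB GPUs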

Model Capabilities

Russian speech recognition

Real-time speech-to-text

Microphone and file input

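Use Cases

Speech transcription

Call recording transcription

Automatically converts Russian call recordings to text

High-accuracy transcription results

Voice note conversion

Converts Russian voice notes into editable text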
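
For the call-recording and voice-note use cases, transcription usually runs over a whole folder of files rather than one clip. The following is a minimal sketch of such a batch loop, not part of the original model card. The helper names, the list of file extensions, and the choice to write a `.txt` file next to each recording are all illustrative assumptions. The `transcribe` argument can be any function that maps a file path to text, such as the `transcribe` function from the quick-start example on this page.

```python
from pathlib import Path
from typing import Callable

# Assumed set of audio formats; adjust to whatever your recordings use.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def find_audio_files(folder: Path) -> list[Path]:
    """Return the audio files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )


def transcribe_folder(folder: Path, transcribe: Callable[[str], str]) -> dict[str, str]:
    """Transcribe every audio file and save a .txt transcript beside it.

    `transcribe` is any callable that takes a file path and returns text.
    Returns a mapping of audio file name to transcript.
    """
    results: dict[str, str] = {}
    for audio_path in find_audio_files(folder):
        text = transcribe(str(audio_path)).strip()
        # Store the transcript next to the recording, e.g. call1.wav -> call1.txt
        audio_path.with_suffix(".txt").write_text(text, encoding="utf-8")
        results[audio_path.name] = text
    return results
```

Passing the transcription function in as a parameter keeps the file handling separate from the model, so the loop can be tried with a stand-in function before the model weights are downloaded.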

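🚀 Russian Speech Recognition Model

This project is an automatic speech recognition model. It is based on the openai/whisper-large-v3-turbo base model and trained on a Russian dataset to convert Russian speech to text accurately.

🚀 Quick Start

The model is used through the transformers library. A usage example follows: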

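from transformers import pipeline
import gradio as gr
import time

# Load the fine-tuned model as a speech-recognition pipeline (CPU in this example).
pipe = pipeline(
    model="dvislobokov/whisper-large-v3-turbo-russian",
    tokenizer="dvislobokov/whisper-large-v3-turbo-russian",
    task='automatic-speech-recognition',
    device='cpu'
)

def transcribe(audio):
    # `audio` is the file path that Gradio passes in (type='filepath').
    start = time.time()
    # return_timestamps=True is needed for Whisper long-form audio (over 30 seconds).
    text = pipe(audio, return_timestamps=True)['text']
    print(f"Transcription took {time.time() - start:.2f} s")
    return text

# Simple web UI that accepts microphone recordings or uploaded files.
iface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=['microphone', 'upload'], type='filepath'),
    outputs='text'
)

# share=True also creates a temporary public link to the demo.
iface.launch(share=True)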
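
The example above requests timestamps but reads only the `'text'` field. With `return_timestamps=True`, the transformers ASR pipeline also returns a `'chunks'` list whose entries hold a `'timestamp'` pair and a `'text'` string. The sketch below shows one way to render those chunks as timestamped lines. The function names and the output layout are illustrative choices, not part of the model card.

```python
def format_timestamp(seconds):
    """Render seconds as MM:SS.s, or '??' when the pipeline gave no value."""
    if seconds is None:
        return "??"
    # Round first so that values such as 59.96 roll over to the next minute.
    total = round(float(seconds), 1)
    minutes, secs = divmod(total, 60)
    return f"{int(minutes):02d}:{secs:04.1f}"


def format_chunks(result: dict) -> str:
    """Turn a pipeline result with 'chunks' into one timestamped line per chunk."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{format_timestamp(start)} - {format_timestamp(end)}] {text}")
    return "\n".join(lines)
```

With the pipeline from the quick-start example, this could be called as `format_chunks(pipe(audio, return_timestamps=True))`. The end time of the last chunk can be missing when the audio is cut off mid-sentence, which is why `None` is handled explicitly.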

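✨ Main Features

Training resources: the model was trained on two A100 40GB GPUs, 128GB of RAM, and two 48-core 2.4GHz Xeon CPUs.
Training time: about 7 hours.
Training dataset: 118,000 audio samples from Mozilla Common Voice 17.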
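
The quick-start example pins the pipeline to the CPU, while the page states that the model supports GPU acceleration. The following is a minimal sketch of choosing the device at runtime, assuming PyTorch is the backend. `MODEL_ID`, `pick_device`, and `build_pipeline` are hypothetical helper names introduced here for illustration.

```python
MODEL_ID = "dvislobokov/whisper-large-v3-turbo-russian"


def pick_device() -> str:
    """Return 'cuda:0' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch  # imported lazily so the helper works without PyTorch installed
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


def build_pipeline(device=None):
    """Create the ASR pipeline on the given device (auto-detected if omitted).

    Calling this downloads the model weights on first use.
    """
    from transformers import pipeline  # heavy import, deferred until needed

    return pipeline(
        task="automatic-speech-recognition",
        model=MODEL_ID,
        device=device or pick_device(),
    )
```

The training hardware listed above does not need to be matched for inference. The fallback to `'cpu'` keeps the same script usable on machines without a GPU.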

📄 License

This project is released under the MIT License.

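📋 Model Information

Attribute	Details
Model type	Automatic speech recognition model
Training data	mozilla-foundation/common_voice_17_0
Base model	openai/whisper-large-v3-turbo
Evaluation metric	Accuracy
Library	transformers
Tags	Voice calls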
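
The table lists accuracy as the evaluation metric without defining it. For speech recognition, a widely used way to quantify accuracy is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. The sketch below computes WER with a standard edit-distance table. It assumes WER is a suitable stand-in for the page's "accuracy", and it is not the author's evaluation procedure. Lowercasing and whitespace splitting are simplifying assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

A WER of 0.0 means the transcript matches the reference word for word. Values can exceed 1.0 when the hypothesis contains many extra words. Real evaluations usually also normalize punctuation before comparing, which this sketch leaves out.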