whisper-large-v3-turbo-russian open-source model - accurate Russian automatic speech recognition

Whisper Large V3 Turbo Russian

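Developed by dvislobokov

A Russian automatic speech recognition (ASR) model based on OpenAI Whisper Large V3 Turbo, fine-tuned on the Russian portion of Mozilla Common Voice 17

Speech recognition

Transformers

License: MIT #russian-speech-recognition #high-accuracy-transcription #real-time-audio-processing

Downloads: 1,022

Release date: 12/17/2024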

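Model Overview

The model is fine-tuned specifically for Russian speech recognition. It converts Russian speech to text and suits scenarios such as transcribing call recordings.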

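Model Features

Russian-focused recognition

A speech recognition model fine-tuned specifically for Russian speech recognition tasks

Large-scale training data

Trained on 118,000 Russian samples from the Mozilla Common Voice 17 dataset

GPU support

Supports GPU acceleration; training used two A100 40GB GPUs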

Model Capabilities

Russian speech recognition

Real-time speech-to-text

Microphone and file input

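Use Cases

Speech transcription

Call recording transcription

Automatically converts Russian call recordings to text

High-accuracy transcription results

Voice note conversion

Converts Russian voice notes into editable text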
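
For call transcripts in particular, it often helps to know when each phrase was spoken. The sketch below assumes the result shape that the transformers ASR pipeline returns when called with return_timestamps=True: a dict with a "chunks" list whose items carry a (start, end) "timestamp" tuple and a "text" string. The helper names format_time and format_transcript are hypothetical, and the sample dict is hand-written for illustration, not real model output.

```python
from typing import Any, Dict, List, Optional


def format_time(seconds: Optional[float]) -> str:
    """Render seconds as MM:SS; a missing (None) time becomes a placeholder."""
    if seconds is None:
        return "--:--"
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_transcript(result: Dict[str, Any]) -> List[str]:
    """Turn the pipeline's timestamped chunks into one readable line per chunk."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{format_time(start)} - {format_time(end)}] {text}")
    return lines


# Hand-written sample in the assumed pipeline output shape (not real model output).
sample = {
    "text": " Здравствуйте. Чем могу помочь?",
    "chunks": [
        {"timestamp": (0.0, 2.4), "text": " Здравствуйте."},
        {"timestamp": (2.4, 65.0), "text": " Чем могу помочь?"},
    ],
}

for line in format_transcript(sample):
    print(line)
```

In practice, the dict passed to format_transcript would be the full pipeline result rather than only its 'text' field.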

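🚀 Russian Speech Recognition Model

This project is an automatic speech recognition model based on the openai/whisper-large-v3-turbo base model. It was trained on a Russian dataset and transcribes Russian speech to text.

🚀 Quick Start

The model is built with the transformers library. The following example wraps it in a small Gradio demo: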

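from transformers import pipeline
import gradio as gr
import time

# Load the fine-tuned checkpoint. device='cpu' runs anywhere but is slow for long audio.
pipe = pipeline(
    model="dvislobokov/whisper-large-v3-turbo-russian",
    tokenizer="dvislobokov/whisper-large-v3-turbo-russian",
    task='automatic-speech-recognition',
    device='cpu'
)

def transcribe(audio):
    # Gradio passes None when the user submits without recording or uploading anything.
    if audio is None:
        return ''
    start = time.time()
    # Whisper needs timestamps enabled to transcribe audio longer than 30 seconds.
    text = pipe(audio, return_timestamps=True)['text']
    print(f'Transcription took {time.time() - start:.2f} s')
    return text

# Accept microphone recordings or uploaded files; transcribe() receives a file path.
iface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=['microphone', 'upload'], type='filepath'),
    outputs='text'
)

# share=True also creates a temporary public link to the demo.
iface.launch(share=True)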
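
The page mentions GPU acceleration, but the example above pins the pipeline to the CPU. As a minimal sketch, assuming PyTorch is the installed backend, the device can be chosen at runtime and a file transcribed without the Gradio UI. The helper names pick_device, build_pipeline, and transcribe_file are hypothetical and are not part of the model card.

```python
MODEL_ID = "dvislobokov/whisper-large-v3-turbo-russian"


def pick_device() -> str:
    """Return 'cuda:0' when a CUDA GPU is visible to PyTorch, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


def build_pipeline(device=None):
    """Create the ASR pipeline on the given device (auto-detected if omitted)."""
    # Imported lazily so the helpers can be inspected without loading transformers.
    from transformers import pipeline

    return pipeline(
        task="automatic-speech-recognition",
        model=MODEL_ID,
        device=device or pick_device(),
    )


def transcribe_file(asr, path: str) -> str:
    """Transcribe one audio file and return the text without surrounding spaces."""
    # return_timestamps=True keeps long recordings (over 30 seconds) working.
    result = asr(path, return_timestamps=True)
    return result["text"].strip()
```

A call such as transcribe_file(build_pipeline(), "call.wav") would download the model on first use and return the transcript; "call.wav" is a placeholder file name.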

✨ Key Features

Training resources: two A100 40GB GPUs, 128GB of RAM, and two 48-core 2.4GHz Xeon CPUs.
Training time: about 7 hours.
Training dataset: 118,000 audio samples from Mozilla Common Voice 17.

📄 License

This project is released under the MIT License.

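📋 Model Information

Property	Details
Model type	Automatic speech recognition model
Training data	mozilla-foundation/common_voice_17_0
Base model	openai/whisper-large-v3-turbo
Evaluation metric	Accuracy
Library	transformers
Tags	Voice calls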
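
The table lists accuracy as the evaluation metric but does not say how it is computed, and no scores are reported here. One common way to quantify ASR quality is word error rate (WER), and 1 - WER is sometimes quoted as word accuracy. The sketch below is an assumption about how one might check transcripts against references, not the author's evaluation procedure; edit_distance and word_error_rate are hypothetical helper names.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)
```

This version only lowercases and splits on whitespace. Punctuation and number formatting would need to be normalized first for the figure to be comparable across systems.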