whisper-large-v3-russian-ties-podlodka-v1.2 open-source model: free Russian speech recognition optimized for phone recordings

Whisper Large V3 Russian Ties Podlodka V1.2

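Developed by Apel-sin

A Russian speech recognition model built with the TIES merging method. It combines two Russian variants of Whisper-large-v3 and is optimized for phone recordings.

Speech recognition

Transformers

Other #Russian phone recording recognition #TIES merged model #Low-resource optimization

Downloads: 2,408

Released: 4/2/2025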

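Model overview

The model merges two Russian Whisper models with the TIES method. It aims to improve Russian speech recognition accuracy, with particular attention to phone-call audio.

Model highlights

TIES merging

Uses the TIES model-merging method with a density of 0.9 and different per-model weights for the encoder (0.8/0.2) and the decoder (0.2/0.8).

Phone-call optimization

Tuned specifically for phone recordings; pairing it with an audio preprocessing step is recommended.

Multi-dataset training

Draws on several Russian speech datasets, including Common Voice 17.0, Taiga Speech, and Podlodka.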

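Model capabilities

Russian speech-to-text

Chunked processing of long audio

Timestamp generation

Support for low-resource devices

Use cases

Speech transcription

Phone recording transcription

Converts Russian phone calls into written records

Recognition accuracy tuned for phone audio

Meeting minutes

Automatically produces written records of Russian-language meeting audio

Supports chunked processing of long audio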

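🚀 Speech recognition model

This project is a speech recognition model for automatic recognition of Russian speech. It was produced by merging several base models with a specific merge method and set of parameters. The model can be used together with a simple OpenAI-compatible API server, and example code is provided to help developers get started with speech recognition tasks.

🚀 Quick start

This model was merged with the TIES method. The merge used the following configuration: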

method: ties
parameters:
  ties_density: 0.9
  encoder_weights:
    - 0.8
    - 0.2
  decoder_weights:
    - 0.2
    - 0.8
models:
  model_a: "/mnt/cloud/llm/whisper/whisper-large-v3-russian"
  model_b: "/mnt/cloud/llm/whisper/whisper-large-v3-ru-podlodka"
output_dir: "/mnt/cloud/llm/whisper/whisper-large-v3-russian-ties-podlodka"
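
To make the configuration above more concrete, here is a minimal pure-Python sketch of TIES merging (trim, elect a sign, merge the agreeing deltas) on flat lists of numbers. It is not the author's merge script, which the source does not show. The shared `base` checkpoint, the weighted sign election, the normalization by the agreeing weights, and the rule that applies `encoder_weights` to parameter names starting with `model.encoder.` are all assumptions made for illustration.

```python
# Illustrative TIES merge on flat parameter lists. Assumption: both fine-tuned
# models share one base checkpoint, and deltas are taken relative to it.

ENCODER_WEIGHTS = [0.8, 0.2]  # from the config: model_a, model_b
DECODER_WEIGHTS = [0.2, 0.8]
TIES_DENSITY = 0.9


def weights_for(param_name):
    """Pick per-model weights by component (hypothetical naming rule)."""
    if param_name.startswith("model.encoder."):
        return ENCODER_WEIGHTS
    return DECODER_WEIGHTS


def trim(delta, density):
    """Keep the `density` fraction of largest-magnitude entries, zero the rest."""
    k = max(1, round(len(delta) * density))
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(order[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def ties_merge(base, finetuned, weights, density):
    """Merge several fine-tuned parameter lists into one."""
    # Step 1: task vectors (fine-tuned minus base), trimmed to the given density.
    deltas = [trim([f - b for f, b in zip(model, base)], density) for model in finetuned]
    merged = []
    for j, b in enumerate(base):
        weighted = [w * d[j] for w, d in zip(weights, deltas)]
        # Step 2: elect the sign carried by the larger weighted mass.
        sign = 1.0 if sum(weighted) >= 0 else -1.0
        # Step 3: average only the deltas that agree with the elected sign.
        agree = [(w, wd) for w, wd in zip(weights, weighted) if wd * sign > 0]
        if agree:
            b += sum(wd for _, wd in agree) / sum(w for w, _ in agree)
        merged.append(b)
    return merged
```

For instance, with `base = [0.0]`, fine-tuned values `[1.0]` and `[-1.0]`, and weights `[0.8, 0.2]`, the elected sign is positive, so only the first model's delta survives and the merged value is 1.0. This is the mechanism by which TIES avoids averaging conflicting updates toward zero.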

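✨ Key features

Merged from multiple base models: built by merging two base models, antony66/whisper-large-v3-russian and bond005/whisper-large-v3-ru-podlodka.
Russian speech recognition: suited to automatic recognition of Russian speech.
API server integration: works with a simple OpenAI-compatible API server, such as whisper-api-server.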
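
As a rough illustration of the API integration, the sketch below builds a transcription request in the shape of the OpenAI audio API (`POST /v1/audio/transcriptions` with multipart fields `file`, `model`, and `language`). The source does not document the endpoints of whisper-api-server, so the path, the `http://localhost:8000` address, the `whisper-1` model name, and the `{"text": ...}` response shape are assumptions to check against the server actually deployed.

```python
import json
import urllib.request
import uuid


def build_transcription_request(base_url, audio_bytes,
                                filename="record-normalized.wav",
                                model="whisper-1", language="ru"):
    """Build (but do not send) a multipart/form-data transcription request."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields: model name and ISO-639-1 language code.
    for name, value in (("model", model), ("language", language)):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # The audio file itself.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: audio/wav\r\n\r\n'.encode()
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/audio/transcriptions",  # assumed path
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(base_url, path):
    """Send one file to the server and return the recognized text."""
    with open(path, "rb") as f:
        request = build_transcription_request(base_url, f.read())
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]


# Hypothetical usage against a locally running server:
# print(transcribe("http://localhost:8000", "record-normalized.wav"))
```

Only the standard library is used here so the request format stays visible; an HTTP client library or an OpenAI SDK pointed at the same base URL would be a reasonable alternative.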

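📦 Installation

The source does not give specific installation steps. See the official documentation of the relevant libraries (such as transformers) for installation instructions.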

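💻 Usage examples

Basic usage

When processing phone-call audio, it is strongly recommended to preprocess the recording and adjust its volume before running automatic speech recognition (ASR). For example, with the following command: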

sox record.wav -r 8000 record-normalized.wav norm -0.5 compand 0.3,1 -90,-90,-70,-50,-40,-15,0,0 -7 0 0.15
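
When many recordings need the same treatment, the command can be wrapped in a small helper. The following is a sketch that assumes the `sox` executable is installed and on `PATH`; the function names `build_sox_command` and `normalize_recording` are hypothetical, and the effect arguments are copied unchanged from the command above.

```python
import shutil
import subprocess


def build_sox_command(src, dst, rate=8000):
    """Return the argument list for the normalization command shown above."""
    return [
        "sox", src, "-r", str(rate), dst,
        "norm", "-0.5",
        "compand", "0.3,1", "-90,-90,-70,-50,-40,-15,0,0", "-7", "0", "0.15",
    ]


def normalize_recording(src, dst):
    """Run SoX on one file; raises if SoX is missing or exits with an error."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox executable not found on PATH")
    subprocess.run(build_sox_command(src, dst), check=True)
    return dst


# Hypothetical usage:
# normalize_recording("record.wav", "record-normalized.wav")
```

Passing the arguments as a list avoids shell quoting issues with the comma-separated `compand` parameters.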

Then run speech recognition with the following Python code:

import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline

torch_dtype = torch.bfloat16 # set your preferred type here 

device = 'cpu'
if torch.cuda.is_available():
    device = 'cuda'
elif torch.backends.mps.is_available():
    device = 'mps'
    setattr(torch.distributed, "is_initialized", lambda : False) # monkey patching
device = torch.device(device)

# The source example loads one of the two base models; to run the merged model,
# set model_id to the merged model's own repository ID instead.
model_id = "antony66/whisper-large-v3-russian"

whisper = WhisperForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True,
    # add attn_implementation="flash_attention_2" if your GPU supports it
)

processor = WhisperProcessor.from_pretrained(model_id)

asr_pipeline = pipeline(
    "automatic-speech-recognition",
    model=whisper,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=256,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

# read your wav file into the variable wav as raw bytes
# (the pipeline accepts a filename, bytes, or a numpy array; bytes are decoded with ffmpeg)
with open('record-normalized.wav', 'rb') as f:
    wav = f.read()

# get the transcription
asr = asr_pipeline(wav, generate_kwargs={"language": "russian", "max_new_tokens": 256}, return_timestamps=False)

print(asr['text'])
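
The capabilities listed earlier include timestamp generation. When the pipeline is called with `return_timestamps=True`, its result also carries a `chunks` list whose items hold a `(start, end)` tuple in seconds and the chunk text. The sketch below formats such a result into readable lines; the helper names are hypothetical, and the `??:??.???` placeholder for a missing end time is a choice made for this example.

```python
def format_timestamp(seconds):
    """Render seconds as MM:SS.mmm; a missing time (None) becomes a placeholder."""
    if seconds is None:
        return "??:??.???"
    total_ms = round(seconds * 1000)
    minutes, ms = divmod(total_ms, 60_000)
    return f"{minutes:02d}:{ms // 1000:02d}.{ms % 1000:03d}"


def format_chunks(result):
    """Turn a pipeline result with a 'chunks' list into one line per chunk."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{format_timestamp(start)} - {format_timestamp(end)}] {text}")
    return lines


# Hypothetical usage with the pipeline defined above:
# asr = asr_pipeline(wav, generate_kwargs={"language": "russian"}, return_timestamps=True)
# print("\n".join(format_chunks(asr)))
```

The end time of the final chunk can be absent when the audio is cut off mid-segment, which is why `None` is handled explicitly.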

📚 Detailed documentation

Model information

Property	Details
Base models	antony66/whisper-large-v3-russian, bond005/whisper-large-v3-ru-podlodka
Language	Russian
Library	transformers
Tags	asr, whisper, russian, mergekit, merge
Datasets	mozilla-foundation/common_voice_17_0, bond005/taiga_speech_v2, bond005/podlodka_speech, bond005/rulibrispeech
Evaluation metric	wer
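
The evaluation metric, word error rate (WER), is the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference, divided by the number of reference words. The source reports no WER figures, so the sketch below only shows how the metric could be computed for one utterance. It assumes whitespace tokenization and no text normalization; real evaluations usually lowercase and strip punctuation first.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of five reference words:
print(word_error_rate("добрый день меня зовут анна", "добрый день зовут анна"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.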