# 🚀 Merged Russian Speech Recognition Model

This project is a Russian speech recognition model merged with the TIES method. It combines several base models and datasets and can be used for automatic speech recognition (ASR).
## 🚀 Quick Start

### Model Information
| Property | Details |
|----------|---------|
| Base models | antony66/whisper-large-v3-russian, bond005/whisper-large-v3-ru-podlodka |
| Language | ru (Russian) |
| Library | transformers |
| Tags | asr, whisper, russian, mergekit, merge |
| Datasets | mozilla-foundation/common_voice_17_0, bond005/taiga_speech_v2, bond005/podlodka_speech, bond005/rulibrispeech |
| Evaluation metric | wer (word error rate) |
### New Version

A new version has been released: Apel-sin/whisper-large-v3-russian-ties-podlodka-v1.2
## 📚 Documentation

### Merge Details

This model was merged with the TIES method, using the following configuration:
```yaml
method: ties
parameters:
  ties_density: 0.85
  encoder_weights:
    - 0.65
    - 0.35
  decoder_weights:
    - 0.6
    - 0.4
models:
  model_a: "/mnt/cloud/llm/whisper/whisper-large-v3-russian"
  model_b: "/mnt/cloud/llm/whisper/whisper-large-v3-ru-podlodka"
output_dir: "/mnt/cloud/llm/whisper/whisper-large-v3-russian-ties-podlodka"
```
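For intuition, the sketch below shows the core idea of a TIES merge: trim each model's task vector (its delta from a common base checkpoint) to the top-`density` fraction of entries by magnitude, elect a per-parameter sign, and combine only the deltas that agree with that sign, weighted per model. This is a simplified, hypothetical illustration, not the tool that produced this model; the `ties_merge` helper and the assumption of a shared base checkpoint (e.g. openai/whisper-large-v3) are made up for the example.

```python
import torch

def ties_merge(base, deltas, weights, density=0.85):
    """Illustrative TIES merge. `base` is the base model state dict,
    `deltas` is a list of task-vector state dicts (finetuned - base),
    `weights` is one scalar weight per model."""
    merged = {}
    for name, base_param in base.items():
        trimmed = []
        for delta, w in zip(deltas, weights):
            tv = delta[name].float()
            # Trim: keep only the top-`density` fraction of entries by magnitude.
            k = max(1, int(density * tv.numel()))
            threshold = tv.abs().flatten().kthvalue(tv.numel() - k + 1).values
            tv = torch.where(tv.abs() >= threshold, tv, torch.zeros_like(tv))
            trimmed.append(w * tv)
        stacked = torch.stack(trimmed)
        # Elect a sign per parameter from the summed weighted task vectors.
        sign = torch.sign(stacked.sum(dim=0))
        # Keep only entries that agree with the elected sign, then combine them.
        agree = torch.where(torch.sign(stacked) == sign, stacked, torch.zeros_like(stacked))
        merged[name] = base_param.float() + agree.sum(dim=0)
    return merged
```

In the configuration above the per-model weights differ between the encoder (0.65 / 0.35) and the decoder (0.6 / 0.4), so such a function would be applied separately to the encoder and decoder parameter groups.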
### Simple API Server

The model can be served with a simple OpenAI-compatible API server: https://github.com/kreolsky/whisper-api-server/
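Since the server is described as OpenAI-compatible, a client would typically upload audio to a `/v1/audio/transcriptions` endpoint. The host, port, and model name below are placeholders rather than values taken from that repository; check its README for the exact settings.

```python
import requests

# Hypothetical endpoint; point this at wherever whisper-api-server is running.
url = "http://localhost:9000/v1/audio/transcriptions"

with open("record-normalized.wav", "rb") as f:
    response = requests.post(
        url,
        files={"file": ("record-normalized.wav", f, "audio/wav")},
        data={"model": "whisper-large-v3-russian-ties-podlodka", "language": "ru"},
    )

response.raise_for_status()
print(response.json()["text"])
```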
## 💻 Usage Examples

### Basic Usage

For phone-call audio it is strongly recommended to preprocess the recording and normalize the volume before running ASR, for example with:

```bash
sox record.wav -r 8000 record-normalized.wav norm -0.5 compand 0.3,1 -90,-90,-70,-50,-40,-15,0,0 -7 0 0.15
```
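If you prefer to keep the whole workflow in Python, the same preprocessing can be run through `subprocess`; this is just a thin wrapper around the sox command above (the helper name `normalize_call` is made up for illustration):

```python
import subprocess

def normalize_call(src: str, dst: str) -> None:
    """Resample to 8 kHz, normalize, and compand a phone-call recording with sox."""
    subprocess.run(
        [
            "sox", src, "-r", "8000", dst,
            "norm", "-0.5",
            "compand", "0.3,1", "-90,-90,-70,-50,-40,-15,0,0", "-7", "0", "0.15",
        ],
        check=True,
    )

normalize_call("record.wav", "record-normalized.wav")
```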
The ASR code then looks like this:
```python
import torch
from io import BytesIO
from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline

torch_dtype = torch.bfloat16

# Pick the best available device.
device = 'cpu'
if torch.cuda.is_available():
    device = 'cuda'
elif torch.backends.mps.is_available():
    device = 'mps'
    # Work around torch.distributed checks when running on MPS.
    setattr(torch.distributed, "is_initialized", lambda: False)
device = torch.device(device)

whisper = WhisperForConditionalGeneration.from_pretrained(
    "antony66/whisper-large-v3-russian", torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True,
)
processor = WhisperProcessor.from_pretrained("antony66/whisper-large-v3-russian")

asr_pipeline = pipeline(
    "automatic-speech-recognition",
    model=whisper,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=256,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

# Read the preprocessed recording into an in-memory buffer.
wav = BytesIO()
with open('record-normalized.wav', 'rb') as f:
    wav.write(f.read())
wav.seek(0)

asr = asr_pipeline(wav, generate_kwargs={"language": "russian", "max_new_tokens": 256}, return_timestamps=False)

print(asr['text'])
```
## 🔧 Development Status

This model is still under development. The goal is to fine-tune it as well as possible for phone-call speech recognition. If you would like to contribute and know of, or own, any good-quality datasets, please let us know. Your help is greatly appreciated.