Whisper Large V3 Russian Ties Podlodka Model
This is a merged Whisper Large V3 model for Russian automatic speech recognition (ASR). It is intended to improve recognition in specific use cases such as phone calls.
Quick Start
The new version of the model is available at Apel-sin/whisper-large-v3-russian-ties-podlodka-v1.2.
Features
- Multi-base-model merge: merged from antony66/whisper-large-v3-russian and bond005/whisper-large-v3-ru-podlodka.
- TIES merge method: the two models are combined with the TIES merge method.
- OpenAI-compatible API support: the model can be used with a simple OpenAI-compatible API server.
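Installation
The examples below use the torch and transformers Python packages, the sox command-line tool for audio pre-processing, and ffmpeg, which the transformers ASR pipeline uses to decode audio.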
Usage Examples
Basic Usage
For phone call recordings, it is highly recommended to pre-process the audio and adjust its volume before running ASR, for example with sox:
sox record.wav -r 8000 record-normalized.wav norm -0.5 compand 0.3,1 -90,-90,-70,-50,-40,-15,0,0 -7 0 0.15
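In the sox command above, `-r 8000` resamples the output to 8 kHz, `compand` applies dynamic range compression, and `norm -0.5` scales the recording so that its loudest sample sits at -0.5 dBFS. A minimal Python sketch of the peak-normalization step alone, assuming floating-point samples in the range [-1, 1] that have already been decoded from the WAV file (the function name `peak_normalize` is hypothetical); it only illustrates the idea and is not a replacement for sox:

```python
from typing import List, Sequence


def peak_normalize(samples: Sequence[float], target_dbfs: float = -0.5) -> List[float]:
    """Scale float samples in [-1, 1] so the peak magnitude equals target_dbfs."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    target = 10 ** (target_dbfs / 20)  # convert dBFS to a linear amplitude (about 0.944 for -0.5)
    gain = target / peak
    return [s * gain for s in samples]
```

A single gain is applied to every sample, so relative levels inside the recording are unchanged; only the overall loudness moves. Evening out quiet and loud speakers within one call is the job of the `compand` stage.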
The ASR code can then look like this:
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline
torch_dtype = torch.bfloat16
# Pick the best available device: CUDA GPU, Apple Silicon (MPS), or CPU
device = 'cpu'
if torch.cuda.is_available():
    device = 'cuda'
elif torch.backends.mps.is_available():
    device = 'mps'
# Workaround kept from the original example: report torch.distributed as not initialized
setattr(torch.distributed, "is_initialized", lambda: False)
device = torch.device(device)
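# Model ID as given in the original example (one of the two merge sources);
# substitute this merged model's repository ID to run the merged weights
whisper = WhisperForConditionalGeneration.from_pretrained(
    "antony66/whisper-large-v3-russian", torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True,
)
processor = WhisperProcessor.from_pretrained("antony66/whisper-large-v3-russian")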
asr_pipeline = pipeline(
"automatic-speech-recognition",
model=whisper,
tokenizer=processor.tokenizer,
feature_extractor=processor.feature_extractor,
max_new_tokens=256,
chunk_length_s=30,
batch_size=16,
return_timestamps=True,
torch_dtype=torch_dtype,
device=device,
)
from io import BytesIO
# Read the normalized recording into memory
wav = BytesIO()
with open('record-normalized.wav', 'rb') as f:
    wav.write(f.read())
# Pass the raw bytes; the pipeline decodes them with ffmpeg
asr = asr_pipeline(wav.getvalue(), generate_kwargs={"language": "russian", "max_new_tokens": 256}, return_timestamps=False)
print(asr['text'])
Documentation
Model Details
This model was merged using the TIES merge method with the following configuration:
method: ties
parameters:
  ties_density: 0.85
  encoder_weights:
    - 0.65
    - 0.35
  decoder_weights:
    - 0.6
    - 0.4
models:
  model_a: "/mnt/cloud/llm/whisper/whisper-large-v3-russian"
  model_b: "/mnt/cloud/llm/whisper/whisper-large-v3-ru-podlodka"
output_dir: "/mnt/cloud/llm/whisper/whisper-large-v3-russian-ties-podlodka"
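The TIES procedure (trim, elect sign, disjoint merge) can be sketched in plain Python. This is a minimal illustration, not the script that produced this model. It assumes task vectors are taken relative to a shared base checkpoint, which the configuration above does not name. It also assumes `ties_density` is the fraction of largest-magnitude task-vector entries that is kept, as with mergekit's `density` parameter. The sketch works on flat lists of floats rather than per-tensor weights, and the helper names `trim` and `ties_merge` are hypothetical.

```python
from typing import List, Sequence


def trim(delta: Sequence[float], density: float) -> List[float]:
    """Keep the `density` fraction of entries with the largest magnitude; zero the rest."""
    k = max(1, round(len(delta) * density))
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(order[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def ties_merge(base: Sequence[float], models: Sequence[Sequence[float]],
               weights: Sequence[float], density: float) -> List[float]:
    """Merge fine-tuned parameter vectors into `base` with TIES."""
    # 1. Task vectors: what each fine-tuned model changed relative to the base.
    deltas = [[m - b for m, b in zip(model, base)] for model in models]
    # 2. Trim: keep only the largest-magnitude changes of each task vector.
    trimmed = [trim(d, density) for d in deltas]
    merged = []
    for i, b in enumerate(base):
        weighted = [w * t[i] for w, t in zip(weights, trimmed)]
        # 3. Elect sign: the sign of the weighted sum decides the direction.
        total = sum(weighted)
        sign = (total > 0) - (total < 0)
        # 4. Disjoint merge: weighted mean of the entries that agree with the elected sign.
        agree = [(w, t[i]) for w, t in zip(weights, trimmed) if t[i] * sign > 0]
        den = sum(w for w, _ in agree)
        delta = sum(w * v for w, v in agree) / den if den else 0.0
        merged.append(b + delta)  # no extra scaling factor (lambda = 1)
    return merged
```

Under the configuration above, a per-tensor version of this sketch would merge encoder parameters with `weights=[0.65, 0.35]` and decoder parameters with `weights=[0.6, 0.4]`, both with `density=0.85`. Electing the sign before averaging keeps opposing updates from the two models from cancelling each other out.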
Simple API server
The model can be served through a simple OpenAI-compatible API server: https://github.com/kreolsky/whisper-api-server/
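OpenAI's transcription API expects a multipart/form-data POST to `/v1/audio/transcriptions` with `file` and `model` fields, plus optional fields such as `language`. Below is a minimal standard-library client sketch. It assumes the server above mirrors that endpoint and returns JSON with a `text` field. The base URL, the model name and the helper names are hypothetical placeholders, not values documented by the server.

```python
import json
import os
import urllib.request
import uuid

# Hypothetical address: point this at wherever your whisper-api-server instance listens.
BASE_URL = "http://localhost:8000"


def build_transcription_request(audio: bytes, filename: str, model: str,
                                language: str = "ru",
                                base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a multipart/form-data POST for an OpenAI-style /v1/audio/transcriptions endpoint."""
    boundary = uuid.uuid4().hex
    body = b""
    # Plain text form fields, named as in OpenAI's transcription API.
    for name, value in (("model", model), ("language", language)):
        body += (f"--{boundary}\r\n"
                 f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                 f"{value}\r\n").encode()
    # The audio file itself.
    body += (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
             "Content-Type: audio/wav\r\n\r\n").encode() + audio + b"\r\n"
    body += f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path: str, model: str) -> str:
    """Send a WAV file and return the "text" field of the JSON response."""
    with open(path, "rb") as f:
        req = build_transcription_request(f.read(), os.path.basename(path), model)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]
```

For example, `transcribe("record-normalized.wav", model="whisper-large-v3-russian-ties-podlodka")` would send the pre-processed recording. The model name here is a placeholder and must match whatever name the server is configured to accept. If the `openai` Python package is installed, pointing its client's `base_url` at the server is an alternative to this hand-built request.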
Work in progress
This model is a work in progress. The goal is to fine-tune it as far as possible for recognizing speech in phone calls. If you would like to contribute or know of a good dataset, please let me know; your help will be much appreciated.
Technical Details
| Property | Details |
|---|---|
| Base Model | antony66/whisper-large-v3-russian, bond005/whisper-large-v3-ru-podlodka |
| Language | ru |
| Library Name | transformers |
| Tags | asr, whisper, russian, mergekit, merge |
| Datasets | mozilla-foundation/common_voice_17_0, bond005/taiga_speech_v2, bond005/podlodka_speech, bond005/rulibrispeech |
| Metrics | wer |
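The listed metric, word error rate (WER), is the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words: WER = (S + D + I) / N. Here S, D and I are substitutions, deletions and insertions, and N is the number of reference words. A minimal sketch follows. The function name `wer` is hypothetical, and it applies no text normalization such as lowercasing or punctuation stripping, which evaluation scripts usually add.

```python
from typing import List


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by reference length."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the first i reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution, or match at no cost
        prev = cur
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.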