🚀 ru_whisper_small - Val123val
This model is a fine-tuned version of openai/whisper-small on the Sberdevices_golos_10h_crowd dataset. It can serve as an automatic speech recognition (ASR) solution, particularly for Russian speech.
✨ Key Features
- Whisper is a Transformer-based encoder-decoder model, also known as a sequence-to-sequence model, trained on 680k hours of labelled speech data, of which only 5k hours are Russian.
- ru_whisper_small is a version fine-tuned on the Sberdevices_golos_10h_crowd dataset. It may be a useful ASR solution for developers, especially for Russian speech recognition, and may show additional capabilities if further fine-tuned for a specific business task.
📦 Installation
The documentation does not provide specific installation steps; refer to the official documentation of the relevant libraries, such as transformers and datasets.
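As a sketch, the usage examples below typically need the following packages; the version pin is an assumption (the card itself only lists the framework versions used for training):

```shell
# Typical dependencies for the examples below; the version pin is illustrative
pip install "transformers>=4.36" datasets torch
# Optional: accelerate is needed for low_cpu_mem_usage model loading
pip install accelerate
```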
💻 Usage Examples
Basic Usage
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset

# Load the processor and the fine-tuned model
processor = WhisperProcessor.from_pretrained("Val123val/ru_whisper_small")
model = WhisperForConditionalGeneration.from_pretrained("Val123val/ru_whisper_small")
model.config.forced_decoder_ids = None

# Load a validation sample from the Golos dataset
ds = load_dataset("bond005/sberdevices_golos_10h_crowd", split="validation", token=True)
sample = ds[0]["audio"]

# Convert the raw waveform to log-Mel input features
input_features = processor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt").input_features

# Generate token ids and decode them to text
predicted_ids = model.generate(input_features)
# Decode, keeping special tokens
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
# Decode, stripping special tokens
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
```
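Whisper's feature extractor expects 16 kHz audio, so recordings at other rates must be resampled before calling the processor. In practice you would use librosa or torchaudio for this; purely as an illustration of the idea, a minimal linear-interpolation sketch in plain Python:

```python
def resample(samples, src_rate, dst_rate=16000):
    """Linearly interpolate a waveform from src_rate to dst_rate (Hz)."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        # Position of output sample i on the source time axis
        pos = i * src_rate / dst_rate
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# One second of 8 kHz audio becomes one second of 16 kHz audio
print(len(resample([0.0] * 8000, 8000)))  # -> 16000
```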
Long-Form Transcription
```python
import torch
from transformers import pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Chunked inference lets the pipeline transcribe audio longer than 30 s
pipe = pipeline(
    "automatic-speech-recognition",
    model="Val123val/ru_whisper_small",
    chunk_length_s=30,
    device=device,
)

ds = load_dataset("bond005/sberdevices_golos_10h_crowd", split="validation", token=True)
sample = ds[0]["audio"]

# Plain transcription
prediction = pipe(sample.copy(), batch_size=8)["text"]
# Transcription with per-chunk timestamps
prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
```
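`chunk_length_s=30` makes the pipeline split long inputs into 30-second windows that are transcribed batch-wise and stitched back together. The splitting itself amounts to slicing the waveform by sample count; a simplified sketch (the real pipeline also uses overlapping strides at chunk boundaries):

```python
def split_into_chunks(samples, sampling_rate, chunk_length_s=30):
    """Split a waveform into consecutive chunks of at most chunk_length_s seconds."""
    chunk_size = chunk_length_s * sampling_rate  # samples per chunk
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]

# 75 seconds of placeholder 16 kHz audio -> chunks of 30 s, 30 s, 15 s
audio = [0.0] * (75 * 16000)
chunks = split_into_chunks(audio, 16000)
print([len(c) // 16000 for c in chunks])  # -> [30, 30, 15]
```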
Faster Inference with Speculative Decoding
```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

dataset = load_dataset("bond005/sberdevices_golos_10h_crowd", split="validation", token=True)

# Main (fine-tuned) model
model_id = "Val123val/ru_whisper_small"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
    attn_implementation="sdpa",
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

# Smaller assistant (draft) model used for speculative decoding
assistant_model_id = "openai/whisper-tiny"
assistant_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    assistant_model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
    attn_implementation="sdpa",
)
assistant_model.to(device)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=15,
    batch_size=4,
    generate_kwargs={"assistant_model": assistant_model},
    torch_dtype=torch_dtype,
    device=device,
)

sample = dataset[0]["audio"]
result = pipe(sample)
print(result["text"])
```
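Speculative decoding speeds things up by letting the cheap assistant model draft several tokens ahead, which the main model then verifies, keeping the longest agreeing prefix plus one of its own tokens. A toy greedy sketch with stand-in next-token functions (not the transformers implementation):

```python
def speculative_decode(main_next, draft_next, prompt, max_new_tokens, draft_len=4):
    """Greedy speculative decoding: the draft model proposes draft_len tokens,
    the main model keeps the longest prefix it agrees with, plus one token."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) Draft phase: assistant proposes draft_len tokens autoregressively
        draft = []
        for _ in range(draft_len):
            draft.append(draft_next(tokens + draft))
        # 2) Verify phase: main model checks each drafted position in order
        accepted = 0
        for i in range(draft_len):
            if main_next(tokens + draft[:i]) == draft[i]:
                accepted += 1
            else:
                break
        tokens += draft[:accepted]
        if accepted < draft_len:
            # Main model disagreed: take its own token (guarantees progress)
            tokens.append(main_next(tokens))
    return tokens[len(prompt):len(prompt) + max_new_tokens]

# Stand-in "models": main counts up by 1; draft agrees except after multiples of 3
main_next = lambda ctx: ctx[-1] + 1
draft_next = lambda ctx: ctx[-1] + (2 if ctx[-1] % 3 == 0 else 1)

print(speculative_decode(main_next, draft_next, [0], max_new_tokens=6))  # -> [1, 2, 3, 4, 5, 6]
```

The output matches what greedy decoding with the main model alone would produce; the assistant only changes how many main-model calls are needed, not the result.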
📚 Documentation
Training Hyperparameters
The following hyperparameters were used during training:
| Attribute | Details |
| --- | --- |
| Learning rate | 0.0001 |
| Train batch size | 32 |
| Eval batch size | 16 |
| Seed | 42 |
| Optimizer | Adam(betas=(0.9, 0.999), epsilon=1e-08) |
| LR scheduler type | linear |
| LR scheduler warmup steps | 500 |
| Training steps | 5000 |
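These values map directly onto transformers `Seq2SeqTrainingArguments`. A hedged sketch of how a comparable run might be configured; `output_dir` and anything not in the table above are illustrative assumptions, not taken from the model card:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./ru_whisper_small",  # assumption, not from the card
    learning_rate=1e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=5000,
)
```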
Framework Versions
- Transformers 4.36.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.0
- Tokenizers 0.15.0
📄 License
This project is released under the Apache-2.0 license.