whisper-small-japanese: Open-Source Japanese Speech Recognition Model - Free Japanese Speech-to-Text

Whisper Small Japanese

Developed by Ivydata

A Japanese speech recognition model fine-tuned from openai/whisper-small for Japanese speech-to-text transcription.

Speech recognition

Transformers

Japanese · License: Apache-2.0 · #JapaneseSpeechRecognition #LowCER #MultiDatasetTraining

Downloads: 356

Released: 5/19/2023

Model overview

openai/whisper-small fine-tuned for Japanese on the Common Voice, JVS, and JSUT datasets, intended for Japanese speech recognition.

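Model features

Japanese optimization

Fine-tuned specifically on Japanese speech, with the aim of recognizing Japanese more accurately than the general-purpose base model

Multi-dataset training

Trained on a combination of three Japanese datasets: Common Voice, JVS, and JSUT

16 kHz sampling rate

Expects speech input sampled at 16 kHz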
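The "low CER" tag and the claim of better accuracy than the general-purpose model refer to character error rate (CER), the usual metric for Japanese ASR because Japanese text is not space-delimited. The card does not report a CER figure. As a minimal sketch (not the authors' evaluation script), assuming plain character comparison with no text normalization, CER = (S + D + I) / N: the character-level Levenshtein distance between hypothesis and reference divided by the reference length N. The function name and the hypothesis string in the demo are illustrative only, not model output.

```python
def char_error_rate(reference: str, hypothesis: str) -> float:
    """CER = (substitutions + deletions + insertions) / len(reference)."""
    ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must be non-empty")
    # prev[j] = edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference char with no match
                curr[j - 1] + 1,     # insertion: extra hypothesis char
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "木村さんに電話を貸してもらいました。"
    hyp = "木村さんに電話を借してもらいました。"  # illustrative: one substituted character
    print(f"CER = {char_error_rate(ref, hyp):.3f}")  # CER = 0.056 (1 error / 18 chars)
```

Real evaluations usually normalize punctuation and width variants before scoring, which can change the number noticeably.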

Model capabilities

Japanese speech recognition

Speech-to-text

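Use cases

Speech transcription

Japanese meeting minutes

Convert recordings of Japanese meetings into written transcripts

Japanese subtitle generation

Automatically generate subtitles for Japanese video content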
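Meeting recordings and video soundtracks are usually far longer than the 30-second window Whisper processes per forward pass (its feature extractor pads or truncates each input to 30 s), so transcribing one long file in a single call would cut it off. A minimal sketch, assuming the audio is already a 16 kHz mono sample sequence: split it into overlapping 30-second chunks that can be transcribed one by one. The 5-second overlap and the function name are illustrative choices, not settings from this model card; merging duplicated text at chunk boundaries is left out, and the transformers `pipeline` also offers built-in chunking via `chunk_length_s`.

```python
from typing import List, Sequence, Tuple

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz input
CHUNK_SECONDS = 30     # length of Whisper's input window
OVERLAP_SECONDS = 5    # illustrative overlap between neighbouring chunks


def split_into_chunks(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE,
    chunk_s: float = CHUNK_SECONDS,
    overlap_s: float = OVERLAP_SECONDS,
) -> List[Tuple[float, Sequence[float]]]:
    """Return (start_time_in_seconds, chunk) pairs covering the whole signal."""
    chunk_len = int(chunk_s * sample_rate)
    step = chunk_len - int(overlap_s * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk length")
    chunks = []
    start = 0
    while True:
        chunks.append((start / sample_rate, samples[start:start + chunk_len]))
        if start + chunk_len >= len(samples):
            break  # this chunk already reaches the end of the recording
        start += step
    return chunks


if __name__ == "__main__":
    # 70 seconds of silence stands in for a real meeting recording.
    audio = [0.0] * (70 * SAMPLE_RATE)
    for start_s, chunk in split_into_chunks(audio):
        print(f"chunk at {start_s:5.1f}s, {len(chunk) / SAMPLE_RATE:.1f}s long")
```

For the 70-second demo input this yields chunks starting at 0 s, 25 s, and 50 s, the last one 20 s long. Keeping each chunk's start time is also what a subtitle writer would need to offset per-chunk timestamps.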

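🚀 Fine-Tuned Japanese Whisper Speech Recognition Model

This project is a speech recognition model fine-tuned for Japanese from openai/whisper-small using the Common Voice, JVS, and JSUT datasets. When using the model, make sure the speech input is sampled at 16 kHz.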
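Audio recorded at another rate (44.1 kHz and 48 kHz are common) has to be resampled to 16 kHz before feature extraction; the Quick start example does this with `librosa.load(..., sr=16_000)`. As a dependency-free sketch, the code below reads a mono 16-bit PCM WAV file with Python's standard `wave` module, reports its sample rate, and resamples by linear interpolation. Linear interpolation applies no anti-aliasing filter, so this is for illustration only; librosa or torchaudio are the better choice in practice. All function names here are hypothetical.

```python
import array
import io
import math
import sys
import wave
from typing import List, Sequence, Tuple

TARGET_SR = 16_000  # sampling rate the model expects


def read_wav_mono16(data: bytes) -> Tuple[List[float], int]:
    """Decode mono 16-bit PCM WAV bytes into floats in [-1, 1) plus the sample rate."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        rate = wf.getframerate()
        pcm = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":  # WAV stores samples little-endian
        pcm.byteswap()
    return [s / 32768.0 for s in pcm], rate


def resample_linear(samples: Sequence[float], src_sr: int, dst_sr: int = TARGET_SR) -> List[float]:
    """Naive linear-interpolation resampler; applies no anti-aliasing filter."""
    if src_sr == dst_sr or not samples:
        return list(samples)
    n_out = len(samples) * dst_sr // src_sr
    out = []
    for k in range(n_out):
        pos = k * src_sr / dst_sr  # fractional position in the source signal
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


def make_test_wav(rate: int, seconds: float) -> bytes:
    """Build an in-memory mono 16-bit WAV holding a 440 Hz tone (demo input only)."""
    n = int(rate * seconds)
    pcm = array.array("h", (int(12000 * math.sin(2 * math.pi * 440 * t / rate)) for t in range(n)))
    if sys.byteorder == "big":
        pcm.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(pcm.tobytes())
    return buf.getvalue()


if __name__ == "__main__":
    samples, sr = read_wav_mono16(make_test_wav(48_000, 0.5))
    print(f"input: {sr} Hz, {len(samples)} samples")  # input: 48000 Hz, 24000 samples
    if sr != TARGET_SR:
        samples = resample_linear(samples, sr)
    print(f"resampled: {TARGET_SR} Hz, {len(samples)} samples")  # resampled: 16000 Hz, 8000 samples
```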

🚀 Quick start

The model can be used directly as follows:

💻 Usage examples

Basic usage

from transformers import WhisperForConditionalGeneration, WhisperProcessor
from datasets import load_dataset
import librosa
import torch

LANG_ID = "ja"
MODEL_ID = "Ivydata/whisper-small-japanese"
SAMPLES = 10

# Load the first SAMPLES clips of the Japanese Common Voice test split.
# Note: the legacy "common_voice" loading script may be unavailable in recent
# `datasets` releases; any Japanese audio source can be substituted.
test_dataset = load_dataset("common_voice", LANG_ID, split=f"test[:{SAMPLES}]")
processor = WhisperProcessor.from_pretrained(MODEL_ID)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID)
model.eval()

# Force Japanese transcription (no automatic language detection)
# and disable token suppression during generation.
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(
    language="ja", task="transcribe"
)
model.config.suppress_tokens = []

# Preprocessing the datasets.
# Read each audio file as a float array resampled to 16 kHz, the rate Whisper expects.
def speech_file_to_array_fn(batch):
    speech_array, sampling_rate = librosa.load(batch["path"], sr=16_000)
    batch["speech"] = speech_array
    batch["sampling_rate"] = sampling_rate
    return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)
sample = test_dataset[0]

# Convert the waveform into log-Mel input features and decode without tracking gradients.
input_features = processor(sample["speech"], sampling_rate=sample["sampling_rate"], return_tensors="pt").input_features
with torch.no_grad():
    predicted_ids = model.generate(input_features)

# Keeping special tokens shows the language/task prompt.
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
# ['<|startoftranscript|><|ja|><|transcribe|><|notimestamps|>木村さんに電話を貸してもらいました。<|endoftext|>']

# Skipping special tokens yields the plain transcript.
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
# ['木村さんに電話を貸してもらいました。']
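The two `batch_decode` calls differ only in whether Whisper's special tokens are kept: the prompt `<|startoftranscript|><|ja|><|transcribe|><|notimestamps|>` fixes the language and task, and `<|endoftext|>` ends the sequence. As a rough illustration of what `skip_special_tokens=True` removes, the hypothetical helper below separates every `<|...|>` token from the text of a raw decoded string with a regular expression. It is a sketch only; the processor's own decoding remains the authoritative path, and the pattern would also capture timestamp tokens such as `<|0.00|>`.

```python
import re
from typing import List, Tuple

# Matches Whisper-style special tokens such as <|ja|> or <|endoftext|>.
SPECIAL_TOKEN = re.compile(r"<\|[^|>]*\|>")


def split_whisper_output(raw: str) -> Tuple[List[str], str]:
    """Return (special tokens in order of appearance, remaining plain text)."""
    tokens = SPECIAL_TOKEN.findall(raw)
    text = SPECIAL_TOKEN.sub("", raw).strip()
    return tokens, text


if __name__ == "__main__":
    raw = ("<|startoftranscript|><|ja|><|transcribe|><|notimestamps|>"
           "木村さんに電話を貸してもらいました。<|endoftext|>")
    tokens, text = split_whisper_output(raw)
    print(tokens)  # ['<|startoftranscript|>', '<|ja|>', '<|transcribe|>', '<|notimestamps|>', '<|endoftext|>']
    print(text)    # 木村さんに電話を貸してもらいました。
```

Checking that `<|ja|>` appears among the tokens is a quick way to confirm the forced language prompt took effect.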