whisper-base-japanese: an open-source model for Japanese speech recognition, free to deploy and use

Whisper Base Japanese

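Developed by Ivydata

This model is openai/whisper-base fine-tuned for Japanese on the Common Voice, JVS, and JSUT datasets. It is intended for Japanese speech recognition.

Speech recognition

Transformers

Japanese

License: Apache-2.0

#Japanese speech recognition #Low error rate #Multi-dataset training

Downloads: 137

Released: May 17, 2023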

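Model overview

A Japanese speech recognition model based on the Whisper architecture. It is optimized for Japanese speech and converts Japanese audio into text.

Model features

Japanese optimization

Fine-tuned specifically on Japanese speech to improve recognition accuracy for Japanese

Multi-dataset training

Trained on three Japanese datasets (Common Voice, JVS, and JSUT) to cover a range of speech conditions

16 kHz sampling rate

Accepts speech input sampled at 16 kHz, which suits most speech applications

Model capabilities

Japanese speech-to-text

Continuous speech recognition

General-purpose transcription

Use cases

Transcription

Japanese meeting notes

Automatically transcribe recordings of Japanese meetings into written notes

Japanese voice assistants

Provide speech recognition for Japanese voice assistants

Education

Japanese learning aid

Help learners of Japanese transcribe their spoken practice into text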
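Whisper models process audio in windows of at most 30 seconds, so a long recording such as a meeting usually has to be split before transcription. The following is a minimal sketch of one way to compute the chunk boundaries. It is not taken from the model's authors; the function name `chunk_spans` and its parameters are hypothetical, and the optional overlap is an assumption meant to reduce the chance of cutting a word at a boundary.

```python
def chunk_spans(num_samples, sample_rate=16_000, chunk_seconds=30.0, overlap_seconds=0.0):
    """Return (start, end) sample indices that cover a recording in fixed-size chunks.

    Hypothetical helper: consecutive chunks overlap by `overlap_seconds`,
    and the last chunk may be shorter than `chunk_seconds`.
    """
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    chunk = int(chunk_seconds * sample_rate)
    step = chunk - int(overlap_seconds * sample_rate)
    if chunk <= 0 or step <= 0:
        raise ValueError("chunk length must be positive and larger than the overlap")

    spans = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        spans.append((start, end))
        if end == num_samples:
            break  # the recording is fully covered
        start += step
    return spans


# A 70-second recording at 16 kHz, split into 30-second chunks.
for start, end in chunk_spans(70 * 16_000):
    print(start, end)
```

Each span can then be sliced out of the waveform array and passed to the processor and `model.generate` one chunk at a time. The Transformers automatic-speech-recognition pipeline also offers a `chunk_length_s` option that handles this splitting internally.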

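🚀 Fine-tuned Japanese Whisper model for speech recognition

This project is a speech recognition model obtained by fine-tuning openai/whisper-base for Japanese on the Common Voice, JVS, and JSUT datasets. When using the model, make sure the input speech is sampled at 16 kHz.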
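Because the model expects 16 kHz input, it can help to check a file's sampling rate before transcribing it. The sketch below reads the rate from a WAV header using only the Python standard library. The helper names (`wav_sample_rate`, `needs_resampling`, `make_silent_wav`) are hypothetical, and the sketch assumes uncompressed PCM WAV input; other formats need a different reader.

```python
import io
import wave

EXPECTED_RATE = 16_000  # sampling rate the model card asks for


def wav_sample_rate(source):
    """Return the sampling rate (Hz) stored in a WAV file's header.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def needs_resampling(source, expected_rate=EXPECTED_RATE):
    """Return True if the WAV file is not already at the expected rate."""
    return wav_sample_rate(source) != expected_rate


def make_silent_wav(rate, seconds=0.1):
    """Build a tiny mono 16-bit silent WAV in memory (for demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    buffer.seek(0)
    return buffer


print(needs_resampling(make_silent_wav(44_100)))  # a 44.1 kHz file must be resampled
print(needs_resampling(make_silent_wav(16_000)))  # a 16 kHz file can be used as is
```

Loading audio with `librosa.load(path, sr=16_000)` resamples on load, so a check like this matters mainly when the audio is read some other way.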

🚀 Quick start

The model can be used directly as follows:

from transformers import WhisperForConditionalGeneration, WhisperProcessor
from datasets import load_dataset
import librosa
import torch

LANG_ID = "ja"
MODEL_ID = "Ivydata/whisper-base-japanese"
SAMPLES = 10

test_dataset = load_dataset("common_voice", LANG_ID, split=f"test[:{SAMPLES}]")
processor = WhisperProcessor.from_pretrained("openai/whisper-base")
model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID)
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(
    language="ja", task="transcribe"
)
model.config.suppress_tokens = []

# Preprocess the dataset: load each audio file as a waveform array.
# librosa resamples to 16 kHz on load, the rate the model expects.
def speech_file_to_array_fn(batch):
    speech_array, sampling_rate = librosa.load(batch["path"], sr=16_000)
    batch["speech"] = speech_array
    batch["sentence"] = batch["sentence"].upper()
    batch["sampling_rate"] = sampling_rate
    return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)
sample = test_dataset[0]
input_features = processor(sample["speech"], sampling_rate=sample["sampling_rate"], return_tensors="pt").input_features
predicted_ids = model.generate(input_features)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
# ['<|startoftranscript|><|ja|><|transcribe|><|notimestamps|>木村さんに電話を貸してもらいました。<|endoftext|>']

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
# ['木村さんに電話を貸してもらいました。']
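The page tags the model as "low error rate" but reports no figure. For Japanese, which is written without spaces between words, character error rate (CER) is a common way to compare a transcription against its reference sentence. The following is a minimal, dependency-free sketch of that metric. The function names are hypothetical, and it does no text normalization (punctuation, full-width versus half-width characters), which a real evaluation would likely need.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, counted in characters."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """CER = character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


reference = "木村さんに電話を貸してもらいました。"
print(character_error_rate(reference, reference))  # identical strings give 0.0
```

In the quick-start code, `sample["sentence"]` would serve as the reference and the decoded `transcription[0]` as the hypothesis.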
