Japanese-HuBERT-base: an open-source Japanese speech model trained on a large corpus for Japanese speech processing


Japanese Hubert Base

Developed by rinna

A Japanese HuBERT Base model trained by rinna Co., Ltd. on ReazonSpeech v1, a Japanese speech corpus of about 19,000 hours.

Speech recognition

Transformers

Language: Japanese | License: Apache-2.0 | #Japanese speech feature extraction #Self-supervised learning #Large-scale speech pretraining


Release date: April 28, 2023

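Model Overview

A Japanese speech representation model based on the HuBERT architecture, intended mainly for speech feature extraction and as a starting point for downstream speech tasks.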

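Model Features

Optimized for Japanese speech

Trained specifically on Japanese speech data.

Large-scale training data

Trained on ReazonSpeech v1, a Japanese speech corpus of about 19,000 hours.

HuBERT architecture

Uses HuBERT's self-supervised learning scheme, which learns speech representations by predicting hidden units (discrete cluster labels) for masked spans of the input.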
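To make the masking step more concrete, here is a minimal sketch of how masked spans can be selected over a sequence of frames. It assumes the settings described in the HuBERT paper (each frame is chosen as a span start with probability 0.08, and each span covers 10 frames); this page does not state the masking configuration used for this particular model, and the function name `sample_mask` is purely illustrative. The sketch covers only mask selection, not the prediction of cluster labels at the masked positions.

```python
import random


def sample_mask(num_frames, start_prob=0.08, span=10, seed=0):
    """Return a list of booleans marking which frames are masked.

    start_prob and span follow the values reported in the HuBERT paper;
    they are assumptions here, not settings confirmed for this model.
    """
    rng = random.Random(seed)
    mask = [False] * num_frames
    for start in range(num_frames):
        # Each frame independently becomes the start of a masked span.
        if rng.random() < start_prob:
            # Spans may overlap and are clipped at the end of the sequence.
            for t in range(start, min(start + span, num_frames)):
                mask[t] = True
    return mask


mask = sample_mask(num_frames=100)
print("".join("#" if m else "." for m in mask))
print(f"masked frames: {sum(mask)} / {len(mask)}")
```

During pretraining, the model sees the masked input and is trained to predict the hidden-unit label of each masked frame.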

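Model Capabilities

Speech feature extraction

Speech representation learning

Use Cases

Speech processing

Speech feature extraction

Extract high-level feature representations from Japanese speech.

Downstream speech tasks

Can serve as a pretrained model for tasks such as speech recognition and speech classification.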

🚀 `rinna/japanese-hubert-base`

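This is a Japanese HuBERT Base model trained by rinna Co., Ltd. It can be used for Japanese speech processing tasks.

🚀 Quick Start

The model is a Transformer-based HuBERT Base model for Japanese. The basic steps for extracting features with it are shown below: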

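import soundfile as sf
import torch
from transformers import AutoFeatureExtractor, AutoModel

# Load the feature extractor and the pretrained model from the Hugging Face Hub.
model_name = "rinna/japanese-hubert-base"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()  # inference mode (disables dropout)

# Placeholder path, not part of the original snippet: point it at a 16 kHz audio file.
audio_file = "sample.wav"

# Read the waveform and its sampling rate, then build the model inputs.
raw_speech_16kHz, sr = sf.read(audio_file)
inputs = feature_extractor(
    raw_speech_16kHz,
    return_tensors="pt",
    sampling_rate=sr,
)

# Gradients are not needed for feature extraction.
with torch.no_grad():
    outputs = model(**inputs)

print(f"Input:  {inputs.input_values.size()}")  # [1, #samples]
print(f"Output: {outputs.last_hidden_state.size()}")  # [1, #frames, 768]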
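The relation between `#samples` and `#frames` in the shapes above can be worked out by hand. The sketch below assumes the convolutional feature encoder of the standard HuBERT Base configuration (seven unpadded 1-D convolutions with the kernel sizes and strides listed in the code); this page only says the architecture matches the original HuBERT Base, so treat these values as an assumption. The helper name `num_frames` is illustrative.

```python
# Kernel sizes and strides of the convolutional feature encoder in the
# standard HuBERT Base configuration (assumed to apply to this model).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_frames(num_samples):
    """Number of output frames for a waveform of num_samples samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


print(num_frames(16000))   # 1 second at 16 kHz  -> 49
print(num_frames(160000))  # 10 seconds at 16 kHz -> 499
```

The strides multiply to 320 samples, so under these assumptions each output frame advances by 20 ms of 16 kHz audio.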

A fairseq checkpoint file is also available.

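✨ Key Features

Model Overview

The model architecture is the same as the original HuBERT Base model: 12 Transformer layers, each with 12 attention heads.
The model was trained with the code from the official repository; the detailed training configuration can be found in that repository and in the original paper.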
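As a rough illustration of what this architecture implies, the following sketch counts the parameters in the Transformer layers alone. The layer and head counts come from the description above and the hidden size of 768 from the output shape in the quick-start snippet; the feed-forward size of 3072 is the standard HuBERT Base value and is an assumption here. The convolutional feature encoder, positional embedding, and projection layers are not counted.

```python
NUM_LAYERS = 12   # from the model description
NUM_HEADS = 12    # from the model description
HIDDEN = 768      # matches the output shape [1, #frames, 768]
FFN = 3072        # assumed: standard HuBERT Base feed-forward size

head_dim = HIDDEN // NUM_HEADS

# Self-attention: query, key, value, and output projections (weights + biases).
attention = 4 * (HIDDEN * HIDDEN + HIDDEN)
# Feed-forward network: two linear layers (weights + biases).
feed_forward = (HIDDEN * FFN + FFN) + (FFN * HIDDEN + HIDDEN)
# Two layer norms per layer, each with a scale and a bias vector.
layer_norms = 2 * (2 * HIDDEN)

per_layer = attention + feed_forward + layer_norms
total = NUM_LAYERS * per_layer

print(f"head dimension:       {head_dim}")       # 64
print(f"parameters per layer: {per_layer:,}")    # 7,087,872
print(f"all Transformer layers: {total:,}")      # 85,054,464
```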

Training

The model was trained on ReazonSpeech v1, a Japanese speech corpus of about 19,000 hours.
Corpus: ReazonSpeech

Contributors

Release date

April 28, 2023

📚 Detailed Documentation

How to cite the model

@misc{rinna-japanese-hubert-base,
    title = {rinna/japanese-hubert-base},
    author = {Hono, Yukiya and Mitsui, Kentaro and Sawada, Kei},
    url = {https://huggingface.co/rinna/japanese-hubert-base}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}

References

@article{hsu2021hubert,
    author = {Hsu, Wei-Ning and Bolte, Benjamin and Tsai, Yao-Hung Hubert and Lakhotia, Kushal and Salakhutdinov, Ruslan and Mohamed, Abdelrahman},
    journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    title = {HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units},
    year = {2021},
    volume = {29},
    pages = {3451-3460},
    doi = {10.1109/TASLP.2021.3122291}
}