imprt/izanami-wav2vec2-large
This is a Japanese wav2vec 2.0 Large model, pre-trained on 62215 hours of audio extracted from large-scale Japanese TV broadcast audio data by voice activity detection. It is intended for feature extraction in speech tasks.
Quick Start
This model was trained using code from the official repository. The example below loads its feature extractor and applies it to an audio file.
Usage Examples
Basic Usage
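import soundfile as sf
from transformers import AutoFeatureExtractor

# Load the feature extractor that ships with the pre-trained checkpoint.
model = "imprt/izanami-wav2vec2-large"
feature_extractor = AutoFeatureExtractor.from_pretrained(model)

# Read the audio file; sf.read returns the samples and the sampling rate.
audio_file = "/path/to/16k_audio_file"
audio_input, sr = sf.read(audio_file)

# Convert the raw waveform into model inputs and keep the result.
inputs = feature_extractor(audio_input, sampling_rate=sr)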
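The call above only prepares the raw waveform. When that input is later passed through a wav2vec 2.0 model, the convolutional feature encoder shortens it to a much smaller number of frames. The sketch below estimates that frame count in plain Python. It assumes the standard encoder layout described in the wav2vec 2.0 paper (kernel widths 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2); this model card does not state the configuration, so check the checkpoint's config before relying on the numbers. The names `CONV_LAYERS` and `num_output_frames` are hypothetical and used only for illustration.

```python
# Assumed (kernel_width, stride) pairs of the wav2vec 2.0 convolutional
# feature encoder, taken from the wav2vec 2.0 paper, NOT from this model card.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def num_output_frames(num_samples: int) -> int:
    """Estimate how many feature frames an unpadded waveform would yield."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        if length < kernel:
            # The input is too short for this layer to produce any output.
            return 0
        # Output length of a 1-D convolution without padding.
        length = (length - kernel) // stride + 1
    return length


# One second of 16 kHz audio.
print(num_output_frames(16000))  # 49
```

Under these assumptions, one second of 16 kHz audio maps to 49 frames, that is, roughly one frame every 20 ms, and inputs shorter than 400 samples (25 ms) produce no frame at all.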
Documentation
References
@inproceedings{NEURIPS2020_92d1e1eb,
author = {Baevski, Alexei and Zhou, Yuhao and Mohamed, Abdelrahman and Auli, Michael},
booktitle = {Advances in Neural Information Processing Systems},
editor = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
pages = {12449--12460},
publisher = {Curran Associates, Inc.},
title = {wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations},
url = {https://proceedings.neurips.cc/paper_files/paper/2020/file/92d1e1eb1cd6f9fba3227870bb6d7f07-Paper.pdf},
volume = {33},
year = {2020}
}
License
Read LICENSE.md before using this model.
Important Note
Please read LICENSE.md before downloading this model.
| Property | Details |
|---|---|
| Model Type | wav2vec2 for feature-extraction |
| Training Data | 62215 hours of audio extracted from large-scale Japanese TV broadcast audio data by voice activity detection |
| License | other (imprt-license) |
| License Link | LICENSE.md |
| Language | Japanese |
| Gated Fields | Country (country), Affiliation (text), I agree ALL the statements in LICENSE.md (checkbox) |
| Gated Button Content | Acknowledge license |