Izanami-wav2vec2-large: Open-Source Japanese Model Empowering Speech Processing with Broadcast Audio Data

Izanami Wav2vec2 Large

Developed by imprt

Japanese wav2vec2.0 Large model pre-trained on large-scale Japanese TV broadcast audio data

Speech Recognition

PyTorch

Japanese | Open Source | License: Other | #Japanese speech feature extraction #Large-scale pre-training #TV broadcast audio

Downloads: 89

Release Time: 3/7/2025

Model Overview

This is a Japanese speech feature extraction model pre-trained on 62,215 hours of Japanese TV broadcast audio data using the wav2vec 2.0 Large architecture.
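For a sense of scale, here is a back-of-the-envelope conversion of the 62,215 hours into seconds and raw samples. It assumes 16 kHz audio, the rate the usage example expects; the card does not state the sampling rate used during pre-training.

```python
HOURS = 62_215        # from the model card
SAMPLE_RATE = 16_000  # assumption: 16 kHz input, as in the usage example

seconds = HOURS * 3_600
samples = seconds * SAMPLE_RATE
years = seconds / (365 * 24 * 3_600)

print(f"{seconds:,} seconds")               # 223,974,000 seconds
print(f"{samples:,} samples at 16 kHz")     # 3,583,584,000,000 samples
print(f"about {years:.1f} years of audio")  # about 7.1 years
```

In other words, the pre-training corpus is roughly seven years of continuous speech.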

Model Features

Large-scale pre-training data

Pre-trained using 62,215 hours of Japanese TV broadcast audio data

Japanese language optimization

Pre-trained on Japanese broadcast speech, so its learned representations target Japanese rather than multilingual audio

wav2vec2.0 architecture

Uses the wav2vec 2.0 Large architecture (Baevski et al., 2020)
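In wav2vec 2.0, a stack of 1-D convolutions (the feature encoder) turns the raw waveform into a sequence of frames before the Transformer layers. The sketch below computes how many frames that encoder emits. It assumes this checkpoint keeps the default wav2vec 2.0 kernel sizes and strides; check `conv_kernel` and `conv_stride` in the model's `config.json` to confirm.

```python
# Default wav2vec 2.0 feature-encoder layout (7 conv layers); assumed, not
# verified, to match this checkpoint's conv_kernel / conv_stride settings.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_output_frames(num_samples: int) -> int:
    """Frames emitted by the conv encoder for an input of num_samples (no padding)."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Standard output-length formula for an unpadded 1-D convolution.
        length = max((length - kernel) // stride + 1, 0)
    return length


if __name__ == "__main__":
    sr = 16_000
    total_stride = 1
    for stride in CONV_STRIDES:
        total_stride *= stride
    print(f"hop: {total_stride} samples = {1000 * total_stride / sr} ms")  # 320 samples = 20.0 ms
    print("frames for 1 s:", num_output_frames(sr))  # 49
```

With these defaults, each output frame advances by 320 samples (20 ms at 16 kHz), so one second of audio yields 49 feature vectors.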

Model Capabilities

Speech feature extraction

Japanese speech processing

Use Cases

Speech processing

Japanese speech feature extraction

Extract high-quality feature representations from Japanese speech
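A common way to turn frame-level features into a single utterance-level vector is masked mean pooling over time. This is a generic illustration, not part of this model's API. `masked_mean_pool` is a hypothetical helper, and the hidden states are random placeholders shaped like wav2vec 2.0 Large outputs (1024 dimensions, 49 frames per second of audio).

```python
import numpy as np


def masked_mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average (batch, frames, dim) features over frames where mask == 1."""
    weights = mask.astype(hidden.dtype)[..., None]    # (batch, frames, 1)
    summed = (hidden * weights).sum(axis=1)           # (batch, dim)
    counts = np.clip(weights.sum(axis=1), 1.0, None)  # (batch, 1); avoids divide-by-zero
    return summed / counts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = rng.standard_normal((2, 49, 1024))  # placeholder for one layer's hidden states
    mask = np.ones((2, 49), dtype=np.int64)
    mask[1, 20:] = 0                             # second utterance is shorter (padded)
    pooled = masked_mean_pool(hidden, mask)
    print(pooled.shape)  # (2, 1024)
```

Masking keeps padded frames in shorter utterances from diluting the average when several clips are batched together.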

🚀 `imprt/izanami-wav2vec2-large`

This is a Japanese wav2vec 2.0 Large model. It was pre-trained on 62,215 hours of audio that voice activity detection (VAD) extracted from large-scale Japanese TV broadcast audio data, and it provides high-quality feature extraction for speech tasks.
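The card does not say which voice activity detector was used. To show what a VAD step does, here is a toy energy-threshold VAD. It is a generic sketch with assumed parameters (30 ms frames, a -35 dB threshold relative to the loudest frame, a 300 ms minimum segment) and is not the pipeline used to build the training data.

```python
import numpy as np


def energy_vad(signal, sr=16_000, frame_ms=30.0, threshold_db=-35.0, min_speech_ms=300.0):
    """Toy energy-threshold VAD returning (start, end) sample indices of active regions.

    Illustrative only: this is NOT the (unspecified) VAD used for the Izanami data.
    """
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return []
    frames = np.asarray(signal[: n_frames * frame_len], dtype=np.float64).reshape(n_frames, frame_len)
    # Per-frame RMS energy in dB, relative to the loudest frame.
    rms = np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-12
    db = 20 * np.log10(rms / rms.max())
    active = db > threshold_db

    segments, start = [], None
    # A trailing False acts as a sentinel that closes a run reaching the end.
    for i, is_active in enumerate(np.append(active, False)):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            if (i - start) * frame_ms >= min_speech_ms:
                segments.append((start * frame_len, i * frame_len))
            start = None
    return segments


if __name__ == "__main__":
    sr = 16_000
    t = np.arange(sr) / sr
    tone = 0.5 * np.sin(2 * np.pi * 220 * t)  # 1 s stand-in for speech
    audio = np.concatenate([np.zeros(sr), tone, np.zeros(sr)])
    audio += 1e-4 * np.random.default_rng(0).standard_normal(len(audio))  # low background noise
    print(energy_vad(audio, sr))  # one segment around samples 16000-32000
```

Production VADs usually rely on trained classifiers rather than raw energy, because energy alone cannot tell speech from music or noise. That distinction matters for broadcast audio.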

🚀 Quick Start

The model was trained using code from the official repository.

💻 Usage Examples

Basic Usage

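import soundfile as sf
from transformers import AutoFeatureExtractor
# Gated model: accept the license on Hugging Face and authenticate before downloading
model_id = "imprt/izanami-wav2vec2-large"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
# The input file must be a mono recording sampled at 16 kHz
audio_file = "/path/to/16k_audio_file"
audio_input, sr = sf.read(audio_file)  # returns (waveform, sampling rate)
# Convert the waveform into model inputs (input_values)
inputs = feature_extractor(audio_input, sampling_rate=sr, return_tensors="pt")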
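The usage example assumes a mono file that is already sampled at 16 kHz. For stereo recordings or other sampling rates, a hypothetical `to_16k_mono` helper like the one below can prepare the waveform first. It uses a crude linear-interpolation resampler with no anti-aliasing filter, purely for illustration. A polyphase resampler such as `scipy.signal.resample_poly` is a better choice in practice.

```python
import numpy as np


def to_16k_mono(audio, sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Hypothetical helper: downmix to mono and resample to target_sr."""
    audio = np.asarray(audio, dtype=np.float64)
    if audio.ndim == 2:
        # soundfile returns multi-channel audio as (frames, channels); average the channels.
        audio = audio.mean(axis=1)
    if sr != target_sr:
        # Crude linear-interpolation resampling (no low-pass filter), for illustration only.
        n_out = int(round(len(audio) * target_sr / sr))
        t_in = np.arange(len(audio)) / sr
        t_out = np.arange(n_out) / target_sr
        audio = np.interp(t_out, t_in, audio)
    return audio.astype(np.float32)
```

You can then pass the result to the feature extractor as `feature_extractor(to_16k_mono(audio_input, sr), sampling_rate=16_000)`.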

📚 Documentation

References

@inproceedings{NEURIPS2020_92d1e1eb,
    author = {Baevski, Alexei and Zhou, Yuhao and Mohamed, Abdelrahman and Auli, Michael},
    booktitle = {Advances in Neural Information Processing Systems},
    editor = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
    pages = {12449--12460},
    publisher = {Curran Associates, Inc.},
    title = {wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations},
    url = {https://proceedings.neurips.cc/paper_files/paper/2020/file/92d1e1eb1cd6f9fba3227870bb6d7f07-Paper.pdf},
    volume = {33},
    year = {2020}
}

📄 License

Read LICENSE.md when you use this model.

⚠️ Important Note

Please read LICENSE.md before downloading this model.

Property	Details
Model Type	wav2vec2 for feature-extraction
Training Data	62,215 hours of audio extracted by voice activity detection from large-scale Japanese TV broadcast audio data
License	other (imprt - license)
License Link	LICENSE.md
Language	Japanese
Gated Fields	Country (country), Affiliation (text), I agree to ALL the statements in LICENSE.md (checkbox)
Gated Button Content	Acknowledge license
