hubert-base-korean
Hubert-base-korean is a speech representation learning model for Korean automatic speech recognition. It uses self-supervised learning to train directly on raw audio waveforms, rather than on hand-crafted acoustic features or labeled transcripts.
Quick Start
Usage Examples
Basic Usage
import torch
from transformers import HubertModel

# Load the pretrained Korean HuBERT base model
model = HubertModel.from_pretrained("team-lucid/hubert-base-korean")

# Dummy input: one second of 16 kHz audio, batch size 1
wav = torch.ones(1, 16000)
outputs = model(wav)

print(f"Input: {wav.shape}")
print(f"Output: {outputs.last_hidden_state.shape}")
Advanced Usage
import jax.numpy as jnp
from transformers import FlaxAutoModel

# Load the Flax version of the model (custom code, hence trust_remote_code=True)
model = FlaxAutoModel.from_pretrained("team-lucid/hubert-base-korean", trust_remote_code=True)

# Dummy input: one second of 16 kHz audio, batch size 1
wav = jnp.ones((1, 16000))
outputs = model(wav)

print(f"Input: {wav.shape}")
print(f"Output: {outputs.last_hidden_state.shape}")
Features
Hubert (Hidden-Unit BERT) is a speech representation learning model proposed by Facebook. Unlike traditional speech recognition models, Hubert uses a self-supervised learning approach that learns directly from the raw waveform of the speech signal. This model was trained on Cloud TPUs provided by Google's TPU Research Cloud (TRC).
Model Description
| Property | Base | Large |
|---|---|---|
| CNN Encoder - Strides | 5, 2, 2, 2, 2, 2, 2 | 5, 2, 2, 2, 2, 2, 2 |
| CNN Encoder - Kernel Width | 10, 3, 3, 3, 3, 2, 2 | 10, 3, 3, 3, 3, 2, 2 |
| CNN Encoder - Channel | 512 | 512 |
| Transformer Encoder - Layer | 12 | 24 |
| Transformer Encoder - Embedding Dim | 768 | 1024 |
| Transformer Encoder - Inner FFN Dim | 3072 | 4096 |
| Transformer Encoder - Attention Heads | 8 | 16 |
| Projection - Dim | 256 | 768 |
| Params | 95M | 317M |
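The Base column can be cross-checked against the checkpoint's configuration. The attribute names below are the standard HubertConfig fields in transformers; that the released config follows this schema is an assumption.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("team-lucid/hubert-base-korean")

# CNN feature encoder
print(config.conv_stride)   # per-layer strides, e.g. (5, 2, 2, 2, 2, 2, 2)
print(config.conv_kernel)   # per-layer kernel widths, e.g. (10, 3, 3, 3, 3, 2, 2)
print(config.conv_dim)      # per-layer channel counts

# Transformer encoder (Base: 12 layers, 768-dim embeddings, 3072-dim FFN)
print(config.num_hidden_layers, config.hidden_size, config.intermediate_size, config.num_attention_heads)

# The conv strides determine how many frames one second of 16 kHz audio becomes
frames = 16000
for kernel, stride in zip(config.conv_kernel, config.conv_stride):
    frames = (frames - kernel) // stride + 1
print(frames)  # 49 for the strides and kernels listed above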
Technical Details
Training Data
This model was trained on approximately 4,000 hours of data extracted from Free Conversation Speech (General Male and Female), Multi-Speaker Speech Synthesis Data, and Broadcast Content Dialogue-Style Speech Recognition Data, datasets constructed with the support of the Institute of Information & Communications Technology Planning & Evaluation (IITP) under the Ministry of Science and ICT.
Training Procedure
As in the original paper, the Base model was first trained using MFCC-based targets. Then k-means clustering with 500 clusters was performed, and both the Base and Large models were retrained on the resulting cluster assignments, as sketched below.
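As a rough illustration of that second step, the sketch below clusters frame-level hidden states into 500 groups with scikit-learn and uses the cluster ids as pseudo-labels. The choice of layer, the random stand-in audio, and the use of the released checkpoint in place of the intermediate MFCC-trained model are all illustrative assumptions, not the actual training code.

import numpy as np
import torch
from sklearn.cluster import KMeans
from transformers import HubertModel

# Stand-in for the first-iteration (MFCC-trained) Base model
model = HubertModel.from_pretrained("team-lucid/hubert-base-korean").eval()

def hidden_frames(wav):
    # Frame-level representations from an intermediate transformer layer (layer 6 here, an assumption)
    with torch.no_grad():
        out = model(wav, output_hidden_states=True)
    return out.hidden_states[6].squeeze(0).numpy()  # (frames, 768)

# Random audio standing in for (a subset of) the unlabeled corpus
utterances = [torch.randn(1, 16000 * 10) for _ in range(4)]
features = np.concatenate([hidden_frames(w) for w in utterances], axis=0)

# 500 clusters, as described above; the cluster ids become the "hidden units"
# that Base and Large are trained to predict for masked frames
kmeans = KMeans(n_clusters=500).fit(features)
targets = kmeans.predict(hidden_frames(utterances[0]))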
Training Hyperparameters
| Hyperparameter | Base | Large |
|---|---|---|
| Warmup Steps | 32,000 | 32,000 |
| Learning Rate | 5e-4 | 1.5e-3 |
| Batch Size | 128 | 128 |
| Weight Decay | 0.01 | 0.01 |
| Max Steps | 400,000 | 400,000 |
| Learning Rate Decay | 0.1 | 0.1 |
| Adam \(\beta_1\) | 0.9 | 0.9 |
| Adam \(\beta_2\) | 0.99 | 0.99 |
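As a sketch of how the Base column translates into an optimizer setup in PyTorch: AdamW with these betas and weight decay, warmup over 32,000 steps, and 400,000 total steps. The exact decay curve is not specified here, so the polynomial decay to 0.1x the peak rate below is an assumed interpretation of "Learning Rate Decay 0.1".

import torch
from transformers import HubertModel, get_polynomial_decay_schedule_with_warmup

model = HubertModel.from_pretrained("team-lucid/hubert-base-korean")

# Base-column values from the table above
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-4,            # peak learning rate
    betas=(0.9, 0.99),  # Adam beta_1, beta_2
    weight_decay=0.01,
)

# 32,000 warmup steps, 400,000 total steps, ending at 0.1x the peak rate (assumption)
scheduler = get_polynomial_decay_schedule_with_warmup(
    optimizer,
    num_warmup_steps=32_000,
    num_training_steps=400_000,
    lr_end=5e-5,
)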
License
This model is licensed under the Apache 2.0 license.