Hubert-large-korean: An open-source Korean speech recognition model - Free deployment for accurate processing of Korean speech

Hubert Large Korean

Developed by team-lucid

Hubert-large-korean is a Korean automatic speech recognition model based on the Hubert architecture. It learns features directly from raw speech waveforms through self-supervised learning and is trained specifically on Korean speech.

Speech Recognition

Transformers

Korean

Open Source License: Apache-2.0

#Korean speech recognition #Self-supervised learning #Raw waveform processing

Downloads 131

Release Time: 6/4/2023

Model Overview

This model adopts the Hidden-Unit BERT architecture and is specifically designed for Korean speech recognition tasks. It can learn directly from the original speech signal without relying on traditional feature extraction methods.

Model Features

Self-supervised learning

Learns directly from the raw waveform of the speech signal without requiring manually labeled data

Korean optimization

Specifically trained and optimized for the characteristics of Korean speech

Large-scale training

Trained with approximately 4000 hours of Korean speech data

High-performance architecture

Adopts a 24-layer Transformer encoder, a 1024-dimensional embedding space, and 16 attention heads

Model Capabilities

Korean speech recognition

Speech feature extraction

Speech waveform processing

Use Cases

Speech to text

Korean speech transcription

Convert Korean speech content into text

Speech analysis

Speech feature analysis

Extract high-level feature representations of the speech signal

🚀 hubert-large-korean

Hubert-large-korean is a speech representation learning model designed for automatic speech recognition. It uses self-supervised learning to learn directly from raw speech waveforms.

🚀 Quick Start

PyTorch

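import torch
from transformers import HubertModel

# Load the pretrained encoder; HubertModel returns hidden states, not text.
model = HubertModel.from_pretrained("team-lucid/hubert-large-korean")

# Dummy input: a batch of one waveform with 16,000 samples.
wav = torch.ones(1, 16000)
with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = model(wav)
print(f"Input:   {wav.shape}")  # [1, 16000]
# The last dimension is the Large model's embedding dim (1024).
print(f"Output:  {outputs.last_hidden_state.shape}")  # [1, 49, 1024]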

JAX/Flax

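import jax.numpy as jnp
from transformers import FlaxAutoModel

# The Flax implementation is loaded from the model repository's own code.
model = FlaxAutoModel.from_pretrained("team-lucid/hubert-large-korean", trust_remote_code=True)

# Dummy input: a batch of one waveform with 16,000 samples.
wav = jnp.ones((1, 16000))
outputs = model(wav)
print(f"Input:   {wav.shape}")  # [1, 16000]
# The last dimension is the Large model's embedding dim (1024).
print(f"Output:  {outputs.last_hidden_state.shape}")  # [1, 49, 1024]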

✨ Features

Hubert (Hidden-Unit BERT) is a speech representation learning model proposed by Facebook. Unlike traditional speech recognition models, Hubert uses a self-supervised learning method that learns directly from the raw waveform of speech signals.

This model was trained on Cloud TPUs provided through Google's TPU Research Cloud (TRC).

Model Description

Component	Parameter	Base	Large
CNN Encoder	strides	5, 2, 2, 2, 2, 2, 2	5, 2, 2, 2, 2, 2, 2
CNN Encoder	kernel width	10, 3, 3, 3, 3, 2, 2	10, 3, 3, 3, 3, 2, 2
CNN Encoder	channel	512	512
Transformer Encoder	layers	12	24
Transformer Encoder	embedding dim	768	1024
Transformer Encoder	inner FFN dim	3072	4096
Transformer Encoder	attention heads	8	16
Projection	dim	256	768
Params		95M	317M
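The strides and kernel widths in the table determine how many feature frames the CNN encoder produces for a given number of input samples. The sketch below applies the standard output-length formula for an unpadded 1-D convolution to each layer. It assumes the convolutions use no padding and that the input is sampled at 16 kHz; neither is stated in the table, and the helper names `num_frames` and `total_stride` are illustrative only.

```python
# Strides and kernel widths copied from the Model Description table.
STRIDES = (5, 2, 2, 2, 2, 2, 2)
KERNELS = (10, 3, 3, 3, 3, 2, 2)
SAMPLE_RATE = 16_000  # assumption: 16 kHz input audio


def num_frames(num_samples: int) -> int:
    """Frames produced by the CNN encoder, assuming unpadded convolutions."""
    length = num_samples
    for kernel, stride in zip(KERNELS, STRIDES):
        # Output length of a 1-D convolution without padding.
        length = (length - kernel) // stride + 1
    return length


def total_stride() -> int:
    """Hop between consecutive frames, measured in input samples."""
    hop = 1
    for stride in STRIDES:
        hop *= stride
    return hop


print(num_frames(16000))                    # 49
print(total_stride())                       # 320
print(1000 * total_stride() / SAMPLE_RATE)  # 20.0 (milliseconds per frame)
```

Under these assumptions, 16,000 input samples map to 49 frames, which matches the sequence length printed in the Quick Start examples, and each frame advances by 320 samples (20 ms at 16 kHz).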

🔧 Technical Details

Training Data

This model was trained on approximately 4,000 hours of data extracted from three datasets: Free Conversation Speech (General Male and Female), Multi-Speaker Speech Synthesis Data, and Broadcast Content Conversational Speech Recognition Data. These datasets were constructed with the support of the Korea Institute of Information and Communication Technology Promotion under the funding of the Ministry of Science and ICT.

Training Procedure

As in the original paper, the Base model was first trained using MFCC-based targets. Then, k-means clustering was performed with 500 clusters, and both the Base and Large models were trained again on the resulting cluster targets.
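The clustering step can be illustrated with a toy sketch. The code below is not the authors' pipeline: it runs a plain Lloyd's k-means in pure Python on synthetic 2-D points with 2 clusters, whereas the procedure described above used 500 clusters on speech features. The function names (`kmeans`, `nearest`) and the toy data are hypothetical. It shows only how each frame ends up with a discrete cluster ID that can serve as a prediction target.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(frame, centroids):
    """Index of the centroid closest to one feature frame."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(frame, centroids[k]))


def kmeans(frames, num_clusters, iterations=10, seed=0):
    """Plain Lloyd's algorithm; returns the fitted centroids."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, num_clusters)]
    for _ in range(iterations):
        # Assignment step: group frames by their nearest centroid.
        groups = [[] for _ in centroids]
        for frame in frames:
            groups[nearest(frame, centroids)].append(frame)
        # Update step: move each centroid to the mean of its group.
        for k, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


# Toy stand-in for per-frame features: two well-separated 2-D blobs.
data_rng = random.Random(1)
frames = [[data_rng.gauss(0, 0.1), data_rng.gauss(0, 0.1)] for _ in range(50)]
frames += [[data_rng.gauss(5, 0.1), data_rng.gauss(5, 0.1)] for _ in range(50)]

centroids = kmeans(frames, num_clusters=2)
targets = [nearest(f, centroids) for f in frames]  # one discrete label per frame
print(targets[:3], targets[-3:])
```

In practice a library implementation of k-means would likely replace the hand-written loop; the pure-Python version is used here only to keep the sketch self-contained.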

Training Hyperparameters

Hyperparameter	Base	Large
Warmup Steps	32,000	32,000
Learning Rate	5e-4	1.5e-3
Batch Size	128	128
Weight Decay	0.01	0.01
Max Steps	400,000	400,000
Learning Rate Decay	0.1	0.1
Adam \(\beta_1\)	0.9	0.9
Adam \(\beta_2\)	0.99	0.99
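The table does not say what shape the learning-rate schedule takes. As one possible reading, the sketch below assumes a linear warmup to the peak learning rate over the warmup steps, followed by a linear decay that ends at 0.1 times the peak at the final step. Both the schedule shape and the interpretation of "Learning Rate Decay" as a final ratio are assumptions, and the function name `learning_rate` is illustrative.

```python
def learning_rate(step, peak_lr, warmup_steps=32_000, max_steps=400_000,
                  final_ratio=0.1):
    """Assumed schedule: linear warmup, then linear decay to final_ratio * peak_lr."""
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to the peak.
        return peak_lr * step / warmup_steps
    # Decay: fraction of the post-warmup phase completed, capped at 1.
    progress = min(1.0, (step - warmup_steps) / (max_steps - warmup_steps))
    return peak_lr * (1.0 - (1.0 - final_ratio) * progress)


# Peak learning rate for the Large model, taken from the table.
for step in (0, 16_000, 32_000, 216_000, 400_000):
    print(step, learning_rate(step, peak_lr=1.5e-3))
```

If the actual training used a different schedule, for example an exponential or polynomial decay, only the decay branch of this function would need to change.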

📄 License

This project is licensed under the Apache-2.0 license.
