viwav2vec2-base-3k Open Source Vietnamese Speech Model - Empowering Vietnamese Speech Recognition Tasks

Viwav2vec2 Base 3k

Developed by dragonSwing

This Wav2Vec2 base model was pre-trained on 3,000 hours of Vietnamese speech data. It is intended for Vietnamese speech recognition and must be fine-tuned on a downstream task before use.

Speech Recognition

Transformers

Other #Vietnamese speech recognition #16kHz audio adaptation #Self-supervised pre-training

Downloads 41

Release Time: 5/3/2022

Model Overview

This is a Wav2Vec2 base model pre-trained on 3,000 hours of Vietnamese speech data, including spontaneous conversations, read speech, and broadcast audio. The model requires fine-tuning on downstream tasks (such as Vietnamese automatic speech recognition) to achieve optimal performance.

Model Features

Large-scale Vietnamese pre-training

Pre-trained on 3,000 hours of Vietnamese speech data, covering various speech types

16kHz sampling rate support

Expects speech audio sampled at 16kHz; make sure input audio matches this sampling rate before passing it to the model
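
A quick way to guard against mismatched input is to read the sampling rate from the audio file header before the audio reaches the model. The sketch below uses Python's standard `wave` module and assumes uncompressed WAV input; the helper names `wav_sample_rate` and `is_16khz_wav` are hypothetical, and resampling itself is left to whichever audio library you already use.

```python
import wave

EXPECTED_SAMPLE_RATE = 16_000  # the rate named in the model description


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_16khz_wav(path):
    """True if the WAV file at `path` is sampled at 16 kHz."""
    return wav_sample_rate(path) == EXPECTED_SAMPLE_RATE
```

Files that fail this check should be resampled to 16kHz rather than fed to the model as they are.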

Requires downstream fine-tuning

The model needs fine-tuning on downstream tasks (e.g., speech recognition) to achieve optimal performance

Model Capabilities

Vietnamese speech feature extraction

Speech representation learning
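
When the model is used for feature extraction, it helps to know how many feature frames a clip will produce. The sketch below assumes this checkpoint keeps the standard Wav2Vec2 base convolutional feature encoder (kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2). The model description does not state this, so check the checkpoint's configuration before relying on the numbers. The function name is hypothetical.

```python
# Assumed: standard Wav2Vec2 base feature-encoder layout (not confirmed by the model description).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_feature_frames(num_samples):
    """Number of frames the feature encoder emits for `num_samples` audio samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        if length < kernel:
            return 0  # clip is too short to pass through this layer
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


one_second = 16_000  # samples in one second of 16 kHz audio
print(num_feature_frames(one_second))  # 49 frames, roughly one every 20 ms
```

Under these assumptions, the shortest clip that yields a frame is 400 samples (25 ms).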

Use Cases

Speech technology

Vietnamese speech recognition system

Build a Vietnamese automatic speech recognition system by fine-tuning the model
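
One common way to fine-tune a Wav2Vec2 model for speech recognition is to add a character-level CTC head, which needs a vocabulary built from the training transcripts. The sketch below is an assumption about that workflow, not a recipe from the model's author. The special tokens `[PAD]`, `[UNK]`, and `|` (word delimiter) follow a widely used convention, and the function name `build_char_vocab` is hypothetical. Vietnamese text is normalized to NFC first, so a letter whose diacritics were typed as separate combining marks maps to the same entry as its precomposed form.

```python
import unicodedata


def build_char_vocab(transcripts):
    """Map each character in the transcripts to an integer id for a CTC head."""
    chars = set()
    for text in transcripts:
        # NFC merges base letters and combining diacritics into single code points.
        normalized = unicodedata.normalize("NFC", text.lower())
        chars.update(ch for ch in normalized if not ch.isspace())

    # Special tokens (assumed convention): padding/blank, unknown, word delimiter.
    vocab = {"[PAD]": 0, "[UNK]": 1, "|": 2}
    for ch in sorted(chars):
        if ch not in vocab:  # do not reassign the word delimiter if it appears in text
            vocab[ch] = len(vocab)
    return vocab


vocab = build_char_vocab(["xin chào", "cảm ơn"])
print(len(vocab))  # 3 special tokens + 10 distinct letters = 13
```

Skipping the normalization step can silently split one Vietnamese letter into several vocabulary entries, which inflates the output layer and fragments the training signal.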

Speech analysis applications

Use the model's learned speech representations for Vietnamese speech content analysis
