
XPhoneBERT Base

Developed by VinAI
XPhoneBERT is the first pre-trained multilingual phoneme representation model for text-to-speech (TTS). It follows the BERT-base architecture and is trained on 330 million phoneme-level sentences from nearly 100 languages and locales.
Downloads: 7,561
Release Date: April 13, 2023

Model Overview

XPhoneBERT is a pretrained multilingual phoneme representation model designed for text-to-speech (TTS). Used as an input phoneme encoder, it improves the naturalness and prosody of TTS systems.
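
As a quick orientation, the sketch below loads the checkpoint with Hugging Face transformers; the repository id vinai/xphonebert-base is assumed from the model name above.

```python
# Minimal loading sketch (assumes the checkpoint is published on the
# Hugging Face Hub under the id "vinai/xphonebert-base").
from transformers import AutoModel, AutoTokenizer

xphonebert = AutoModel.from_pretrained("vinai/xphonebert-base")
tokenizer = AutoTokenizer.from_pretrained("vinai/xphonebert-base")
```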

Model Features

Multilingual Support
Supports phoneme representation learning for nearly 100 languages and locales.
Phoneme-Level Pretraining
Pretrained on 330 million phoneme-level sentences, tailored to TTS tasks.
TTS Quality Improvement
Used as an input phoneme encoder, it noticeably improves the naturalness and prosody of TTS models.
Low-Resource Adaptation
Produces relatively high-quality speech even with limited training data.

Model Capabilities

Phoneme Sequence Encoding (see the sketch after this list)
Multilingual Text-to-Phoneme Conversion
Improving TTS Model Performance
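
For illustration, here is a minimal sketch of phoneme sequence encoding. The space-separated IPA string stands in for the output of a grapheme-to-phoneme front end (the authors provide a companion text2phonemesequence tool for that step, not shown here), and the model returns contextual phoneme representations.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the phoneme encoder and its tokenizer (repository id assumed).
xphonebert = AutoModel.from_pretrained("vinai/xphonebert-base")
tokenizer = AutoTokenizer.from_pretrained("vinai/xphonebert-base")

# Illustrative space-separated phoneme string; in practice this comes from a
# grapheme-to-phoneme front end rather than being written by hand.
input_phonemes = "ð æ t ▁ ɪ z ▁ ɐ ▁ t ɛ s t"

inputs = tokenizer(input_phonemes, return_tensors="pt")
with torch.no_grad():
    outputs = xphonebert(**inputs)

# Contextual phoneme representations, shape (1, sequence_length, hidden_size);
# a TTS acoustic model can consume these as encoder features.
phoneme_features = outputs.last_hidden_state
print(phoneme_features.shape)
```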

Use Cases

Speech Synthesis
High-Quality TTS System
Integrated as a front-end phoneme encoder into TTS systems (a wiring sketch follows this list).
Improves the naturalness and prosody of synthesized speech.
Low-Resource Language TTS
Building TTS systems for languages with limited training data.
Generates relatively high-quality speech output.
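
To illustrate the front-end encoder role, the sketch below feeds XPhoneBERT's hidden states into a toy mel-spectrogram predictor. The decoder here is a made-up stand-in for demonstration only, not part of XPhoneBERT or any particular TTS toolkit.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

# XPhoneBERT as the phoneme encoder in front of a toy acoustic decoder.
encoder = AutoModel.from_pretrained("vinai/xphonebert-base")
tokenizer = AutoTokenizer.from_pretrained("vinai/xphonebert-base")

# Hypothetical decoder: maps each contextual phoneme state to one 80-bin
# mel-spectrogram frame. Real TTS decoders also model duration and alignment.
decoder = nn.Sequential(
    nn.Linear(encoder.config.hidden_size, 256),
    nn.ReLU(),
    nn.Linear(256, 80),
)

inputs = tokenizer("ð æ t ▁ ɪ z ▁ ɐ ▁ t ɛ s t", return_tensors="pt")
with torch.no_grad():
    phoneme_states = encoder(**inputs).last_hidden_state  # (1, T, hidden_size)
    mel_frames = decoder(phoneme_states)                  # (1, T, 80)
```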