A low-latency TTS model based on the VITS architecture, achieving Chilean Spanish speech synthesis through rapid fine-tuning (around 20 minutes) on a small dataset (80-150 samples)
Model Features
Rapid Fine-tuning
Requires only about 20 minutes of training time and 80-150 samples to adapt to a specific accent
Lightweight & Low-latency
Designed with VITS architecture for efficient inference performance
Accent Adaptation
Specifically optimized for Chilean Spanish accent
Model Capabilities
Spanish Text-to-Speech
Chilean Accent Speech Synthesis
Real-time Speech Generation
Use Cases
Voice Interaction
Chilean Dialect Voice Assistant
Provides localized accent voice interaction experience for Chilean users
Sample audio demonstrates natural and fluent Chilean accent synthesis
Content Creation
Audio Content Production
Quickly generates narrations or dubbing with regional characteristics
🚀 Transformers Text-to-Speech Model
This project provides a text-to-speech solution using a fine-tuned MMS model. It generates high-quality Spanish speech with low latency and was trained on a Chilean Spanish dataset.
✨ Features
Lightweight and low-latency: based on the VITS architecture, it offers efficient text-to-speech conversion.
Fast training: can be fine-tuned in around 20 minutes with as few as 80 to 150 samples.
Multi-platform support: usable in both Python (Transformers library) and JavaScript (Transformers.js).
📦 Installation
Transformers.js
If you haven't already, you can install the Transformers.js JavaScript library from NPM using:
npm i @xenova/transformers
💻 Usage Examples
Basic Usage - Python (Transformers)
from transformers import pipeline
import scipy.io.wavfile
model_id = "ylacombe/mms-spa-finetuned-chilean-monospeaker"
synthesiser = pipeline("text-to-speech", model_id) # add device=0 if you want to use a GPU
speech = synthesiser("Hola, ¿cómo estás hoy?")
scipy.io.wavfile.write("finetuned_output.wav", rate=speech["sampling_rate"], data=speech["audio"])
Advanced Usage - JavaScript (Transformers.js)
import { pipeline } from '@xenova/transformers';

// Create a text-to-speech pipeline
const synthesizer = await pipeline('text-to-speech', 'ylacombe/mms-spa-finetuned-chilean-monospeaker', {
    quantized: false, // Remove this line to use the quantized version (default)
});

// Generate speech
const output = await synthesizer('Hola, ¿cómo estás hoy?');
console.log(output);
// {
//   audio: Float32Array(69888) [ ... ],
//   sampling_rate: 16000
// }
Optionally, save the audio to a wav file (Node.js):
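One way to do this without extra dependencies is to write a 16-bit PCM WAV header by hand using only Node built-ins. The sketch below is illustrative: the `saveWav` helper is our own (not part of Transformers.js), and a short synthetic tone stands in for the pipeline's `output.audio` / `output.sampling_rate`.

```javascript
import fs from 'node:fs';

// Minimal mono 16-bit PCM WAV writer using only Node built-ins.
// `audio` is a Float32Array with values in [-1, 1], as returned by the pipeline.
function saveWav(path, audio, samplingRate) {
  const numSamples = audio.length;
  const dataSize = numSamples * 2; // 2 bytes per 16-bit sample
  const buf = Buffer.alloc(44 + dataSize);
  buf.write('RIFF', 0);
  buf.writeUInt32LE(36 + dataSize, 4);   // remaining chunk size
  buf.write('WAVE', 8);
  buf.write('fmt ', 12);
  buf.writeUInt32LE(16, 16);             // fmt chunk size
  buf.writeUInt16LE(1, 20);              // audio format: PCM
  buf.writeUInt16LE(1, 22);              // channels: mono
  buf.writeUInt32LE(samplingRate, 24);   // sample rate
  buf.writeUInt32LE(samplingRate * 2, 28); // byte rate = rate * channels * 2
  buf.writeUInt16LE(2, 32);              // block align
  buf.writeUInt16LE(16, 34);             // bits per sample
  buf.write('data', 36);
  buf.writeUInt32LE(dataSize, 40);
  for (let i = 0; i < numSamples; i++) {
    const s = Math.max(-1, Math.min(1, audio[i])); // clamp before quantizing
    buf.writeInt16LE(Math.round(s * 32767), 44 + i * 2);
  }
  fs.writeFileSync(path, buf);
}

// Stand-in audio: 0.1 s of a 440 Hz tone at 16 kHz.
// With the pipeline, pass `output.audio` and `output.sampling_rate` instead.
const sr = 16000;
const audio = new Float32Array(1600).map((_, i) => Math.sin(2 * Math.PI * 440 * i / sr));
saveWav('finetuned_output.wav', audio, sr);
```

Alternatively, the `wavefile` package on NPM handles the header details for you.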
This is a fine-tuned version of the Spanish Massively Multilingual Speech (MMS) model, a lightweight, low-latency TTS model based on the VITS architecture.
It was trained in around 20 minutes with as few as 80 to 150 samples, on this Chilean Spanish dataset.