
MetaVoice-1B v0.1

Developed by metavoiceio
MetaVoice-1B is a 1.2 billion parameter text-to-speech (TTS) foundation model trained on 100,000 hours of speech data, specializing in generating emotional English speech with support for voice cloning and long-form synthesis.
Downloads: 571
Release Time: 2/6/2024

Model Overview

MetaVoice-1B is a foundation model for text-to-speech. It generates English speech with emotional rhythm and intonation and supports voice cloning and long-form synthesis.

Model Features

Emotional Speech Generation
Generates English speech with emotional rhythm and intonation, without hallucinated content (speech that does not match the input text).
Voice Cloning
Supports zero-shot cloning of American and British voices from just 30 seconds of reference audio, and voice cloning through fine-tuning, which has worked with as little as 1 minute of training data for Indian speakers.
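Because zero-shot cloning relies on roughly 30 seconds of reference audio, it can help to check a clip's length before using it. The sketch below does this with Python's standard `wave` module. It assumes an uncompressed PCM WAV file, and the helper names (`wav_duration_seconds`, `is_long_enough_for_zero_shot`, `make_silent_wav`) are hypothetical and not part of MetaVoice; only the 30-second figure comes from this card.

```python
import io
import wave

# Minimum reference length quoted on this card for zero-shot cloning of
# American and British voices. The helpers below are hypothetical, not MetaVoice code.
MIN_REFERENCE_SECONDS = 30.0


def wav_duration_seconds(source) -> float:
    """Return the duration of a PCM WAV file given as a path or file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough_for_zero_shot(source, min_seconds: float = MIN_REFERENCE_SECONDS) -> bool:
    """Check a reference clip against the 30-second guideline."""
    return wav_duration_seconds(source) >= min_seconds


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    print(is_long_enough_for_zero_shot(make_silent_wav(31.0)))
    print(is_long_enough_for_zero_shot(make_silent_wav(10.0)))
```

In practice you would pass a file path instead of the synthetic buffer; compressed formats such as MP3 would need a different library to read their duration.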
Long-form Synthesis
Supports long-form synthesis; synthesis of arbitrary-length text is planned.
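The card does not say how long-form synthesis is implemented. One common way to feed long text to a TTS model with a bounded context is to split it at sentence boundaries and synthesise it chunk by chunk. The sketch below shows only that text-splitting step. The function name `chunk_text` and the 200-character limit are illustrative assumptions, not MetaVoice settings.

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    Hypothetical helper: a sentence longer than max_chars becomes its own chunk
    rather than being cut mid-sentence.
    """
    # Split after ., ! or ? followed by whitespace, dropping empty pieces.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("One. Two! Three?", max_chars=9):
        print(chunk)
```

Splitting on sentence boundaries keeps prosody within each chunk intact; a real pipeline would also need to join the resulting audio segments.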
Efficient Inference
Supports KV caching via Flash Decoding, and batched inference, including batches of texts with different lengths.
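KV caching speeds up autoregressive decoding by storing each position's key and value vectors so that every new token attends over the cache instead of recomputing the whole prefix. The plain-Python sketch below illustrates only this general idea with a toy single-head attention and made-up weights, and checks that cached decoding matches full recomputation. It does not implement Flash Decoding (an optimized GPU attention kernel for decoding) and is not MetaVoice's code; all names are hypothetical.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention of one query vector over lists of key/value vectors."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def project(x, matrix):
    """Multiply a vector by a (len(x) x d) matrix given as a list of rows."""
    return [sum(xi * row[j] for xi, row in zip(x, matrix)) for j in range(len(matrix[0]))]


class KVCache:
    """Toy cache holding the key and value of every position already processed."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, x, wq, wk, wv):
        """Process one new token embedding x, reusing cached keys/values of earlier tokens."""
        self.keys.append(project(x, wk))
        self.values.append(project(x, wv))
        return attend(project(x, wq), self.keys, self.values)


def recompute(xs, t, wq, wk, wv):
    """Reference: output at position t, recomputing every key and value from scratch."""
    keys = [project(x, wk) for x in xs[: t + 1]]
    values = [project(x, wv) for x in xs[: t + 1]]
    return attend(project(xs[t], wq), keys, values)


if __name__ == "__main__":
    wq = [[0.2, -0.1], [0.4, 0.3]]
    wk = [[0.5, 0.1], [-0.2, 0.7]]
    wv = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    cache = KVCache()
    for t, x in enumerate(tokens):
        print(t, cache.step(x, wq, wk, wv), recompute(tokens, t, wq, wk, wv))
```

With the cache, each step projects only the newest token, while `recompute` redoes the projections for the whole prefix; both produce the same outputs.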

Model Capabilities

Text-to-Speech
Voice Cloning
Long-form Synthesis

Use Cases

Speech Synthesis
Personalized Voice Assistants
Generate personalized voices for voice assistants to enhance user experience.
Produces natural, emotional speech.
Audiobooks
Convert text content into speech for audiobook production.
Supports long-form synthesis, generating high-quality speech.
Voice Cloning
Voice Cloning Services
Clone a specific speaker's voice from minimal samples.
Zero-shot cloning of American and British voices needs only 30 seconds of reference audio; fine-tuning has worked with as little as 1 minute of training data for Indian speakers.