Kotoba-Speech v0.1: Open-Source Japanese Speech Generation Model Supporting Text-to-Speech and Single-Sample Voice Cloning


Developed by kotoba-tech

Kotoba-Speech v0.1 is a Japanese speech generation model based on a 1.2B parameter Transformer, supporting text-to-speech and one-shot voice cloning.

Speech Synthesis

Transformers

Language: Japanese
Open Source License: Apache-2.0
Tags: Japanese TTS, Voice Cloning, 1.2B Parameters

Downloads 23

Release Time: 3/14/2024

Model Overview

Kotoba-Speech v0.1 is an end-to-end Transformer speech generation model focused on Japanese text-to-speech and voice cloning.

Model Features

Fluent Japanese Speech Generation

Converts Japanese text into natural, fluent speech.

One-Shot Voice Cloning

Clones a voice from a single sample, supplied to the model as a voice prompt.

Large Parameter Scale

Built on a 1.2B-parameter Transformer architecture for high-quality speech generation.
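
To give a feel for what 1.2B parameters means in practice, here is a rough back-of-the-envelope estimate of the memory needed just to hold the weights. Only the parameter count comes from this page; the numeric precisions listed are assumptions for illustration, and the estimate ignores activations, caches, and any audio components loaded alongside the model.

```python
# Rough weight-memory estimate for a 1.2B-parameter model.
# Only the parameter count is taken from the model card; the precisions
# below are illustrative assumptions, not documented deployment options.
PARAMS = 1_200_000_000

BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


for name, size in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_memory_gib(PARAMS, size):.2f} GiB")
# float32: 4.47 GiB
# float16/bfloat16: 2.24 GiB
# int8: 1.12 GiB
```

Actual memory use during inference will be higher than these figures, since they cover the weights only.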

Model Capabilities

Japanese Text-to-Speech

Voice Cloning

Speech Synthesis

Use Cases

Voice Interaction

Voice Assistant

Provides natural and fluent speech output for Japanese voice assistants.

Natural-sounding responses improve the user experience.

Content Creation

Audiobook Generation

Automatically converts Japanese text into audiobooks.

Efficiently generates high-quality speech content.
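
Long texts such as book chapters are usually synthesized in pieces rather than in one pass. The sketch below shows one way to split Japanese text at sentence-ending punctuation and group the sentences into chunks before handing each chunk to a TTS model. The function names and the `max_chars` limit are hypothetical and are not part of the Kotoba-Speech API; the synthesis call itself is deliberately left out.

```python
import re

# Split immediately after Japanese sentence-ending punctuation.
# The zero-width lookbehind keeps the punctuation attached to its sentence.
_SENTENCE_END = re.compile(r"(?<=[。！？])")


def split_sentences(text: str) -> list[str]:
    """Split Japanese text into sentences, dropping empty fragments."""
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]


def chunk_text(text: str, max_chars: int = 100) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    max_chars is an assumed limit for illustration. A single sentence
    longer than max_chars is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for sentence in split_sentences(text):
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks


print(split_sentences("今日は晴れです。散歩に行きましょう！"))
# ['今日は晴れです。', '散歩に行きましょう！']
```

Each chunk would then be synthesized separately and the resulting audio segments concatenated in order. Splitting at sentence boundaries keeps the joins at natural pauses.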

Personalized Services

Personalized Voice Cloning

Clones a specific speaker's voice from as little as one sample.

Enables personalized voice services.

Property	Details
Model Type	End-to-end Transformer
Language(s)	Japanese
Library	Training code is to be released soon. Inference and model code are largely adopted from metavoice.

