Fish Speech V1.5 Open-Source TTS Model - A Text-to-Speech Tool Trained on Over 1 Million Hours of Multilingual Audio Data

Fish Speech 1.5

Developed by ModelsLab

Fish Speech V1.5 is a leading text-to-speech (TTS) model, trained on over 1 million hours of multilingual audio data.

Speech Synthesis
Supports Multiple Languages
#Multilingual TTS #Million-hour-level training #Academic research friendly

Downloads 98

Release Time: 2/27/2025

Model Overview

Advanced multilingual text-to-speech synthesis system, supporting 13 languages, with special optimizations for Chinese and English speech synthesis.

Model Features

Multilingual support

Supports text-to-speech in 13 languages, with special optimizations for Chinese and English speech synthesis.

Large-scale training data

Trained on over 1 million hours of multilingual audio data, with over 300,000 hours each for Chinese and English.

Academic research support

Related research papers have been published on arXiv, providing academic citation support.

Model Capabilities

Text-to-speech

Multilingual speech synthesis

High-quality speech output

Use Cases

Speech synthesis applications

Voice assistants

Provides natural speech output for smart devices

More natural multilingual speech experience

Audiobooks

Converts text content into speech

High-quality multilingual audio content

Educational applications

Pronunciation assistance for language learning apps

Accurate pronunciation examples

🚀 Fish Speech V1.5

Fish Speech V1.5 is a cutting-edge text-to-speech (TTS) model. It has been trained on over 1 million hours of audio data across multiple languages, offering high-quality speech synthesis capabilities.

🚀 Quick Start

For more information, please refer to the Fish Speech GitHub repository. You can also try the demo at Fish Audio.

✨ Features

Multilingual Support: Supports 13 languages: English, Chinese, Japanese, German, French, Spanish, Korean, Arabic, Russian, Dutch, Italian, Polish, and Portuguese.
Large-scale Training: Trained on more than 1 million hours of audio data, ensuring high-quality speech output.

Language and Training Hours

Property	Details
Supported Languages	English (en), Chinese (zh), Japanese (ja), German (de), French (fr), Spanish (es), Korean (ko), Arabic (ar), Russian (ru), Dutch (nl), Italian (it), Polish (pl), Portuguese (pt)
English Training Hours	>300k hours
Chinese Training Hours	>300k hours
Japanese Training Hours	>100k hours
German Training Hours	~20k hours
French Training Hours	~20k hours
Spanish Training Hours	~20k hours
Korean Training Hours	~20k hours
Arabic Training Hours	~20k hours
Russian Training Hours	~20k hours
Dutch Training Hours	<10k hours
Italian Training Hours	<10k hours
Polish Training Hours	<10k hours
Portuguese Training Hours	<10k hours
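
The table above can be encoded as a small lookup, for example to check whether a language code is covered before sending text to the model. This is only a sketch: the dictionary transcribes the table's bounds (in thousands of hours), while the helper names `is_supported` and `data_tier` and the "high"/"medium"/"low" tier thresholds are hypothetical and not part of Fish Speech itself.

```python
# Training-hour bounds per language code, transcribed from the table above.
# Each value is (qualifier, thousands of hours); these are bounds, not exact figures.
TRAINING_HOURS_K = {
    "en": (">", 300), "zh": (">", 300), "ja": (">", 100),
    "de": ("~", 20), "fr": ("~", 20), "es": ("~", 20),
    "ko": ("~", 20), "ar": ("~", 20), "ru": ("~", 20),
    "nl": ("<", 10), "it": ("<", 10), "pl": ("<", 10), "pt": ("<", 10),
}


def is_supported(lang_code: str) -> bool:
    """Return True if the language code appears in the table."""
    return lang_code.lower() in TRAINING_HOURS_K


def data_tier(lang_code: str) -> str:
    """Bucket a language by its listed training hours.

    The tier names and thresholds are illustrative assumptions only.
    """
    _qualifier, hours_k = TRAINING_HOURS_K[lang_code.lower()]
    if hours_k >= 100:
        return "high"
    if hours_k >= 20:
        return "medium"
    return "low"


if __name__ == "__main__":
    for code in ("en", "de", "pl", "hi"):
        if is_supported(code):
            print(code, data_tier(code))
        else:
            print(code, "not listed")
```

A lookup like this may help set expectations: languages with fewer listed training hours could plausibly yield less natural output, although the table alone does not establish that.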

📚 Documentation

Citation

If you find this repository useful, please consider citing this work:

@misc{fish-speech-v1.4,
      title={Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis},
      author={Shijia Liao and Yuxuan Wang and Tianyu Li and Yifan Cheng and Ruoyi Zhang and Rongzhi Zhou and Yijin Xing},
      year={2024},
      eprint={2411.01156},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2411.01156}, 
}

📄 License

This model is licensed under the CC BY-NC-SA 4.0 license.

⚠️ Important Note

You agree not to use the model to generate content that violates the DMCA or local laws.

💡 Usage Tip

When using the model, you need to provide the following information: your country and the specific date, and confirm that you agree to use this model for non-commercial purposes ONLY.
