Fish Agent V0.1 3b
Developed by fishaudio
A speech-to-speech model that accurately captures and generates environmental audio information and also offers advanced text-to-speech capabilities.
Tags: Speech Synthesis, Supports Multiple Languages, Non-semantic speech generation, Multilingual TTS, Environmental audio modeling
Downloads 653
Release Time: 10/29/2024
Model Overview
Fish Agent V0.1 3B is a versatile speech processing model that supports speech-to-speech and text-to-speech tasks. It is built on a non-semantic token architecture, which removes the dependence on traditional semantic encoders/decoders.
Model Features
Non-semantic token architecture
Removes the dependence on traditional semantic encoders/decoders such as Whisper or CosyVoice, making speech processing more efficient
Multilingual support
Supports speech processing in 8 languages, including Chinese and English
Large-scale training data
Trained on a multilingual audio dataset of 700,000 hours
Versatile speech processing
Supports both speech-to-speech and text-to-speech tasks, covering a broad range of application scenarios
Model Capabilities
Speech-to-speech
Text-to-speech
Speech-to-text
Multilingual speech processing
Use Cases
Speech synthesis
Multilingual speech synthesis
Convert text into natural and fluent speech output
Supports speech synthesis in 8 languages
Voice conversion
Voice style conversion
Transform input speech into output with different styles or characteristics