U

Ultravox V0 6 Llama 3 3 70b

Developed by fixie-ai
Ultravox is a large multimodal speech language model that combines a pre-trained large language model and a speech encoder, capable of handling both speech and text inputs.
Downloads 196
Release Time : 5/27/2025

Model Overview

Ultravox is a large language model that can listen to and understand speech, and can be used for tasks such as voice agents, speech-to-speech translation, and speech audio analysis.

Model Features

Multimodal input
Can handle both speech and text inputs simultaneously, supporting mixed interactions between speech and text.
Hindi optimization
Trained on extended Hindi speech data, significantly improving the performance of Hindi speech understanding.
Noise robustness
Trained on a noise dataset, it can better handle noisy audio and output special markers when recognition fails.
Future speech output
Plans to expand the vocabulary to support the generation of semantic and acoustic audio tokens, enabling the speech output function.

Model Capabilities

Speech understanding
Speech translation
Speech audio analysis
Noise detection
Multilingual support

Use Cases

Voice interaction
Voice agent
Serves as an intelligent agent capable of understanding voice input for natural language interaction.
Voice translation
Multilingual voice translation
Translates the speech of one language into the text output of another language.
Achieved a BLEU score of 12.94 - 42.41 on the covost2 dataset
Audio analysis
Noise detection
Detects whether the input audio contains valid speech or is just noise.
Achieved a recall rate of 97.45% on the musan_noise dataset
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
Š 2025AIbase