U

Ultravox V0 6 Qwen 3 32b

Developed by fixie-ai
Ultravox is a large multimodal speech language model capable of understanding and processing speech input, supporting multiple languages and noisy environments.
Downloads 1,240
Release Time : 6/20/2025

Model Overview

Ultravox is a multimodal model built around pre - trained large language models (such as Llama, Gemma, Qwen, etc.) and speech encoders. It can process both speech and text inputs simultaneously and is suitable for tasks such as voice agents, speech translation, and speech analysis.

Model Features

Multimodal input
Can process both speech and text inputs simultaneously, supporting complex interaction scenarios.
Multilingual support
Supports over 40 languages, including Hindi, Chinese, Spanish, etc.
Noise robustness
Trained on a noisy dataset, it can recognize speech in noisy environments and output special markers.
Future speech output
Plans to expand support for generating semantic and acoustic audio tokens to implement speech output functionality.

Model Capabilities

Speech understanding
Speech - to - text conversion
Multilingual speech translation
Speech recognition in noisy environments
Voice agent interaction

Use Cases

Voice interaction
Voice agent
Serves as an intelligent agent capable of understanding and responding to voice input.
Enables natural human - machine voice interaction
Speech translation
Multilingual speech translation
Realtime translates the speech of one language into the text of another language.
Achieves a BLEU score of 12.94 - 49.29 on the covost2 test set
Speech analysis
Speech content analysis
Analyzes speech content and extracts key information.
Achieves an accuracy of 69.70% on the big bench audio test set
Featured Recommended AI Models
ยฉ 2025AIbase