Ultravox v0.5 Llama 3.3 70B

Developed by fixie-ai
Ultravox is a multimodal voice large language model built on Llama 3.3 70B and Whisper. It accepts both voice and text input and suits use cases such as voice agents and speech translation.
Downloads 3,817
Release Time: 1/31/2025

Model Overview

Ultravox is a multimodal model that accepts voice and text input together. A special pseudo-token in the text prompt marks where the audio belongs; the model replaces it with embeddings derived from the input audio and then generates text output. Voice output is planned for future versions.
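
The pseudo-token mechanism can be illustrated with a small, self-contained sketch. It is not Ultravox's actual implementation: the placeholder string, the toy embedding function, and the frame values below are all assumptions made for illustration. It only shows the general idea of swapping a placeholder token for a run of audio-derived vectors so that the language model sees one continuous embedding sequence.

```python
from typing import Callable, List, Sequence

# Hypothetical placeholder string, chosen for illustration only.
AUDIO_PLACEHOLDER = "<|audio|>"

Vector = List[float]


def toy_text_embedding(token: str) -> Vector:
    """Stand-in for a real embedding table: a tiny deterministic 2-d vector."""
    return [float(len(token)), 0.0]


def splice_audio_embeddings(
    tokens: Sequence[str],
    audio_embeddings: Sequence[Vector],
    embed_text: Callable[[str], Vector] = toy_text_embedding,
) -> List[Vector]:
    """Embed text tokens and replace the placeholder with the audio vectors."""
    merged: List[Vector] = []
    placeholder_seen = False
    for token in tokens:
        if token == AUDIO_PLACEHOLDER:
            if placeholder_seen:
                raise ValueError("this sketch supports one audio clip per prompt")
            # One placeholder expands into as many positions as there are audio frames.
            merged.extend(list(vec) for vec in audio_embeddings)
            placeholder_seen = True
        else:
            merged.append(embed_text(token))
    if not placeholder_seen:
        raise ValueError("prompt contains no audio placeholder")
    return merged


prompt_tokens = ["Translate", AUDIO_PLACEHOLDER, "to", "German"]
audio_frames = [[0.5, 0.5], [0.25, 0.75], [0.1, 0.9]]  # made-up values
sequence = splice_audio_embeddings(prompt_tokens, audio_frames)
print(len(sequence))  # 3 text tokens + 3 audio frames = 6
```

In a real system the audio vectors would come from a speech encoder and a projection layer sized to match the language model's embedding dimension; the two-dimensional vectors here only keep the example readable.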

Model Features

Multimodal Input Support
Processes voice and text inputs together, integrating voice embeddings through special pseudo-tokens.
Multilingual Support
Supports voice and text processing in over 40 languages.
High-performance Translation
Excels in speech translation tasks across multiple language pairs.
Future Voice Generation Capability
Future versions are planned to generate semantic and acoustic audio tokens for voice output.

Model Capabilities

Voice Understanding
Multilingual Speech Recognition
Speech Translation
Voice Agent
Speech Analysis
Text Generation

Use Cases

Voice Interaction
Voice Assistant
Acts as an intelligent voice assistant to answer user queries.
Provides a natural, smooth conversational experience.
Translation Services
Real-time Speech Translation
Translates speech in one language into text in another language in real time.
Achieves BLEU scores of 20 to 49 on the CoVoST 2 test set.
Content Analysis
Speech Content Analysis
Analyzes speech content and generates summaries or key information.