Ultravox v0.4.1 Llama 3.3 70B

Developed by fixie-ai
Ultravox is a multimodal speech large language model based on Llama3.3-70B-Instruct and whisper-large-v3-turbo, capable of processing both speech and text inputs.
Downloads 26
Release Time: 12/16/2024

Model Overview

Ultravox is a multimodal model that can receive both speech and text inputs, suitable for tasks such as voice agents, speech-to-speech translation, and spoken audio analysis.

Model Features

Multimodal Input
Accepts both speech and text inputs. Audio is referenced from the text prompt through a special pseudo-token.
Multilingual Support
Supports speech and text processing in 15 languages.
Efficient Training
Only the multimodal adapter is trained; the Whisper encoder and Llama are kept frozen, which improves training efficiency.
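
To make the pseudo-token idea under Multimodal Input more concrete, here is a minimal sketch of how a placeholder in a text prompt could be swapped for a sequence of audio embeddings before the language model sees it. Everything in it is an assumption made for illustration: the placeholder string `<|audio|>`, the helper names `embed_text` and `splice_audio`, the 4-dimensional toy vectors, and the rule that a prompt holds exactly one placeholder. It is not the model's actual processor code.

```python
AUDIO_TOKEN = "<|audio|>"  # placeholder string assumed for illustration


def embed_text(tokens, dim=4):
    """Toy stand-in for a text embedding table: one deterministic vector per token."""
    return [[float((sum(map(ord, tok)) + i) % 7) for i in range(dim)] for tok in tokens]


def splice_audio(prompt_tokens, audio_embeddings, dim=4):
    """Return the embedding sequence with the pseudo-token replaced by audio embeddings."""
    if prompt_tokens.count(AUDIO_TOKEN) != 1:
        raise ValueError("expected exactly one audio pseudo-token in the prompt")
    pos = prompt_tokens.index(AUDIO_TOKEN)
    before = embed_text(prompt_tokens[:pos], dim)
    after = embed_text(prompt_tokens[pos + 1:], dim)
    # The audio segment may be longer than one position: a single placeholder
    # expands into as many vectors as the (hypothetical) audio adapter emits.
    return before + list(audio_embeddings) + after


prompt = ["Transcribe", "this", ":", AUDIO_TOKEN, "Thanks"]
audio = [[0.1] * 4, [0.2] * 4, [0.3] * 4]  # pretend adapter output, 3 frames
sequence = splice_audio(prompt, audio)
```

In this toy run the five prompt tokens become seven embedding positions, because the single placeholder expands into three audio frames. In a setup like the one described under Efficient Training, only the component producing `audio` would be trained, while the text embedding and the rest of the language model stay fixed.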

Model Capabilities

Speech Recognition
Text Generation
Speech-to-Speech Translation
Spoken Audio Analysis

Use Cases

Voice Agents
Voice Assistant
Acts as a voice assistant to answer user queries.
Speech Translation
Multilingual Speech Translation
Translates speech from one language to text or speech in another language.
BLEU score of 19.64 for English-Arabic translation
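
For readers unfamiliar with BLEU, the sketch below shows how such a score is computed in principle: clipped n-gram precisions for n = 1 to 4, combined by geometric mean and scaled by a brevity penalty. It is a simplified illustration only. It assumes whitespace tokenization, one reference per hypothesis, and no smoothing; the page does not say which tool or tokenization produced the reported figure, so this code is not expected to reproduce it. The function names `ngrams` and `corpus_bleu` are chosen here for the example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale (one reference each, no smoothing)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


score = corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"])
```

Here the four precisions are 5/6, 3/5, 2/4 and 1/3, the lengths match so the brevity penalty is 1, and the score is 100 times the fourth root of their product. Published scores are usually computed with a standardized tool, so figures from a sketch like this are not directly comparable with them.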