Ultravox v0.5 Llama 3.1 8B

Developed by fixie-ai
Ultravox is a multimodal voice large language model built on Llama3.1-8B-Instruct and whisper-large-v3-turbo, capable of processing both voice and text inputs.
Downloads 17.86k
Release Time: 2/5/2025

Model Overview

Ultravox is a multimodal model that accepts both voice and text inputs. It can serve as a voice agent and can also handle tasks such as voice-to-voice translation and spoken audio analysis.

Model Features

Multimodal Input
Processes both voice and text inputs, integrating audio embeddings with the text prompt through the special <|audio|> pseudo-token.
Voice Understanding Capability
Can hear and understand speech, which makes it suitable as a voice agent.
Knowledge Distillation Training
Trained with a knowledge distillation loss that aligns the model's logits as closely as possible with those of the text-based Llama backbone.
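
To make the knowledge distillation idea concrete, here is a minimal sketch in plain Python. It assumes the loss is a KL divergence between the teacher's and the student's next-token distributions, averaged over token positions. The exact loss form, the temperature parameter, and the function names are illustrative assumptions, not details confirmed by this page.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over one row of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Mean KL(teacher || student) over token positions.

    Assumption: the teacher is the text-only backbone and the student is
    the speech-enabled model; both emit one row of vocabulary logits per
    token position.
    """
    total = 0.0
    for s_row, t_row in zip(student_logits, teacher_logits):
        log_p = log_softmax(t_row, temperature)  # teacher distribution
        log_q = log_softmax(s_row, temperature)  # student distribution
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(student_logits)


# Toy example: two token positions, vocabulary of three entries.
teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
student = [[1.5, 0.7, -0.5], [0.0, 0.4, 2.5]]
print(distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as the two distributions diverge.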

Model Capabilities

Voice understanding
Voice-to-voice translation
Spoken audio analysis
Multimodal input processing
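
The multimodal input processing described above can be pictured as splicing audio embeddings into the text embedding sequence wherever the <|audio|> pseudo-token appears. The sketch below is a toy illustration in plain Python. The function names and the toy embedder are hypothetical, and the real model operates on tensors rather than lists.

```python
AUDIO_TOKEN = "<|audio|>"


def toy_text_embed(token):
    """Hypothetical stand-in for a text embedding lookup."""
    return [float(len(token)), 0.0]


def merge_audio_embeddings(tokens, text_embed, audio_frames):
    """Replace each <|audio|> placeholder with the audio embedding frames.

    Assumption: one placeholder expands to all audio frames, and the text
    and audio embeddings share the same dimensionality.
    """
    merged = []
    for tok in tokens:
        if tok == AUDIO_TOKEN:
            merged.extend(audio_frames)  # splice in the audio embeddings
        else:
            merged.append(text_embed(tok))
    return merged


prompt = ["Translate", ":", AUDIO_TOKEN]
audio = [[0.1, 0.2], [0.3, 0.4]]  # two hypothetical audio frames
merged = merge_audio_embeddings(prompt, toy_text_embed, audio)
print(len(merged))  # 4
```

The language model then sees a single embedding sequence and does not need a separate code path for audio.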

Use Cases

Voice Agent
Voice Assistant
Acts as a voice assistant to answer user questions
Voice Translation
Multilingual Voice Translation
Supports voice-to-voice translation in multiple languages
Achieves BLEU scores ranging from 12.99 (English to Arabic) to 42.13 (Russian to English) on the covost2 dataset
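
For readers unfamiliar with the metric, the sketch below shows how a BLEU score is formed from modified n-gram precisions and a brevity penalty. It is a simplified, unsmoothed, sentence-level version with whitespace tokenization. Benchmark scores like those above are normally computed at corpus level with a standard tool, and this page does not say which implementation was used, so treat the code as an illustration only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU on a 0-100 scale (no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * bp * math.exp(sum(log_precisions) / max_n)


print(bleu("the cat sat on the mat", "the cat sat on a mat"))
```

A score of 100 means the candidate matches the reference exactly, so the reported range of 12.99 to 42.13 reflects partial n-gram overlap that varies by language pair.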