U

Ultravox V0 5 Llama 3 2 1b

Developed by fixie-ai
Ultravox is a multimodal voice large language model based on Llama3.2-1B and Whisper-large-v3, capable of processing both voice and text inputs.
Downloads 167.25k
Release Time : 2/6/2025

Model Overview

Ultravox is a multimodal model that can receive voice and text as input and generate text output. It combines speech understanding and language generation capabilities, suitable for tasks such as voice agents and speech translation.

Model Features

Multimodal Input
Capable of receiving both voice and text as input to handle complex multimodal tasks.
Multilingual Support
Supports over 40 languages, suitable for global application scenarios.
Knowledge Distillation Training
Trained with knowledge distillation loss function, enabling the model to match the logical output of the text-based Llama backbone.

Model Capabilities

Speech understanding
Text generation
Speech-to-text conversion
Multilingual processing
Voice agent

Use Cases

Voice Interaction
Voice Agent
Acts as an intelligent agent capable of understanding and responding to voice input
Language Translation
Speech-to-Speech Translation
Converts voice input in one language to text or voice output in another language
Performs well on the covost2 dataset, e.g., en_de translation BLEU score 14.21
Speech Analysis
Speech Content Understanding
Analyzes speech content and extracts key information
Scores 39.14 on the big bench audio task
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
Š 2025AIbase