U

Ultravox V0 3

Developed by fixie-ai
Ultravox is a multimodal speech large language model built upon Llama3.1-8B-Instruct and Whisper-small, capable of processing both speech and text inputs.
Downloads 48.30k
Release Time : 7/25/2024

Model Overview

Ultravox is a multimodal model that can receive speech and text inputs and generate text outputs. It is suitable for tasks such as voice agents, speech-to-speech translation, and speech analysis.

Model Features

Multimodal Input
Can simultaneously receive speech and text inputs, processing audio embeddings via special pseudo-tokens <|audio|>.
Speech Understanding
Capable of comprehending and processing speech content, suitable for voice agent and speech analysis tasks.
Knowledge Distillation
Employs a knowledge distillation loss function to align the model's logical outputs with the text-based Llama backbone network.

Model Capabilities

Speech Recognition
Text Generation
Speech-to-Text Translation
Speech Analysis

Use Cases

Voice Agents
Voice Assistant
Acts as a voice assistant, answering user queries and providing assistance.
Speech Translation
Speech-to-Speech Translation
Translates speech input in one language into text output in another language.
English-to-German BLEU 22.68, Spanish-to-English BLEU 24.10
Speech Analysis
Speech Content Analysis
Analyzes speech content to extract key information or generate summaries.
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
Š 2025AIbase