
Ultravox v0.2

Developed by fixie-ai
Ultravox is a multimodal voice large language model built upon Llama3-8B-Instruct and Whisper-small, capable of processing both speech and text inputs.
Downloads 792
Release Time: 6/7/2024

Model Overview

Ultravox is a multimodal model that accepts speech and text inputs (e.g., a text system prompt and a spoken user message) and generates text output. It is suited to voice agents, speech-to-speech translation, and voice analysis.

Model Features

Multimodal Input
Accepts both speech and text. A special <|audio|> pseudo-token in the text prompt marks where the audio goes: the model replaces it with embeddings derived from the input audio.
Speech Understanding
Can hear and comprehend speech, making it suitable for scenarios such as voice agents and voice analysis.
Future Expansion
Future versions are planned to generate semantic and acoustic audio tokens for voice output.
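
The <|audio|> pseudo-token mentioned under Multimodal Input can be pictured as a placeholder that is swapped for a run of audio embeddings before the language model sees the sequence. The sketch below is a toy illustration of that idea only, not Ultravox's actual implementation: the function name splice_audio, the 2-dimensional vectors, and the requirement of exactly one placeholder are all assumptions made for the example.

```python
from typing import Dict, List, Sequence

AUDIO_TOKEN = "<|audio|>"  # pseudo-token named in the model description

Vector = List[float]


def splice_audio(
    tokens: Sequence[str],
    text_embeddings: Dict[str, Vector],
    audio_frames: Sequence[Vector],
) -> List[Vector]:
    """Build one embedding sequence, substituting audio frames for the placeholder.

    Hypothetical helper: every text token is looked up in `text_embeddings`,
    and the single <|audio|> token is expanded into all of `audio_frames`.
    """
    if list(tokens).count(AUDIO_TOKEN) != 1:
        raise ValueError("expected exactly one <|audio|> placeholder")
    merged: List[Vector] = []
    for tok in tokens:
        if tok == AUDIO_TOKEN:
            # One placeholder token becomes many embedding vectors.
            merged.extend(audio_frames)
        else:
            merged.append(text_embeddings[tok])
    return merged


# Toy data; all values are made up for illustration.
toy_text_embeddings = {
    "Transcribe": [0.1, 0.2],
    ":": [0.3, 0.4],
}
toy_audio_frames = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]

merged = splice_audio(
    ["Transcribe", ":", AUDIO_TOKEN], toy_text_embeddings, toy_audio_frames
)
print(len(merged))  # two text tokens plus three audio frames
```

The point of the sketch is that the merged sequence is longer than the token list: a single placeholder stands in for however many embedding vectors the audio produces.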

Model Capabilities

Speech Recognition
Text Generation
Multimodal Input Processing
Voice Agent
Speech-to-Speech Translation
Voice Analysis

Use Cases

Voice Agent
Voice Assistant
Acts as a voice assistant, answering user questions and providing assistance.
Speech Translation
Speech-to-Speech Translation
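Translates speech in one language into another language. The model currently outputs text, so spoken output would rely on a separate speech synthesis step until the planned voice output is supported.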
Voice Analysis
Speech Content Analysis
Analyzes speech content to extract key information or emotions.