Ultravox v0.4.1 Llama 3.1 8B

Developed by fixie-ai
Ultravox is a multimodal speech large language model built on Llama3.1-8B-Instruct and whisper-large-v3-turbo, capable of processing both speech and text inputs.
Downloads 747
Release Time: 11/5/2024

Model Overview

Ultravox is a multimodal model that can receive speech and text inputs and generate text outputs. It is suitable for scenarios such as speech agents, speech translation, and speech analysis.
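The mixed speech-and-text input can be pictured as a text prompt in which the <|audio|> pseudo-token (described under Multimodal Input) is swapped for a run of audio embeddings before generation. The following is only a toy sketch of that splice, not Ultravox's actual processor code: `merge_embeddings`, `toy_embed`, the 2-dimensional vectors, and the whitespace-style tokens are all hypothetical stand-ins chosen for illustration.

```python
AUDIO_TOKEN = "<|audio|>"


def merge_embeddings(tokens, embed_text, audio_frames):
    """Return one embedding sequence: text tokens are embedded normally,
    and the single audio placeholder is replaced by the audio frames."""
    if tokens.count(AUDIO_TOKEN) != 1:
        raise ValueError("expected exactly one audio placeholder in the prompt")
    merged = []
    for tok in tokens:
        if tok == AUDIO_TOKEN:
            # One placeholder expands into many audio frame embeddings.
            merged.extend(audio_frames)
        else:
            merged.append(embed_text(tok))
    return merged


def toy_embed(token):
    # Hypothetical stand-in for a real embedding table:
    # a 2-d vector derived from the token length.
    return [float(len(token)), 0.0]


tokens = ["Transcribe", "this", ":", AUDIO_TOKEN]
audio_frames = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]  # made-up audio embeddings

merged = merge_embeddings(tokens, toy_embed, audio_frames)
print(len(merged), "embedding vectors in the merged sequence")
```

In this sketch the merged sequence is longer than the token list, because a single placeholder stands for every audio frame; the language model then generates text from the merged sequence as it would from an ordinary text prompt.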

Model Features

Multimodal Input
Accepts speech and text together: the text prompt contains a special <|audio|> pseudo-token, which is replaced with embeddings derived from the input audio.
Multilingual Support
Supports 15 languages, including Chinese, English, and Spanish.
Efficient Inference
On an A100-40GB GPU, the time to first token for audio input is about 150 ms, and generation runs at roughly 50-100 tokens per second.
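The latency figures above can be turned into a rough end-to-end estimate for a reply of a given length. The sketch below assumes a simple model that is not stated in the source: the first token arrives after the quoted first-token latency, and each remaining token arrives at a constant generation rate. The function name `estimate_response_seconds` and its parameters are hypothetical.

```python
def estimate_response_seconds(num_tokens, ttft_s=0.150, tokens_per_s=50.0):
    """Rough time to produce `num_tokens` output tokens.

    Assumption: the first token costs `ttft_s` seconds, and the remaining
    tokens stream at a constant `tokens_per_s` rate.
    """
    if num_tokens < 1:
        raise ValueError("num_tokens must be at least 1")
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + (num_tokens - 1) / tokens_per_s


# Bracket a 101-token reply using the quoted 50-100 tokens/s range.
slow = estimate_response_seconds(101, tokens_per_s=50.0)
fast = estimate_response_seconds(101, tokens_per_s=100.0)
print(f"estimated reply time: {fast:.2f}s to {slow:.2f}s")
```

Under these assumptions a 101-token reply lands between roughly 1.15 and 2.15 seconds on the quoted hardware; real latency also depends on audio length, batching, and serving overhead, none of which this estimate models.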

Model Capabilities

Speech Recognition
Text Generation
Speech Translation
Speech Analysis

Use Cases

Speech Agent
Voice Assistant
Acts as a voice assistant to answer user questions.
Speech Translation
Multilingual Translation
Translates speech input into multiple languages.
Achieves a BLEU score of 12.28 for English-Arabic translation and 27.13 for English-German translation.