
Ultravox v0.5 Llama 3.3 70B Tempfix

Developed by zhuexe
Ultravox is a multimodal speech large language model capable of receiving both speech and text as input, supporting multiple languages and tasks.
Downloads 35
Release Time: 5/2/2025

Model Overview

Ultravox is a multimodal model built on Llama3.3-70B-Instruct (the language model) and whisper-large-v3-turbo (the speech encoder). It accepts both speech and text as input and is suited to tasks such as speech agents, speech translation, and speech analysis.
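Conceptually, assuming the common encoder-plus-projector design, the speech encoder turns audio into a sequence of frame embeddings, a projector maps those frames into the language model's embedding space, and the resulting vectors take the place of the `<|audio|>` pseudo-token in the text prompt. The sketch below illustrates only that splicing step with toy NumPy arrays. The dimensions, the token id `AUDIO_TOKEN_ID`, and the single linear projection are hypothetical stand-ins, not Ultravox's actual projector or tokenizer.

```python
import numpy as np

# Toy dimensions (hypothetical); the real encoder and LLM widths are much larger.
AUDIO_DIM = 16         # width of the speech-encoder output frames
TEXT_DIM = 32          # width of the LLM token embeddings
VOCAB_SIZE = 1000      # toy vocabulary size
AUDIO_TOKEN_ID = 999   # hypothetical id standing in for the <|audio|> pseudo-token


def project_audio(frames: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Map encoder frames (T, AUDIO_DIM) into the LLM embedding space (T, TEXT_DIM).

    A single linear layer is an assumed stand-in for the real projector.
    """
    return frames @ weight


def splice_audio(token_ids: list[int], embed_table: np.ndarray,
                 audio_embeds: np.ndarray) -> np.ndarray:
    """Embed the text tokens and replace the one <|audio|> pseudo-token
    with the full sequence of projected audio embeddings."""
    if token_ids.count(AUDIO_TOKEN_ID) != 1:
        raise ValueError("expected exactly one <|audio|> pseudo-token")
    pos = token_ids.index(AUDIO_TOKEN_ID)
    # Integer index arrays (possibly empty) select rows of the embedding table.
    before = embed_table[np.asarray(token_ids[:pos], dtype=np.intp)]
    after = embed_table[np.asarray(token_ids[pos + 1:], dtype=np.intp)]
    return np.concatenate([before, audio_embeds, after], axis=0)


rng = np.random.default_rng(0)
embed_table = rng.normal(size=(VOCAB_SIZE, TEXT_DIM))
proj_weight = rng.normal(size=(AUDIO_DIM, TEXT_DIM))
frames = rng.normal(size=(5, AUDIO_DIM))   # 5 encoder frames for a short clip
prompt = [10, 11, AUDIO_TOKEN_ID, 12]      # text tokens around the pseudo-token
seq = splice_audio(prompt, embed_table, project_audio(frames, proj_weight))
print(seq.shape)  # (8, 32)
```

The one pseudo-token expands into as many embeddings as there are audio frames, so three text tokens plus five frames yield a sequence of length eight that the language model then processes like ordinary input embeddings.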

Model Features

Multimodal Input
Supports speech and text input in the same prompt; the audio enters through the special pseudo-token `<|audio|>`, which the model replaces with embeddings derived from the input audio.
Multilingual Support
Supports more than 40 languages, making it suitable for multilingual applications worldwide.
High-Performance Inference
Time to first token (TTFT) is approximately 150 ms, and generation runs at roughly 50 to 100 tokens per second.
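To see what these figures mean for a complete reply, a rough estimate is TTFT plus the time to stream the remaining tokens at a steady rate. The sketch below assumes exactly that simple model; it ignores network and queueing overhead and any variation in generation speed, and the page does not state the hardware behind the numbers. The function name `estimated_latency_s` is hypothetical.

```python
def estimated_latency_s(num_tokens: int, ttft_s: float = 0.150,
                        tokens_per_s: float = 50.0) -> float:
    """Rough wall-clock time to stream `num_tokens` tokens.

    Assumes the first token arrives after `ttft_s` seconds and the
    remaining tokens follow at a constant `tokens_per_s` rate.
    """
    if num_tokens <= 0:
        return 0.0
    return ttft_s + (num_tokens - 1) / tokens_per_s


# A 100-token reply at both ends of the quoted 50-100 tokens/s range.
for rate in (50.0, 100.0):
    print(f"{rate:.0f} tok/s -> {estimated_latency_s(100, tokens_per_s=rate):.2f} s")
# 50 tok/s -> 2.13 s
# 100 tok/s -> 1.14 s
```

Under this model, TTFT dominates perceived responsiveness for short spoken replies, while the token rate dominates for long ones.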

Model Capabilities

Speech Recognition
Speech Translation
Speech Analysis
Multimodal Input Processing
Text Generation

Use Cases

Speech Agent
Voice Assistant
Acts as a speech agent, answering user questions and providing assistance.
Efficiently processes speech input and generates natural language responses.
Speech Translation
Multilingual Speech Translation
Translates speech from one language into text or speech in another language.
Performs well across multiple language pairs; for example, it achieves a BLEU score of 21.37 on English-to-Chinese translation.
Speech Analysis
Speech Content Analysis
Analyzes speech content and extracts key information.
Supports speech analysis in multiple languages and complex scenarios.
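The page reports a BLEU score but does not state the test set or tokenization used. As background on what such a number measures, here is a minimal sketch of textbook corpus-level BLEU: the geometric mean of clipped n-gram precisions (n = 1 to 4) multiplied by a brevity penalty, reported on a 0 to 100 scale. It assumes one reference per segment and no smoothing, and it is not necessarily the scorer behind the 21.37 figure. Splitting Chinese text into individual characters, as in the usage line, is one common convention and an assumption here.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: list[list[str]], references: list[list[str]],
                max_n: int = 4) -> float:
    """Corpus-level BLEU (0-100), one reference per segment, uniform weights."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipping: a hypothesis n-gram counts at most as often as it appears in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any zero precision makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the references are penalized.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


ref = list("今天天气很好我们去公园")   # character-level tokens
print(corpus_bleu([ref], [ref]))  # 100.0
print(round(corpus_bleu([list("abcde")], [list("abcdf")]), 2))  # 66.87
```

An identical hypothesis scores 100, so a corpus score around 21 indicates partial n-gram overlap with the references rather than a percentage of sentences translated correctly.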