# Multimodal Speech Understanding
## Ultravox v0.5 Llama 3.3 70B (Tempfix)
Ultravox is a multimodal speech large language model that accepts both speech and text as input, supporting multiple languages and tasks.
- License: MIT
- Task: Audio-to-Text
- Framework: Transformers (multilingual)
- Publisher: zhuexe (35 downloads · 0 likes)
## Ultravox v0.3
Ultravox is a multimodal speech large language model based on Llama3.1-8B-Instruct and Whisper-small, capable of processing both speech and text inputs.
- License: MIT
- Task: Audio-to-Text
- Framework: Transformers (English)
- Publisher: FriendliAI (20 downloads · 1 like)
## Ultravox v0.4.1 Llama 3.3 70B
Ultravox is a multimodal speech large language model based on Llama3.3-70B-Instruct and whisper-large-v3-turbo, capable of processing both speech and text inputs.
- License: MIT
- Task: Audio-to-Text
- Framework: Transformers (multilingual)
- Publisher: fixie-ai (26 downloads · 10 likes)
## Ultravox v0.4.1 Mistral-Nemo
Ultravox is a multimodal model based on Mistral-Nemo and Whisper, capable of processing both speech and text inputs and suitable for tasks such as voice agents and speech translation.
- License: MIT
- Task: Audio-to-Text
- Framework: Transformers (multilingual)
- Publisher: fixie-ai (1,285 downloads · 25 likes)
## Ultravox v0.4.1 Llama 3.1 70B
Ultravox is a multimodal speech large language model built on the pre-trained Llama3.1-70B-Instruct and whisper-large-v3-turbo backbones, capable of receiving both speech and text as input.
- License: MIT
- Task: Audio-to-Text
- Framework: Transformers (multilingual)
- Publisher: fixie-ai (204 downloads · 24 likes)
## Ultravox v0.4.1 Llama 3.1 8B
Ultravox is a multimodal speech large language model built on Llama3.1-8B-Instruct and whisper-large-v3-turbo, capable of processing both speech and text inputs.
- License: MIT
- Task: Audio-to-Text
- Framework: Transformers (multilingual)
- Publisher: fixie-ai (747 downloads · 97 likes)
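These checkpoints are served through the standard Transformers pipeline interface. The sketch below shows how an Ultravox call might be packaged, following the input-dict shape (`audio`, `turns`, `sampling_rate`) described on the fixie-ai model cards; the `build_ultravox_input` helper and the silent test clip are illustrative assumptions, not part of any model card.

```python
# Sketch of packaging input for an Ultravox Transformers pipeline call.
import numpy as np

def build_ultravox_input(audio: np.ndarray, sampling_rate: int,
                         system_prompt: str) -> dict:
    """Bundle a waveform and chat turns into the dict the pipeline expects."""
    turns = [{"role": "system", "content": system_prompt}]
    return {"audio": audio, "turns": turns, "sampling_rate": sampling_rate}

# One second of silence at 16 kHz stands in for a real recording.
audio = np.zeros(16000, dtype=np.float32)
inputs = build_ultravox_input(audio, 16000,
                              "You are a friendly and helpful assistant.")

# Actual inference downloads the checkpoint, so it is left commented out:
# import transformers
# pipe = transformers.pipeline(
#     model="fixie-ai/ultravox-v0_4_1-llama-3_1-8b", trust_remote_code=True
# )
# print(pipe(inputs, max_new_tokens=30))
```

The same dict shape should apply to the other Ultravox checkpoints above, since they share the custom pipeline code.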
## SpeechLLM 1.5B
SpeechLLM is a multimodal large language model designed to predict speaker-turn metadata in conversations, including speech activity, transcribed text, gender, age, accent, and emotion.
- License: Apache-2.0
- Task: Audio-to-Text
- Framework: Transformers (English)
- Publisher: skit-ai (40 downloads · 7 likes)