AIbase

# Multimodal Audio Understanding

Qwen 2 Audio Instruct Dynamic Fp8
Apache-2.0
Qwen2-Audio is the latest version of the Qwen large audio-language model series. It accepts a variety of audio signal inputs and can either analyze the audio or generate text responses directly from voice commands.
Text-to-Audio Transformers English
mlinmg
24
0
Mini Ichigo Llama3.2 3B S Instruct
Apache-2.0
The Ichigo-llama3s series is a family of multimodal language models developed by Homebrew Research that natively understand both audio and text input. Built on the Llama-3 architecture, the models are trained with WhisperVQ as the tokenizer for audio files, which improves their audio understanding.
Text-to-Audio English
Menlo
22
34
© 2025 AIbase