Aero 1 Audio

Developed by lmms-lab
A lightweight audio model that excels at speech recognition, audio understanding, audio instruction following, and other diverse tasks.
Downloads 1,348
Release Time: 4/25/2025

Model Overview

A lightweight audio model built on the Qwen-2.5-1.5B language model. It performs strongly on multiple audio benchmarks and can accurately process continuous audio inputs up to 15 minutes long.

Model Features

Parameter efficiency
Built on a 1.5B-parameter language model, it remains parameter-efficient relative to larger models such as Whisper, Qwen-2-Audio, and Phi-4-Multimodal, and to commercial services such as ElevenLabs/Scribe.
High training efficiency
Training completed in just one day using only 16 H100 GPUs and 50,000 hours of audio data. High-quality filtered data significantly improves training sample efficiency.
Long audio processing capability
Capable of accurately processing continuous audio inputs up to 15 minutes long (including ASR and semantic understanding), a scenario where most current models still face challenges.
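
The training figures quoted above can be turned into a rough compute budget with simple arithmetic. The sketch below assumes that "one day" means exactly 24 wall-clock hours on all 16 GPUs, which the source does not state precisely, and it says nothing about how many passes were made over the data. The function and constant names are illustrative only.

```python
# Rough training-budget arithmetic from the figures quoted above.
# Assumption: "one day" is treated as exactly 24 wall-clock hours.
NUM_GPUS = 16
WALL_CLOCK_HOURS = 24
AUDIO_HOURS = 50_000


def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Total accelerator time consumed by the run."""
    return num_gpus * wall_clock_hours


def audio_hours_per_gpu_hour(
    audio_hours: float, num_gpus: int, wall_clock_hours: float
) -> float:
    """Hours of training audio in the dataset per GPU-hour spent."""
    return audio_hours / gpu_hours(num_gpus, wall_clock_hours)


if __name__ == "__main__":
    total = gpu_hours(NUM_GPUS, WALL_CLOCK_HOURS)
    rate = audio_hours_per_gpu_hour(AUDIO_HOURS, NUM_GPUS, WALL_CLOCK_HOURS)
    print(f"{total:.0f} GPU-hours, about {rate:.0f} hours of audio per GPU-hour")
```

Under these assumptions the run costs 384 GPU-hours, or roughly 130 hours of dataset audio per GPU-hour.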

Model Capabilities

Speech recognition
Audio understanding
Executing audio instructions

Use Cases

Speech transcription
Audio content transcription: transcribes audio content into text, accurately processing continuous audio inputs up to 15 minutes long.
Audio understanding
Audio semantic understanding: interprets the semantic content of audio, with excellent performance on multiple audio benchmarks.
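
For recordings longer than the 15-minute limit mentioned above, one option is to plan fixed-length windows before transcription. The page does not describe how longer audio should be handled, so the following is only a minimal sketch of a hypothetical preprocessing step. `plan_windows` and `MAX_WINDOW_SECONDS` are names introduced here for illustration and are not part of the model's API.

```python
from typing import List, Tuple

# 15-minute limit quoted above, expressed in seconds.
MAX_WINDOW_SECONDS = 15 * 60


def plan_windows(
    total_seconds: float, max_window: float = MAX_WINDOW_SECONDS
) -> List[Tuple[float, float]]:
    """Split a recording into consecutive (start, end) windows in seconds.

    Every window is at most max_window long; the last one may be shorter.
    """
    if total_seconds < 0:
        raise ValueError("total_seconds must be non-negative")
    if max_window <= 0:
        raise ValueError("max_window must be positive")

    windows: List[Tuple[float, float]] = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_window, total_seconds)
        windows.append((start, end))
        start = end
    return windows


if __name__ == "__main__":
    # A 40-minute recording needs three windows.
    for start, end in plan_windows(40 * 60):
        print(f"{start:.0f}s to {end:.0f}s")
```

Hard cuts at fixed boundaries can split a word in two. A practical pipeline would more likely cut at pauses or overlap adjacent windows, which this sketch leaves out.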