# Short audio processing

Whisper Large V3 Broad Accent

An English broad-accent classification model based on Whisper-Large-v3 that recognizes accents from the British Isles, North America, and three other categories of English accents.

Audio Classification

Safetensors English

Gemma 3 4b It Speech

Gemma-3-MM is a multimodal instruction model that extends Gemma-3-4b-it with speech processing. It accepts text, image, and audio inputs and generates text outputs.

Teochew Whisper Medium

A Teochew (Chaozhou dialect) speech recognition model fine-tuned from the Whisper medium model, designed to recognize the Teochew dialect of the Min Nan language family, spoken in southern China.

Speech Recognition

Wavlm Basic S F O 8batch 10sec 0.0001lr Unfrozen

A speech processing model fine-tuned from microsoft/wavlm-large, achieving 80% accuracy and a 79.57% F1 score on the evaluation set.

Audio Classification

Wavlm Basic S R 5c 8batch 5sec 0.0001lr Unfrozen

A speech processing model fine-tuned from microsoft/wavlm-large, achieving 75% accuracy on the evaluation set.

Audio Classification

Wavlm Basic N F N 8batch 5sec 0.0001lr Unfrozen

A speech processing model fine-tuned from microsoft/wavlm-large, achieving 73.33% accuracy on the evaluation set.

Audio Classification

© 2025 AIbase