# Real-time Audio Processing
Parakeet Tdt Ctc 0.6b Ja
This model is a Japanese automatic speech recognition (ASR) model based on the FastConformer architecture, developed by NVIDIA and converted to MLX format.
Speech Recognition
mlx-community
368
1
Distilhubert Finetuned Gtzan
Apache-2.0
A DistilHuBERT-based audio classification model fine-tuned on the GTZAN music classification dataset, achieving 83% accuracy.
Audio Classification
Transformers
Leo1212
25
0
Whisper Large V3 Gguf
Apache-2.0
Whisper is a multilingual automatic speech recognition (ASR) system that supports speech-to-text tasks in multiple languages.
Speech Recognition, Supports Multiple Languages
vonjack
931
14
Faster Whisper Large V3 Ja
MIT
A Japanese-optimized model based on OpenAI Whisper large-v3 that supports multilingual speech recognition.
Speech Recognition, Supports Multiple Languages
JhonVanced
46
3
Sonic48k
Sonic48k is an audio-to-audio model based on RVC (Retrieval-based Voice Conversion) technology, primarily used for voice conversion tasks.
Speech Synthesis
Transformers
sail-rvc
25
1
Luffysan2333333
This is an RVC (Retrieval-Based Voice Conversion) model designed for audio-to-audio tasks, capable of performing voice conversion.
Speech Synthesis
Transformers
sail-rvc
1,040
0
KORONE
This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, capable of transforming input audio into speech with a specific style.
Speech Synthesis
Transformers
sail-rvc
16
1
Homersimpson2333333
This is a voice conversion model based on RVC (Retrieval-Based Voice Conversion) technology, capable of transforming input audio into the voice style of Homer Simpson.
Speech Synthesis
Transformers
sail-rvc
11.36k
1
Edsheeran2333333
This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, capable of transforming input audio into speech with a specific style.
Speech Synthesis
Transformers
sail-rvc
3,637
1
DBZ Vegeta RVC
This is a voice conversion model based on RVC (Retrieval-Based Voice Conversion) technology, capable of transforming input audio into the voice of Vegeta (a character from 'Dragon Ball').
Speech Synthesis
Transformers
sail-rvc
1,678
0
Chicken V2 E250 S3750
This is an RVC (Retrieval-based Voice Conversion) model designed for audio-to-audio tasks, capable of voice transformation.
Speech Synthesis
Transformers
sail-rvc
321
0
21savage
This is an RVC (Retrieval-Based Voice Conversion) model designed for audio-to-audio conversion tasks.
Speech Synthesis
Transformers
sail-rvc
1,739
0
Distilhubert Finetuned Gtzan
Apache-2.0
A DistilHuBERT-based audio classification model fine-tuned on the GTZAN music classification dataset, achieving 82% accuracy.
Audio Classification
Transformers
sanchit-gandhi
255
4
Wav2vec2 Keyword Spotting Int8
A speech keyword spotting model based on the wav2vec2 architecture, quantized with Optimum OpenVINO.
Speech Recognition
Transformers
sampras343
17
0
Wangyou Zhang Chime4 Enh Train Enh Conv Tasnet Raw
A speech enhancement model trained with the ESPnet framework on the CHiME-4 dataset, suitable for single-channel speech enhancement tasks.
Audio Enhancement
espnet
57
1
Wav2vec2 Large Xlsr 53 Italian
Apache-2.0
A large-scale Italian automatic speech recognition model based on the Wav2Vec2 architecture, fine-tuned on the Common Voice dataset and released by Facebook.
Speech Recognition, Other
facebook
4,013
6
Featured Recommended AI Models