# End-to-end model
YOLOv10s
YOLOv10 is a real-time, end-to-end object detection model developed by researchers at Tsinghua University. It improves on earlier YOLO versions in both speed and accuracy.
Object Detection
jameslahm
907
5
Paraformer Large
Apache-2.0
Paraformer is a non-autoregressive end-to-end speech recognition model. Unlike traditional autoregressive models, which emit one token at a time, it generates the entire target sentence in parallel, which makes it well suited to GPU-accelerated inference.
Speech Recognition Chinese
funasr
43
45
Kan Bayashi Csj Asr Train Asr Transformer Raw Char Sp Valid.acc.ave
A Japanese automatic speech recognition (ASR) model based on the Transformer architecture, trained with the ESPnet framework on the CSJ dataset.
Speech Recognition Japanese
espnet
13
0
S2t Medium Mustc Multilingual St
MIT
A Transformer-based end-to-end multilingual speech translation model that translates English speech into multiple target languages.
Speech Recognition
Transformers Supports Multiple Languages
facebook
7,322
6
Kan Bayashi Ljspeech Tacotron2
A Tacotron2 text-to-speech model trained with the ESPnet framework on the LJSpeech dataset.
Speech Synthesis English
espnet
40
3
Overlapped Speech Detection
MIT
A pre-trained model for detecting overlapped speech in audio, capable of identifying time segments where two or more speakers are active simultaneously.
Speaker Analysis
pyannote
144.68k
35
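To make "time segments where two or more speakers are active simultaneously" concrete, here is a minimal sketch in plain Python. It does not use or reproduce the pyannote model: it assumes you already have per-speaker speech segments as `(start, end)` pairs in seconds, and the function name `overlapped_regions` is hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def overlapped_regions(segments: List[Segment]) -> List[Segment]:
    """Return time regions where at least two input segments are active.

    Illustrative sweep-line over segment boundaries; this is an assumption
    about how overlap can be derived from known segments, not the model's
    own detection method.
    """
    events = []
    for start, end in segments:
        if end > start:  # ignore empty or inverted segments
            events.append((start, 1))   # a speaker becomes active
            events.append((end, -1))    # a speaker becomes inactive

    # At equal timestamps, process ends (-1) before starts (+1) so that
    # segments that merely touch are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    regions: List[Segment] = []
    active = 0
    overlap_start = 0.0
    for time, delta in events:
        previous = active
        active += delta
        if previous < 2 <= active:
            overlap_start = time  # overlap begins
        elif previous >= 2 > active:
            regions.append((overlap_start, time))  # overlap ends
    return regions


example = [(0.0, 5.0), (4.0, 8.0), (7.5, 9.0), (10.0, 12.0)]
result = overlapped_regions(example)
print(result)
```

For the example input, the first two segments share the interval from 4.0 to 5.0 seconds and the second and third share 7.5 to 8.0 seconds, so those two regions are reported. A trained detector such as the one listed here predicts these regions directly from audio rather than from known segments.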
Speaker Segmentation
MIT
A speaker segmentation model based on pyannote.audio that detects speaker changes and speech activity in audio.
Audio Processing
pyannote
182
33
S2t Small Covost2 Fr En St
MIT
A Transformer-based end-to-end speech translation model for French-to-English speech translation.
Speech Recognition
Transformers Supports Multiple Languages
facebook
18
0