# End-to-end model

YOLOv10 is a real-time, end-to-end object detection model from Tsinghua University that improves on earlier YOLO versions in both speed and accuracy.

Object Detection

Paraformer Large

Paraformer is a non-autoregressive, end-to-end speech recognition model. Unlike traditional autoregressive models, which emit one token at a time, it generates the entire target sentence in parallel, which makes it well suited to GPU-accelerated inference.

Speech Recognition Chinese

Kan Bayashi Csj Asr Train Asr Transformer Raw Char Sp Valid.acc.ave

A Japanese automatic speech recognition (ASR) model based on the Transformer architecture, trained on the CSJ dataset with the ESPnet framework.

Speech Recognition Japanese

S2t Medium Mustc Multilingual St

A Transformer-based, end-to-end multilingual speech translation model that translates English speech into multiple target languages.

Speech Recognition

Transformers Supports Multiple Languages

Kan Bayashi Ljspeech Tacotron2

A Tacotron2 text-to-speech model trained on the LJSpeech dataset with the ESPnet framework.

Speech Synthesis English

Overlapped Speech Detection

A pre-trained model for detecting overlapped speech in audio, capable of identifying time segments where two or more speakers are active simultaneously.

Speaker Analysis

Speaker Segmentation

A speaker segmentation model based on pyannote.audio that detects speaker changes and speech activity in audio.

Audio Processing

S2t Small Covost2 Fr En St

A Transformer-based, end-to-end speech translation model for French-to-English speech translation.

Speech Recognition

Transformers Supports Multiple Languages

© 2025 AIbase