# Multi-dataset Training

Icedit Normal Lora
Other
This is an image-to-image conversion model based on LoRA technology, primarily used for non-commercial image editing tasks.
Image Generation English
RiverZ
1,046
7
Vitpose Plus Large
Apache-2.0
ViTPose++ is a vision Transformer-based foundation model for human pose estimation that achieves 81.1 AP on the MS COCO keypoint test set.
Pose Estimation Transformers
usyd-community
1,731
1
Kazrush Ru Kk
Apache-2.0
kazRush-ru-kk is a Russian-to-Kazakh translation model based on the T5 configuration, trained on multiple open-source parallel datasets.
Machine Translation Transformers Other
deepvk
332
8
Rad Dino Maira 2
Other
RAD-DINO-MAIRA-2 is a vision transformer model trained with DINOv2 self-supervised learning, specifically designed for encoding chest X-ray images.
Transformers
microsoft
9,414
11
Wav2vec2 Large Robust 6 Ft Age Gender
This model, fine-tuned from Wav2Vec2-Large-Robust, can predict the speaker's age and gender from raw audio.
Audio Classification Transformers
audeering
19.29k
2
Gpt2 Bangla Summurizer
This is a Bengali text summarization model based on the GPT2 architecture, specifically optimized for news content.
Text Generation Transformers Other
faridulreza
18
0
Whisper Base Japanese
Apache-2.0
This model fine-tunes openai/whisper-base on the Common Voice, JVS, and JSUT datasets for Japanese speech recognition.
Speech Recognition Transformers Japanese
Ivydata
137
3
Stt De Fastconformer Hybrid Large Pc
This is a German automatic speech recognition model based on the FastConformer architecture, trained with a hybrid Transducer and CTC objective, with approximately 115M parameters.
Speech Recognition German
nvidia
1,017
4
T5 Small Korean Summarization
A Korean text summarization model based on the T5 architecture, specifically optimized for Korean text to generate concise and accurate summaries.
Text Generation Transformers Korean
eenzeenee
123
3
BENT PubMedBERT NER Gene
Apache-2.0
This is a named entity recognition model fine-tuned on PubMedBERT, specifically designed to identify gene and protein entities in biomedical texts.
Sequence Labeling Transformers English
pruas
87
13
All MiniLM L6 V2 128dim
Apache-2.0
This is a sentence embedding model based on the MiniLM architecture, capable of mapping text to a 384-dimensional vector space, suitable for tasks such as semantic search and sentence similarity calculation.
Text Embedding English
freedomfrier
1,377
0
Whisper Small Cantonese
Apache-2.0
A Cantonese speech recognition model fine-tuned from OpenAI Whisper-small, achieving a CER of 7.93 on the Common Voice 16.0 test set.
Speech Recognition Transformers Supports Multiple Languages
alvanlii
2,413
85
Stt Es Conformer Transducer Large
This is a large Conformer-Transducer model for Spanish automatic speech recognition, with approximately 120 million parameters, trained on 1340 hours of Spanish speech data.
Speech Recognition Spanish
nvidia
708
4
Stt Es Conformer Ctc Large
This is a large Conformer-CTC model for Spanish automatic speech recognition (ASR), trained and released by NVIDIA.
Speech Recognition Spanish
nvidia
59
2
Stt Fr Conformer Transducer Large
This is a large-scale Conformer-Transducer model for French automatic speech recognition, with approximately 120 million parameters, trained on over 1,500 hours of French speech data.
Speech Recognition French
nvidia
31
10
Stt De Conformer Ctc Large
This is a large-scale Conformer-CTC model for German automatic speech recognition, trained and optimized by NVIDIA on thousands of hours of German speech data.
Speech Recognition German
nvidia
132
4
Stt En Citrinet 1024 Gamma 0 25
NVIDIA Streaming Citrinet 1024 is a non-autoregressive model for English automatic speech recognition, based on CTC loss/decoding, with approximately 140 million parameters.
Speech Recognition English
nvidia
156
3
Densenet121 Res224 Chex
Apache-2.0
A pre-trained model based on the DenseNet121 architecture, specifically designed for chest X-ray image classification tasks with 18 output targets.
Image Classification Transformers
torchxrayvision
25
1
All MiniLM L6 V2
Apache-2.0
This is a sentence embedding model based on sentence-transformers, capable of mapping text to a 384-dimensional vector space, suitable for semantic search and clustering tasks.
Text Embedding English
obrizum
1,647
5
Wav2vec2 Large Xlsr Galician
An automatic speech recognition model for Galician, fine-tuned from wav2vec2-large-xlsr-53, with a WER of 7.12.
Speech Recognition Transformers
ifrz
9,330
1
Bp500 Base10k Voxpopuli
Apache-2.0
This is a Wav2vec 2.0 speech recognition model for Brazilian Portuguese, fine-tuned on multiple Brazilian Portuguese datasets.
Speech Recognition Transformers Other
lgris
23
0
Wav2vec2 Large Xlsr 53 Japanese
Apache-2.0
A Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53. It expects audio input sampled at 16 kHz.
Speech Recognition Japanese
jonatasgrosman
2.9M
33
Wav2vec2 Xls R 1b German
Apache-2.0
This is a German automatic speech recognition model based on the XLS-R 1B architecture, fine-tuned on multiple German speech datasets including Common Voice 8.0.
Speech Recognition Transformers German
jonatasgrosman
105
3
Mt5 Small Sum De En V1
A bilingual summarization model based on multilingual T5 (mT5), supporting English and German text summarization.
Text Generation Transformers Supports Multiple Languages
deutsche-telekom
1,210
8
Bp400 Xlsr
Apache-2.0
A Wav2vec 2.0 model fine-tuned on Brazilian Portuguese datasets for automatic speech recognition.
Speech Recognition Transformers Other
lgris
55
3
Wav2vec2 Base Turkish
Apache-2.0
This is a Wav2Vec2 model fine-tuned on the Common Voice Turkish dataset for Turkish automatic speech recognition.
Speech Recognition Transformers Other
cahya
49
4
Sbert Roberta Large Anli Mnli Snli
A sentence embedding model based on RoBERTa-large, designed for sentence similarity tasks and trained on the ANLI, MNLI, and SNLI datasets.
Text Embedding Transformers English
usc-isi
38
2
W2v Hf Jsut Xlsr53
Apache-2.0
A Japanese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 using the Common Voice and JSUT datasets.
Speech Recognition Transformers Japanese
qqpann
16
1
Wav2vec2 Large Xlsr 53 Chinese Zh Cn
Apache-2.0
A Chinese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53. It expects audio input sampled at 16 kHz.
Speech Recognition Chinese
jonatasgrosman
3.8M
110
Wav2vec2 Large Xlsr Open Brazilian Portuguese
Apache-2.0
This is a Wav2vec 2.0 model fine-tuned for Brazilian Portuguese on multiple open datasets, including Common Voice, MLS, and CETUC.
Speech Recognition Transformers Other
lgris
395
9
Wav2vec2 Live Japanese
Apache-2.0
A Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting hiragana output.
Speech Recognition Transformers Japanese
ttop324
20
4
Wav2vec2 Xls R 1b Spanish
Apache-2.0
This is a Spanish automatic speech recognition model fine-tuned from the 1-billion-parameter XLS-R model on multiple Spanish datasets.
Speech Recognition Transformers Spanish
jonatasgrosman
2,270
6
Camembert Squadfr Fquad Piaf Answer Extraction
MIT
This model is fine-tuned from CamemBERT-base, specifically designed for answer extraction tasks in French texts, trained on SquadFR, FQuAD, and PIAF datasets.
Question Answering System Transformers French
lincoln
16
0
Wangchanberta Finetuned Sentiment
Apache-2.0
A model specialized in Thai text sentiment analysis, supporting positive, neutral, and negative sentiment classification.
Text Classification Transformers Other
poom-sci
615
12
Distilbert Fa Zwnj Base Ner
A DistilBERT model fine-tuned for Persian Named Entity Recognition (NER) tasks, supporting recognition of 10 entity types.
Sequence Labeling Transformers Other
HooshvareLab
101
4
Minilm L6 Mnli Fever Docnli Ling 2c
A binary natural language inference model trained on 8 NLI datasets, with an emphasis on long-text reasoning tasks.
Text Classification Transformers English
MoritzLaurer
22
2
Wav2vec2 Large Japanese
A Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53. It expects audio input sampled at 16 kHz.
Speech Recognition Japanese
NTQAI
316
7