Multi-dataset Training

Icedit Normal Lora

This is an image-to-image model built with LoRA (low-rank adaptation), intended for non-commercial image editing tasks.

Image Generation English
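LoRA adapts a frozen weight matrix W by adding a trainable low-rank product B·A, scaled by alpha / r, where r is the rank. The pure-Python sketch below merges such an update into W. It assumes the common alpha / r scaling convention; the matrix sizes and values are illustrative only and are not taken from the ICEdit weights.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list product of a (n x k) and b (k x m)."""
    k, m = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(m)] for row in a]


def merge_lora(w: Matrix, b: Matrix, a: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A.

    w: frozen base weight, shape (d_out x d_in)
    b: trainable up-projection, shape (d_out x r)
    a: trainable down-projection, shape (r x d_in)
    """
    r = len(a)  # the LoRA rank is the number of rows in A
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update with the same shape as W
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical 2x3 base weight with a rank-1 update.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]       # d_out x r = 2 x 1
A = [[0.5, 0.0, -0.5]]   # r x d_in = 1 x 3
merged = merge_lora(W, B, A, alpha=2.0)
print(merged)  # [[2.0, 0.0, -1.0], [2.0, 1.0, -2.0]]
```

Because the update can be folded into W after training, the adapted model runs at the same inference cost as the base model; only the small A and B matrices need to be stored per adapter.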

Vitpose Plus Large

ViTPose++ is a Vision Transformer-based foundation model for human pose estimation; it achieves 81.1 AP on the MS COCO keypoint test set.

Pose Estimation
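COCO keypoint AP is built on Object Keypoint Similarity (OKS): for each labeled keypoint, exp(-d² / (2·s²·k²)) is averaged, where d is the prediction error, s² is the object area and k = 2σ is a per-keypoint constant. AP then averages precision over OKS thresholds 0.50 to 0.95. A minimal sketch of the OKS part follows; COCO publishes a fixed table of per-keypoint σ values, and the σ values used here are placeholders.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def oks(pred: Sequence[Point], gt: Sequence[Point], visible: Sequence[bool],
        area: float, sigmas: Sequence[float]) -> float:
    """Object Keypoint Similarity between one predicted and one ground-truth pose.

    area   -- ground-truth object area (s^2 in the COCO definition)
    sigmas -- per-keypoint constants; COCO uses k_i = 2 * sigma_i
    Keypoints not labeled in the ground truth are skipped.
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if not v:
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma
        total += math.exp(-d2 / (2.0 * area * k ** 2))
        count += 1
    return total / count if count else 0.0


gt = [(10.0, 10.0), (20.0, 20.0)]
print(oks(gt, gt, [True, True], area=100.0, sigmas=[0.1, 0.1]))  # 1.0
```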

kazRush-ru-kk is a Russian-to-Kazakh translation model based on the T5 configuration, trained on multiple open-source parallel datasets.

Machine Translation

Transformers Other

Rad Dino Maira 2

RAD-DINO-MAIRA-2 is a vision transformer model trained with DINOv2 self-supervised learning, specifically designed for encoding chest X-ray images.

Wav2vec2 Large Robust 6 Ft Age Gender

This model, fine-tuned from Wav2Vec2-Large-Robust, can predict the speaker's age and gender from raw audio.

Audio Classification

Gpt2 Bangla Summurizer

This is a Bengali text summarization model based on the GPT2 architecture, specifically optimized for news content.

Text Generation

Transformers Other

Whisper Base Japanese

This model is openai/whisper-base fine-tuned on the Common Voice, JVS, and JSUT datasets for Japanese speech recognition.

Speech Recognition

Transformers Japanese

Stt De Fastconformer Hybrid Large Pc

This is a German automatic speech recognition model based on the FastConformer architecture. It is a hybrid model trained with both Transducer and CTC decoders and has approximately 115M parameters.

Speech Recognition German

T5 Small Korean Summarization

A Korean text summarization model based on the T5 architecture, optimized to generate concise summaries of Korean text.

Text Generation

Transformers Korean

BENT PubMedBERT NER Gene

This is a named entity recognition model fine-tuned from PubMedBERT, designed to identify gene and protein entities in biomedical texts.

Sequence Labeling

Transformers English
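Token-classification NER models typically emit one BIO tag per token (B- begins an entity, I- continues it, O is outside), and these tags must be grouped into entity spans. The sketch below assumes a BIO scheme with a hypothetical GENE label; the model's actual label set may differ, and subword tokens would first need to be merged back into words.

```python
from typing import List, Tuple

Span = Tuple[str, int, int, str]  # (entity_type, start_token, end_token_exclusive, text)


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Group BIO tags into entity spans.

    An I- tag that does not continue a span of the same type starts a new
    span (a common lenient convention).
    """
    spans: List[Span] = []
    etype, start = None, 0

    def close(end: int) -> None:
        # Emit the currently open span, if any.
        if etype is not None:
            spans.append((etype, start, end, " ".join(tokens[start:end])))

    for i, tag in enumerate(tags):
        if tag == "O":
            close(i)
            etype = None
        elif tag.startswith("B-") or tag[2:] != etype:
            close(i)
            etype, start = tag[2:], i
        # Otherwise: an I- tag continuing the open span, nothing to do.
    close(len(tags))
    return spans


tokens = ["The", "BRCA1", "gene", "and", "p53", "protein"]
tags = ["O", "B-GENE", "O", "O", "B-GENE", "I-GENE"]
print(bio_to_spans(tokens, tags))
# [('GENE', 1, 2, 'BRCA1'), ('GENE', 4, 6, 'p53 protein')]
```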

All MiniLM L6 V2 128dim

This is a sentence embedding model based on the MiniLM architecture, capable of mapping text to a 384-dimensional vector space, suitable for tasks such as semantic search and sentence similarity calculation.

Text Embedding English

Whisper Small Cantonese

A Cantonese speech recognition model fine-tuned from OpenAI Whisper-small, achieving a CER of 7.93 on the Common Voice 16.0 test set.

Speech Recognition

Transformers Supports Multiple Languages
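Character error rate (CER), the metric quoted above, is the character-level edit distance between the reference and the hypothesis divided by the reference length; it is often reported as a percentage. A minimal sketch is below; how a given evaluation normalizes punctuation or spaces before scoring is not specified here.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free when characters match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


print(cer("廣東話", "廣州話"))  # one substitution out of three characters
```

Because insertions are counted, CER can exceed 1.0 when the hypothesis is much longer than the reference.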

Stt Es Conformer Transducer Large

This is a large Conformer-Transducer model for Spanish automatic speech recognition, with approximately 120 million parameters, trained on 1,340 hours of Spanish speech data.

Speech Recognition Spanish

Stt Es Conformer Ctc Large

This is a large Conformer-CTC model for Spanish automatic speech recognition (ASR), trained and released by NVIDIA.

Speech Recognition Spanish

Stt Fr Conformer Transducer Large

This is a large-scale Conformer-Transducer model for French automatic speech recognition, with approximately 120 million parameters, trained on over 1,500 hours of French speech data.

Speech Recognition French

Stt De Conformer Ctc Large

This is a large-scale Conformer-CTC model for German automatic speech recognition, trained and optimized by NVIDIA on thousands of hours of German speech data.

Speech Recognition German

Stt En Citrinet 1024 Gamma 0 25

NVIDIA Streaming Citrinet 1024 is a non-autoregressive model for English automatic speech recognition, trained with CTC loss and decoded with CTC, with approximately 140 million parameters.

Speech Recognition English
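A CTC model scores every output class at every audio frame in parallel, which is what makes it non-autoregressive. The simplest way to turn those scores into text is greedy (best-path) decoding: take the top class per frame, merge consecutive repeats, then drop the blank symbol. The sketch below assumes blank index 0 and a toy four-letter vocabulary; toolkits differ in where they place the blank, and beam search with a language model is a common alternative.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path CTC decoding.

    frame_scores[t][c] is the score (e.g. log-probability) of class c at frame t.
    """
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    output: List[int] = []
    previous = None
    for label in best_path:
        # Keep a label only if it differs from the previous frame and is not blank;
        # a blank between two identical labels therefore keeps both.
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


vocab = {1: "h", 2: "e", 3: "l", 4: "o"}  # index 0 is the blank (assumed)
path = [1, 1, 2, 0, 3, 3, 0, 3, 4]
scores = [[1.0 if c == k else 0.0 for c in range(5)] for k in path]
ids = ctc_greedy_decode(scores)
print("".join(vocab[i] for i in ids))  # hello
```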

Densenet121 Res224 Chex

A pre-trained model based on the DenseNet121 architecture, specifically designed for chest X-ray image classification tasks with 18 output targets.

Image Classification

torchxrayvision

All MiniLM L6 V2

This is a sentence embedding model based on sentence-transformers, capable of mapping text to a 384-dimensional vector space, suitable for semantic search and clustering tasks.

Text Embedding English
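Semantic search with a sentence embedding model encodes the query and each document as vectors (384-dimensional for this model, e.g. via sentence-transformers' `SentenceTransformer.encode`) and ranks documents by cosine similarity. The sketch below shows only the ranking step, using toy 3-dimensional vectors as stand-ins for real embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query: Sequence[float], corpus: Dict[str, Sequence[float]],
           top_k: int = 2) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, similarity) pairs, most similar first."""
    scored = [(doc_id, cosine(query, emb)) for doc_id, emb in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for 384-dimensional sentence embeddings.
corpus = {
    "cats": [1.0, 0.0, 0.0],
    "kittens": [0.9, 0.1, 0.0],
    "stocks": [0.0, 0.0, 1.0],
}
print(search([1.0, 0.0, 0.0], corpus))
```

For large corpora, normalizing all vectors once and using dot products (or an approximate nearest-neighbor index) avoids recomputing norms per query.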

Wav2vec2 Large Xlsr Galician

An automatic speech recognition model for Galician, fine-tuned from wav2vec2-large-xlsr-53, with a WER of 7.12.

Speech Recognition
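Word error rate (WER) is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal sketch follows, splitting on whitespace; any text normalization applied in the reported evaluation is not specified here.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over word lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,             # deletion
                            curr[j - 1] + 1,         # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate on whitespace-separated words."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


print(wer("bos días a todos", "bos día a todos"))  # 0.25
```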

Bp500 Base10k Voxpopuli

This is a Wav2vec 2.0 speech recognition model for Brazilian Portuguese, fine-tuned on multiple Brazilian Portuguese datasets.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr 53 Japanese

A Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53; it expects audio sampled at 16 kHz.

Speech Recognition Japanese
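Since the model expects 16 kHz input, audio recorded at other rates (e.g. 44.1 or 48 kHz) must be resampled first. The sketch below uses plain linear interpolation to show the index arithmetic; it applies no anti-aliasing filter, so for real use a band-limited resampler such as `torchaudio.functional.resample` is the safer choice.

```python
from typing import List, Sequence


def resample_linear(samples: Sequence[float], orig_sr: int, target_sr: int = 16_000) -> List[float]:
    """Resample by linear interpolation (no anti-aliasing filter)."""
    if orig_sr == target_sr or not samples:
        return list(samples)
    n_out = len(samples) * target_sr // orig_sr
    step = orig_sr / target_sr  # input samples advanced per output sample
    out: List[float] = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


one_second_48k = [0.0] * 48_000
print(len(resample_linear(one_second_48k, 48_000)))  # 16000
```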

Wav2vec2 Xls R 1b German

This is a German automatic speech recognition model based on the XLS-R 1B architecture, fine-tuned on multiple German speech datasets including Common Voice 8.0.

Speech Recognition

Transformers German

Mt5 Small Sum De En V1

A bilingual summarization model based on multilingual T5 (mT5), supporting summarization of English and German text.

Text Generation

Transformers Supports Multiple Languages

deutsche-telekom

A Wav2vec 2.0 speech recognition model fine-tuned on Brazilian Portuguese datasets, supporting automatic speech recognition tasks for Brazilian Portuguese.

Speech Recognition

Transformers Other

Wav2vec2 Base Turkish

This is a Wav2Vec2 speech recognition model fine-tuned on the Common Voice Turkish dataset for Turkish automatic speech recognition.

Speech Recognition

Transformers Other

Sbert Roberta Large Anli Mnli Snli

A Sentence Transformers model based on RoBERTa-large, designed for sentence similarity tasks and trained on the ANLI, MNLI, and SNLI datasets.

Transformers English

W2v Hf Jsut Xlsr53

A Japanese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 using the Common Voice and JSUT datasets.

Speech Recognition

Transformers Japanese

Wav2vec2 Large Xlsr 53 Chinese Zh Cn

A Chinese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53; it expects audio sampled at 16 kHz.

Speech Recognition Chinese

Wav2vec2 Large Xlsr Open Brazilian Portuguese

This is a Wav2vec 2.0 model fine-tuned for Brazilian Portuguese, trained on multiple open Brazilian Portuguese datasets, including Common Voice, MLS, and CETUC.

Speech Recognition

Transformers Other

Wav2vec2 Live Japanese

A Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 that outputs transcriptions in hiragana.

Speech Recognition

Transformers Japanese

Wav2vec2 Xls R 1b Spanish

This is a Spanish automatic speech recognition model fine-tuned from the 1-billion-parameter XLS-R model and trained on multiple Spanish datasets.

Speech Recognition

Transformers Spanish

Camembert Squadfr Fquad Piaf Answer Extraction

This model is fine-tuned from CamemBERT-base, specifically designed for answer extraction tasks in French texts, trained on SquadFR, FQuAD, and PIAF datasets.

Question Answering System

Transformers French
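An extractive QA model like this one predicts a start score and an end score for each context token; the answer is the span (s, e) with s ≤ e and a bounded length that maximizes start[s] + end[e]. The sketch below shows that selection step with made-up scores; the maximum answer length of 30 tokens is an assumed default, and real pipelines also mask out question tokens and handle long contexts in windows.

```python
from typing import Sequence, Tuple


def best_span(start_logits: Sequence[float], end_logits: Sequence[float],
              max_answer_len: int = 30) -> Tuple[int, int, float]:
    """Return (start, end, score) of the highest-scoring valid span (end inclusive)."""
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


tokens = ["La", "capitale", "de", "la", "France", "est", "Paris", "."]
start = [0.1, 0.0, 0.0, 0.0, 0.2, 0.0, 3.0, 0.0]  # hypothetical model outputs
end = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.5, 0.5]
s, e, score = best_span(start, end)
print(" ".join(tokens[s:e + 1]))  # Paris
```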

Wangchanberta Finetuned Sentiment

A model specialized in Thai text sentiment analysis, supporting positive, neutral, and negative sentiment classification.

Text Classification

Transformers Other
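A three-way sentiment classifier produces one logit per class, and a softmax turns them into probabilities that sum to 1. The label order used below is a hypothetical placeholder; the actual mapping should be read from the model's `id2label` configuration.

```python
import math
from typing import Dict, List, Sequence

LABELS = ("positive", "neutral", "negative")  # hypothetical order; check id2label


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit first)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: Sequence[float]) -> Dict[str, float]:
    """Map raw logits to a {label: probability} dictionary."""
    return dict(zip(LABELS, softmax(logits)))


probs = classify([2.0, 0.5, -1.0])
print(max(probs, key=probs.get))  # positive
```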

Distilbert Fa Zwnj Base Ner

A DistilBERT model fine-tuned for Persian Named Entity Recognition (NER) tasks, supporting recognition of 10 entity types.

Sequence Labeling

Transformers Other

Minilm L6 Mnli Fever Docnli Ling 2c

A binary (two-class) natural language inference model trained on eight NLI datasets, with a focus on long-text inference.

Text Classification

Transformers English

Wav2vec2 Large Japanese

A Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53; it expects audio sampled at 16 kHz.

Speech Recognition Japanese

© 2025 AIbase