
Access Global AI Models - Power Next-Gen Apps
From General to Specialized AI - All Models in One Platform
11,789 models match the criteria.

NSFW Image Detection · Falconsai · Apache-2.0
An NSFW image classification model based on the ViT architecture, pre-trained on ImageNet-21k via supervised learning and fine-tuned on 80,000 images to distinguish between normal and NSFW content.
Image Classification, Transformers · 82.4M downloads · 588 likes

CLIP ViT Large Patch14 · openai
CLIP is a vision-language model developed by OpenAI that maps images and text into a shared embedding space through contrastive learning, supporting zero-shot image classification.
Image-to-Text · 44.7M downloads · 1,710 likes

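The CLIP cards in this list all describe the same inference trick: embed the image and each candidate label's text, then compare them in the shared space. A minimal sketch of that zero-shot classification step, using toy vectors in place of real CLIP encoder outputs (the 3-dimensional embeddings and the fixed logit scale here are illustrative assumptions, not the model's actual values):

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_classify(image_emb, label_embs, labels, scale=100.0):
    """Score an image embedding against text label embeddings and
    softmax the scaled similarities (CLIP uses a learned logit scale)."""
    sims = np.array([cosine_sim(image_emb, e) for e in label_embs])
    logits = sims * scale
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return labels[int(np.argmax(probs))], probs

# Toy embeddings standing in for CLIP encoder outputs (not real model weights)
image_emb = np.array([0.9, 0.1, 0.0])
label_embs = [np.array([1.0, 0.0, 0.0]),   # text embedding for "a photo of a cat"
              np.array([0.0, 1.0, 0.0])]   # text embedding for "a photo of a dog"
label, probs = zero_shot_classify(image_emb, label_embs, ["cat", "dog"])
print(label)  # cat
```

Because no label set is fixed at training time, the same weights classify against any list of candidate labels supplied at inference.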
RoBERTa Large · FacebookAI · MIT
A large English language model pre-trained with a masked language modeling objective, using improved BERT training methods.
Large Language Model, English · 19.4M downloads · 212 likes

CLIP ViT Base Patch32 · openai
CLIP is a multimodal model developed by OpenAI that can understand the relationship between images and text, supporting zero-shot image classification tasks.
Image-to-Text · 14.0M downloads · 666 likes

Segmentation 3.0 · pyannote · MIT
This is a powerset-encoded speaker diarization model capable of processing 10-second audio clips to identify multiple speakers and their overlapping speech.
Speaker Analysis · 12.6M downloads · 445 likes

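"Powerset-encoded" means the model predicts, per audio frame, one class for each possible *set* of active speakers (including the empty set for silence) rather than one independent probability per speaker. A small sketch of how those classes are enumerated; the speaker and overlap counts below are illustrative, not necessarily pyannote's actual configuration:

```python
from itertools import combinations

def powerset_classes(num_speakers, max_overlap):
    """Enumerate the output classes of a powerset diarization head: one class
    per subset of speakers with at most `max_overlap` active at the same time.
    The empty subset represents silence."""
    classes = []
    for k in range(max_overlap + 1):
        classes.extend(combinations(range(num_speakers), k))
    return classes

# 3 speakers, at most 2 overlapping: 1 (silence) + 3 (solo) + 3 (pairs) = 7 classes
print(powerset_classes(3, 2))
```

Framing overlap as a multiclass problem lets a single softmax handle simultaneous speech directly instead of thresholding per-speaker activations.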
DistilBERT Base Uncased · distilbert · Apache-2.0
DistilBERT is a distilled version of the BERT base model, maintaining similar performance while being more lightweight and efficient, suitable for natural language processing tasks such as sequence classification and token classification.
Large Language Model, English · 11.1M downloads · 669 likes

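The distillation mentioned on the DistilBERT cards trains the small model to match the teacher's softened output distribution. A minimal numpy sketch of that soft-label loss term, with toy logits; the real recipe also combines a hard-label loss and a hidden-state cosine loss, which are omitted here:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def soft_label_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of the student against the teacher's temperature-softened
    distribution, scaled by T^2 so gradient magnitudes stay comparable
    across temperatures."""
    p_teacher = softmax(teacher_logits / T)
    log_p_student = np.log(softmax(student_logits / T))
    return float(-(T ** 2) * np.sum(p_teacher * log_p_student))

# Toy logits: the student roughly tracks the teacher, so the loss is small but nonzero
teacher = np.array([4.0, 1.0, 0.5])
student = np.array([3.5, 1.2, 0.4])
print(soft_label_loss(student, teacher))
```

Raising the temperature T spreads the teacher's probability mass over wrong-but-plausible classes, which is exactly the "dark knowledge" the student is meant to absorb.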
CLIPSeg Rd64 Refined · CIDAS · Apache-2.0
CLIPSeg is an image segmentation model based on text and image prompts, supporting zero-shot and one-shot image segmentation tasks.
Image Segmentation, Transformers · 10.0M downloads · 122 likes

Llama 3.1 8B Instruct GGUF · modularai
Meta Llama 3.1 8B Instruct is a multilingual large language model optimized for dialogue use cases, performing strongly on common industry benchmarks.
Large Language Model, English · 9.7M downloads · 4 likes

XLM-RoBERTa Base · FacebookAI · MIT
XLM-RoBERTa is a multilingual model pretrained on 2.5TB of filtered CommonCrawl data across 100 languages, using masked language modeling as the training objective.
Large Language Model, Supports Multiple Languages · 9.6M downloads · 664 likes

RoBERTa Base · FacebookAI · MIT
An English pre-trained model based on the Transformer architecture, trained on a large text corpus with a masked language modeling objective, supporting text feature extraction and downstream fine-tuning.
Large Language Model, English · 9.3M downloads · 488 likes

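The masked language modeling objective named on these RoBERTa cards can be sketched in a few lines: hide a random subset of tokens and train the model to recover the originals. This is a simplified illustration; RoBERTa's actual recipe also replaces some selected tokens with random words or leaves them unchanged, and the seed below is just for reproducibility:

```python
import random

def mask_tokens(tokens, mask_token="<mask>", mask_prob=0.15, seed=1):
    """Randomly hide ~15% of tokens and record the originals as prediction
    targets. Simplified relative to RoBERTa's 80/10/10 replacement scheme."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets[i] = tok        # the model is trained to predict these
        else:
            masked.append(tok)
    return masked, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_tokens(tokens)
print(masked, targets)
```

Because the targets come from the text itself, no labels are needed, which is what makes pre-training on terabytes of raw crawl data possible.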
Segmentation · pyannote · MIT
An audio processing model for voice activity detection, overlap detection, and speaker diarization.
Speaker Analysis · 9.2M downloads · 579 likes

ViT Face Expression · trpakov · Apache-2.0
A facial emotion recognition model fine-tuned from a Vision Transformer (ViT), classifying 7 facial expressions.
Face-related, Transformers · 9.2M downloads · 66 likes

OPT 125m · facebook · Other
OPT is an open pre-trained Transformer language model suite released by Meta AI, with parameter sizes ranging from 125 million to 175 billion, designed to match the performance of the GPT-3 series while promoting open research in large-scale language models.
Large Language Model, English · 6.3M downloads · 198 likes

CLIP ViT Large Patch14 336 · openai
A large-scale vision-language pretrained model based on the Vision Transformer architecture, supporting cross-modal understanding between images and text.
Text-to-Image, Transformers · 5.9M downloads · 241 likes

Llama 3.1 8B Instruct · meta-llama
Llama 3.1 is Meta's multilingual large language model series, available at 8B, 70B, and 405B parameter scales, supporting 8 languages and code generation, and optimized for multilingual dialogue scenarios.
Large Language Model, Transformers, Supports Multiple Languages · 5.7M downloads · 3,898 likes

T5 Base · google-t5 · Apache-2.0
T5-Base is a 220-million-parameter text-to-text Transformer model developed by Google, supporting multilingual NLP tasks.
Large Language Model, Supports Multiple Languages · 5.4M downloads · 702 likes

XLM-RoBERTa Large · FacebookAI · MIT
XLM-RoBERTa is a multilingual model pretrained on 2.5TB of filtered CommonCrawl data across 100 languages, trained with a masked language modeling objective.
Large Language Model, Supports Multiple Languages · 5.3M downloads · 431 likes

DistilBERT Base Uncased Finetuned SST-2 English · distilbert · Apache-2.0
A text classification model fine-tuned from distilbert-base-uncased on the SST-2 sentiment analysis dataset, reaching 91.3% accuracy.
Text Classification, English · 5.2M downloads · 746 likes

DINOv2 Small · facebook · Apache-2.0
A small-scale vision Transformer trained with the DINOv2 method, extracting image features through self-supervised learning.
Image Classification, Transformers · 5.0M downloads · 31 likes

Wav2Vec2 Large XLSR-53 Portuguese · jonatasgrosman · Apache-2.0
A fine-tuned XLSR-53 large model for Portuguese speech recognition, trained on the Common Voice 6.1 dataset, supporting Portuguese speech-to-text conversion.
Speech Recognition, Other · 4.9M downloads · 32 likes

ViT Base Patch16 224 · google · Apache-2.0
A Vision Transformer model pre-trained on ImageNet-21k and fine-tuned on ImageNet for image classification tasks.
Image Classification · 4.8M downloads · 775 likes

Whisper Large V3 · openai · Apache-2.0
Whisper is an advanced automatic speech recognition (ASR) and speech translation model developed by OpenAI, trained on over 5 million hours of labeled data, with strong cross-dataset and cross-domain generalization.
Speech Recognition, Supports Multiple Languages · 4.6M downloads · 4,321 likes

CLIP ViT Base Patch16 · openai
CLIP is a multimodal model developed by OpenAI that maps images and text into a shared embedding space through contrastive learning, enabling zero-shot image classification.
Image-to-Text · 4.6M downloads · 119 likes

Wav2Vec2 Large XLSR-53 Russian · jonatasgrosman · Apache-2.0
A Russian speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz audio input.
Speech Recognition, Other · 3.9M downloads · 54 likes

BART Large CNN · facebook · MIT
A BART model pre-trained on an English corpus and fine-tuned on the CNN/Daily Mail dataset, well suited to text summarization.
Text Generation, English · 3.8M downloads · 1,364 likes

Wav2Vec2 Large XLSR-53 Chinese Zh Cn · jonatasgrosman · Apache-2.0
A Chinese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz audio input.
Speech Recognition, Chinese · 3.8M downloads · 110 likes

Fashion CLIP · patrickjohncyh · MIT
FashionCLIP is a vision-language model fine-tuned from CLIP for the fashion domain, capable of generating general-purpose product representations.
Text-to-Image, Transformers, English · 3.8M downloads · 222 likes

Jina Embeddings V3 · jinaai
Jina Embeddings V3 is a multilingual sentence embedding model supporting over 100 languages, specializing in sentence similarity and feature extraction tasks.
Text Embedding, Transformers, Supports Multiple Languages · 3.7M downloads · 911 likes

BART Large MNLI · facebook · MIT
A zero-shot classification model based on the BART-large architecture, fine-tuned on the MultiNLI dataset.
Large Language Model · 3.7M downloads · 1,364 likes

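An NLI model becomes a zero-shot classifier by recasting each candidate label as a hypothesis such as "This text is about <label>." and comparing how strongly the input entails each one. A sketch of the final normalization step, using made-up entailment logits in place of real model output:

```python
import math

def zero_shot_from_entailment(entailment_logits, labels):
    """Given one entailment logit per candidate hypothesis, softmax across
    the labels and return the best label plus the full distribution."""
    exps = [math.exp(z) for z in entailment_logits]
    total = sum(exps)
    probs = {lab: e / total for lab, e in zip(labels, exps)}
    best = max(probs, key=probs.get)
    return best, probs

# Made-up entailment logits standing in for real NLI-model output
best, probs = zero_shot_from_entailment([2.1, -0.3, 0.4],
                                        ["sports", "politics", "tech"])
print(best)  # sports
```

Because the label set is supplied at inference time, the same NLI weights can classify against any list of categories without retraining.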
T5 Small · google-t5 · Apache-2.0
T5-Small is a 60-million-parameter model developed by Google, using a unified text-to-text framework to handle various NLP tasks.
Large Language Model, Supports Multiple Languages · 3.7M downloads · 450 likes

ESM2 T36 3B UR50D · facebook · MIT
ESM-2 is a next-generation protein model trained with masked language modeling objectives, suitable for fine-tuning on various downstream tasks with protein sequences as input.
Protein Model, Transformers · 3.5M downloads · 22 likes

FLAN-T5 Base · google · Apache-2.0
FLAN-T5 is a language model built on T5 and optimized through instruction fine-tuning, supporting multilingual tasks and outperforming the original T5 at the same parameter count.
Large Language Model, Supports Multiple Languages · 3.3M downloads · 862 likes

ALBERT Base V2 · albert · Apache-2.0
ALBERT is a lightweight pre-trained language model based on the Transformer architecture, reducing memory usage through a parameter-sharing mechanism, suitable for English text processing tasks.
Large Language Model, English · 3.1M downloads · 121 likes

Wav2Vec2 Large XLSR-53 Dutch · jonatasgrosman · Apache-2.0
A Dutch speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on the Common Voice and CSS10 datasets, supporting 16kHz audio input.
Speech Recognition, Other · 3.0M downloads · 12 likes

Wav2Vec2 Large XLSR-53 Japanese · jonatasgrosman · Apache-2.0
A Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz audio input.
Speech Recognition, Japanese · 2.9M downloads · 33 likes

BLIP Image Captioning Base · Salesforce · BSD-3-Clause
BLIP is an advanced vision-language pretrained model, excelling in image captioning tasks and supporting both conditional and unconditional text generation.
Image-to-Text, Transformers · 2.8M downloads · 688 likes

DistilBERT Base Multilingual Cased · distilbert · Apache-2.0
DistilBERT is a distilled version of the BERT base multilingual model, retaining 97% of BERT's performance with fewer parameters and faster speed. It supports 104 languages and is suitable for various natural language processing tasks.
Large Language Model, Transformers, Supports Multiple Languages · 2.8M downloads · 187 likes

DistilGPT2 · distilbert · Apache-2.0
DistilGPT2 is a lightweight distilled version of GPT-2 with 82 million parameters, retaining GPT-2's core text generation capabilities while being smaller and faster.
Large Language Model, English · 2.7M downloads · 527 likes

XLM-RoBERTa Base Language Detection · papluca · MIT
A language detection model based on XLM-RoBERTa, classifying text across 20 languages.
Text Classification, Transformers, Supports Multiple Languages · 2.7M downloads · 333 likes

BLEURT 20 D12 · lucadiliello
A PyTorch implementation of the BLEURT model for text evaluation tasks in natural language processing.
Large Language Model, Transformers · 2.6M downloads · 1 like

Table Transformer Detection · microsoft · MIT
A table detection model based on the DETR architecture, designed for extracting tables from unstructured documents.
Object Detection, Transformers · 2.6M downloads · 349 likes

BLIP Image Captioning Large · Salesforce · BSD-3-Clause
BLIP is a unified vision-language pretraining framework, excelling at image caption generation and supporting both conditional and unconditional captioning.
Image-to-Text, Transformers · 2.5M downloads · 1,312 likes