Access Global AI Models - Power Next-Gen Apps
From General to Specialized AI - All Models in One Platform

23,189 models match the criteria

Nsfw Image Detection
Apache-2.0
An NSFW image classification model based on the ViT architecture, pre-trained on ImageNet-21k via supervised learning and fine-tuned on 80,000 images to distinguish between normal and NSFW content.
Image Classification, Transformers
Falconsai
82.4M downloads
588 likes
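A minimal usage sketch with the Hugging Face transformers image-classification pipeline; the repository id Falconsai/nsfw_image_detection and the image path are assumptions for illustration:

# Sketch: flag an image as normal vs. NSFW (assumed repo id: Falconsai/nsfw_image_detection).
from transformers import pipeline

classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection")
print(classifier("photo.jpg"))  # returns the candidate labels with confidence scores
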
Fairface Age Image Detection
Apache-2.0
An image classification model based on the Vision Transformer architecture, pre-trained on ImageNet-21k and fine-tuned on the FairFace dataset to predict age groups from face images
Image Classification, Transformers
dima806
76.6M downloads
10 likes
Clip Vit Large Patch14
CLIP is a vision-language model developed by OpenAI that maps images and text into a shared embedding space through contrastive learning, supporting zero-shot image classification.
Image-to-Text
openai
44.7M downloads
1,710 likes
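A short zero-shot classification sketch using the transformers CLIP classes with openai/clip-vit-large-patch14; the image path and candidate captions are placeholders:

# Sketch: zero-shot image classification by comparing an image against candidate captions.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

captions = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=captions, images=Image.open("photo.jpg"), return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)  # image-text similarity as probabilities
print(dict(zip(captions, probs[0].tolist())))
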
Phi 2 GGUF
Other
Phi-2 is a small yet powerful language model developed by Microsoft, featuring 2.7 billion parameters, focusing on efficient inference and high-quality text generation.
Large Language Model, Supports Multiple Languages
TheBloke
41.5M downloads
205 likes
Chronos T5 Small
Apache-2.0
Chronos is a family of pre-trained time series forecasting models based on language model architectures. It converts time series into token sequences through quantization and scaling for training, suitable for probabilistic forecasting tasks.
Climate Model, Transformers
amazon
22.8M downloads
66 likes
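A forecasting sketch assuming the chronos-forecasting package and its ChronosPipeline API; the context series is made up for illustration:

# Sketch: probabilistic forecasting with Chronos (assumes the chronos-forecasting package).
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-small")
context = torch.tensor([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0])  # toy series
samples = pipeline.predict(context, prediction_length=4)  # shape: [series, samples, horizon]
print(samples.quantile(0.5, dim=1))  # median forecast for each future step
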
Roberta Large
MIT
A large English language model pre-trained with masked language modeling objectives, using improved BERT training methods
Large Language Model, English
FacebookAI
19.4M downloads
212 likes
Clip Vit Base Patch32
CLIP is a multimodal model developed by OpenAI that can understand the relationship between images and text, supporting zero-shot image classification tasks.
Image-to-Text
openai
14.0M downloads
666 likes
Segmentation 3.0
MIT
This is a powerset-encoded speaker diarization model capable of processing 10-second audio clips to identify multiple speakers and their overlapping speech.
Speaker Analysis
pyannote
12.6M downloads
445 likes
Speaker Diarization 3.1
MIT
An audio processing model for speaker segmentation that can automatically detect and segment different speakers in audio.
Speaker Analysis
pyannote
11.7M downloads
822 likes
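A diarization sketch with pyannote.audio; the model is gated on Hugging Face, so an access token (shown as a placeholder) and the audio file name are assumptions:

# Sketch: who-spoke-when segmentation of a recording (gated model, HF token assumed).
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1",
                                    use_auth_token="hf_your_token_here")
diarization = pipeline("meeting.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
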
Distilbert Base Uncased
Apache-2.0
DistilBERT is a distilled version of the BERT base model, maintaining similar performance while being more lightweight and efficient, suitable for natural language processing tasks such as sequence classification and token classification.
Large Language Model, English
distilbert
11.1M downloads
669 likes
Clipseg Rd64 Refined
Apache-2.0
CLIPSeg is an image segmentation model based on text and image prompts, supporting zero-shot and one-shot image segmentation tasks.
Image Segmentation, Transformers
CIDAS
10.0M downloads
122 likes
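A text-prompted segmentation sketch using the transformers CLIPSeg classes; the image path and prompts are placeholders:

# Sketch: produce one low-resolution mask per text prompt with CLIPSeg.
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("street.jpg")
prompts = ["a car", "a tree"]
inputs = processor(text=prompts, images=[image] * len(prompts), return_tensors="pt", padding=True)
with torch.no_grad():
    masks = torch.sigmoid(model(**inputs).logits)  # one mask per prompt, values in [0, 1]
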
Llama 3.1 8B Instruct GGUF
Meta Llama 3.1 8B Instruct is a multilingual large language model optimized for dialogue use cases, performing strongly on common industry benchmarks.
Large Language Model, English
modularai
9.7M downloads
4 likes
Xlm Roberta Base
MIT
XLM-RoBERTa is a multilingual model pretrained on 2.5TB of filtered CommonCrawl data across 100 languages, using masked language modeling as the training objective.
Large Language Model, Supports Multiple Languages
FacebookAI
9.6M downloads
664 likes
Roberta Base
MIT
An English pre-trained model based on Transformer architecture, trained on massive text through masked language modeling objectives, supporting text feature extraction and downstream task fine-tuning
Large Language Model, English
FacebookAI
9.3M downloads
488 likes
Segmentation
MIT
An audio processing model for voice activity detection, overlap detection, and speaker diarization
Speaker Analysis
pyannote
9.2M downloads
579 likes
Vit Face Expression
Apache-2.0
A facial emotion recognition model fine-tuned based on Vision Transformer (ViT), supporting 7 expression classifications
Face-related, Transformers
trpakov
9.2M downloads
66 likes
Voice Activity Detection
MIT
Voice activity detection model based on pyannote.audio 2.1, used to identify speech activity segments in audio
Speech Recognition
pyannote
7.7M downloads
181 likes
Opt 125m
Other
OPT is an open pre-trained Transformer language model suite released by Meta AI, with parameter sizes ranging from 125 million to 175 billion, designed to match the performance of the GPT-3 series while promoting open research in large-scale language models.
Large Language Model, English
facebook
6.3M downloads
198 likes
Chronos Bolt Small
Apache-2.0
Chronos-Bolt is a series of pretrained time series foundation models based on the T5 architecture, achieving efficient time series forecasting through innovative chunk encoding and direct multi-step prediction
Climate Model
autogluon
6.2M downloads
13 likes
1
A pretrained model based on the transformers library, suitable for various NLP tasks
Large Language Model, Transformers
unslothai
6.2M downloads
1 like
Siglip So400m Patch14 384
Apache-2.0
SigLIP is a vision-language model pre-trained on the WebLi dataset, employing an improved sigmoid loss function to optimize image-text matching tasks.
Image-to-Text, Transformers
google
6.1M downloads
526 likes
Clip Vit Large Patch14 336
A large-scale vision-language pretrained model based on the Vision Transformer architecture, supporting cross-modal understanding between images and text
Text-to-Image, Transformers
openai
5.9M downloads
241 likes
Llama 3.1 8B Instruct
Llama 3.1 is Meta's multilingual large language model series, available at 8B, 70B, and 405B parameter scales, supporting 8 languages and code generation, and optimized for multilingual dialogue scenarios.
Large Language Model, Transformers, Supports Multiple Languages
meta-llama
5.7M downloads
3,898 likes
T5 Base
Apache-2.0
T5-Base is a 220-million-parameter text-to-text Transformer model developed by Google, supporting multilingual NLP tasks through a unified text-to-text framework.
Large Language Model, Supports Multiple Languages
google-t5
5.4M downloads
702 likes
Xlm Roberta Large
MIT
XLM-RoBERTa is a multilingual model pretrained on 2.5TB of filtered CommonCrawl data across 100 languages, trained with a masked language modeling objective.
Large Language Model, Supports Multiple Languages
FacebookAI
5.3M downloads
431 likes
Distilbert Base Uncased Finetuned Sst 2 English
Apache-2.0
A text classification model based on DistilBERT-base-uncased and fine-tuned on the SST-2 sentiment analysis dataset, reaching 91.3% accuracy
Text Classification, English
distilbert
5.2M downloads
746 likes
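A sentiment-analysis sketch with the transformers pipeline; the example sentence is arbitrary:

# Sketch: binary sentiment classification with the SST-2 fine-tuned DistilBERT checkpoint.
from transformers import pipeline

classifier = pipeline("sentiment-analysis",
                      model="distilbert/distilbert-base-uncased-finetuned-sst-2-english")
print(classifier("This catalog makes comparing models easy."))  # e.g. label POSITIVE with a score
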
Dinov2 Small
Apache-2.0
A small-scale vision Transformer model trained using the DINOv2 method, extracting image features through self-supervised learning
Image Classification, Transformers
facebook
5.0M downloads
31 likes
Wav2vec2 Large Xlsr 53 Portuguese
Apache-2.0
This is a fine-tuned XLSR-53 large model for Portuguese speech recognition tasks, trained on the Common Voice 6.1 dataset, supporting Portuguese speech-to-text conversion.
Speech Recognition, Other
jonatasgrosman
4.9M downloads
32 likes
Vit Base Patch16 224
Apache-2.0
Vision Transformer model pre-trained on ImageNet-21k and fine-tuned on ImageNet for image classification tasks
Image Classification
google
4.8M downloads
775 likes
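An image-classification sketch with the transformers pipeline; the image path is a placeholder:

# Sketch: ImageNet classification with the fine-tuned ViT checkpoint.
from transformers import pipeline

classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
print(classifier("cat.jpg", top_k=3))  # top-3 predicted labels with scores
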
Chronos Bolt Base
Apache-2.0
Chronos-Bolt is a series of pretrained time series forecasting models that support zero-shot prediction with high accuracy and fast inference speed.
Climate Model
autogluon
4.7M downloads
22 likes
Whisper Large V3
Apache-2.0
Whisper is an advanced automatic speech recognition (ASR) and speech translation model developed by OpenAI, trained on over 5 million hours of labeled data, with strong cross-dataset and cross-domain generalization capabilities.
Speech Recognition, Supports Multiple Languages
openai
4.6M downloads
4,321 likes
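A transcription sketch with the transformers automatic-speech-recognition pipeline; the audio file name is a placeholder:

# Sketch: speech-to-text with Whisper large-v3; timestamps enable long-form audio.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
result = asr("interview.wav", return_timestamps=True)
print(result["text"])
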
Clip Vit Base Patch16
CLIP is a multimodal model developed by OpenAI that maps images and text into a shared embedding space through contrastive learning, enabling zero-shot image classification capabilities.
Image-to-Text
openai
4.6M downloads
119 likes
Whisper Large V3 Turbo
MIT
Whisper is a state-of-the-art automatic speech recognition (ASR) and speech translation model developed by OpenAI, trained on over 5 million hours of labeled data, demonstrating strong generalization capabilities in zero-shot settings.
Speech Recognition, Transformers, Supports Multiple Languages
openai
4.0M downloads
2,317 likes
Wav2vec2 Large Xlsr 53 Russian
Apache-2.0
A Russian speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz sampled audio input
Speech Recognition, Other
jonatasgrosman
3.9M downloads
54 likes
Bart Large Cnn
MIT
BART model pre-trained on English corpus, specifically fine-tuned for the CNN/Daily Mail dataset, suitable for text summarization tasks
Text Generation, English
facebook
3.8M downloads
1,364 likes
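A summarization sketch with the transformers pipeline; the article file is a placeholder:

# Sketch: abstractive summarization with the CNN/Daily Mail fine-tuned BART model.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = open("article.txt").read()  # any English news article
print(summarizer(article, max_length=130, min_length=30, do_sample=False)[0]["summary_text"])
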
Wav2vec2 Large Xlsr 53 Chinese Zh Cn
Apache-2.0
A Chinese speech recognition model fine-tuned based on facebook/wav2vec2-large-xlsr-53, supporting 16kHz sampling rate audio input.
Speech Recognition, Chinese
jonatasgrosman
3.8M downloads
110 likes
Fashion Clip
MIT
FashionCLIP is a vision-language model fine-tuned specifically for the fashion domain based on CLIP, capable of generating universal product representations.
Text-to-Image, Transformers, English
patrickjohncyh
3.8M downloads
222 likes
Jina Embeddings V3
Jina Embeddings V3 is a multilingual sentence embedding model supporting over 100 languages, specializing in sentence similarity and feature extraction tasks.
Text Embedding, Transformers, Supports Multiple Languages
jinaai
3.7M downloads
911 likes
Stable Diffusion V1 5
Openrail
Stable Diffusion is a latent text-to-image diffusion model capable of generating realistic images from any text input.
Image Generation
stable-diffusion-v1-5
3.7M downloads
518 likes
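A text-to-image sketch with diffusers; the repo id stable-diffusion-v1-5/stable-diffusion-v1-5 and a CUDA GPU are assumptions here:

# Sketch: generate an image from a text prompt (assumed repo id, GPU assumed).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipe("a watercolor painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")
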
Bart Large Mnli
MIT
Zero-shot classification model based on BART-large architecture, fine-tuned on MultiNLI dataset
Large Language Model
facebook
3.7M downloads
1,364 likes
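A zero-shot classification sketch with the transformers pipeline; the example text and candidate labels are arbitrary:

# Sketch: classify text against labels the model never saw during fine-tuning.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
print(classifier("The new GPU cut our training time in half.",
                 candidate_labels=["hardware", "finance", "sports"]))
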
T5 Small
Apache-2.0
T5-Small is a 60-million-parameter text-to-text Transformer model developed by Google, using a unified text-to-text framework to handle various NLP tasks
Large Language Model, Supports Multiple Languages
google-t5
3.7M downloads
450 likes
Esm2 T36 3B UR50D
MIT
ESM-2 is a next-generation protein model trained with masked language modeling objectives, suitable for fine-tuning on various downstream tasks with protein sequences as input.
Protein Model, Transformers
facebook
3.5M downloads
22 likes
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025 AIbase