# Enhanced Semantic Understanding

SigLIP 2 image-encoder checkpoints published for `timm` (all Apache-2.0, all trained on the WebLI dataset):

| Model | Tags | Downloads | Likes | Description |
|---|---|---|---|---|
| `vit_so400m_patch16_siglip_gap_256.v2_webli` | Text-to-Image, Transformers | 22 | 0 | SigLIP 2 ViT image encoder using global average pooling, with the attention pooling head removed; for image feature extraction. |
| `vit_so400m_patch16_siglip_512.v2_webli` | Text-to-Image, Transformers | 2,766 | 0 | SigLIP 2 Vision Transformer for image feature extraction, suited to multilingual vision-language tasks. |
| `vit_so400m_patch16_siglip_256.v2_webli` | Text-to-Image, Transformers | 12.56k | 0 | Image-encoder half of a SigLIP 2 model, for image feature extraction. |
| `vit_so400m_patch14_siglip_224.v2_webli` | Image Classification, Transformers | 7,005 | 0 | SigLIP 2 Vision Transformer pretrained on WebLI, for image feature extraction. |
| `vit_large_patch16_siglip_512.v2_webli` | Image Classification, Transformers | 295 | 0 | SigLIP 2 ViT-Large image encoder packaged for `timm`, suited to vision-language tasks. |
| `vit_large_patch16_siglip_384.v2_webli` | Text-to-Image, Transformers | 4,265 | 0 | SigLIP 2 Vision Transformer pretrained on WebLI, for image feature extraction. |
| `vit_giantopt_patch16_siglip_256.v2_webli` | Text-to-Image, Transformers | 59 | 0 | SigLIP 2 Vision Transformer focused on image feature extraction. |
| `vit_base_patch16_siglip_512.v2_webli` | Text-to-Image, Transformers | 2,664 | 0 | SigLIP 2 ViT-Base image encoder pretrained on WebLI, for image feature extraction. |
Official `google` SigLIP 2 vision-language checkpoints (all Apache-2.0):

| Model | Tags | Downloads | Likes | Description |
|---|---|---|---|---|
| `google/siglip2-so400m-patch16-naflex` | Text-to-Image, Transformers | 159.81k | 21 | SigLIP 2 model building on the SigLIP pre-training objective, with improved semantic understanding, localization, and dense feature extraction; the NaFlex variant supports variable input resolutions. |
| `google/siglip2-large-patch16-256` | Text-to-Image, Transformers | 10.89k | 3 | Improved vision-language model based on SigLIP, with enhanced semantic understanding, localization, and dense feature extraction. |
| `google/siglip2-base-patch16-512` | Text-to-Image, Transformers | 28.01k | 10 | Vision-language model with enhanced semantic understanding, localization, and dense feature extraction. |
| `google/siglip2-base-patch16-256` | Image-to-Text, Transformers | 45.24k | 4 | Multilingual vision-language encoder with improved semantic understanding, localization, and dense feature extraction. |
| `google/siglip2-base-patch32-256` | Text-to-Image, Transformers | 9,419 | 4 | Improved version of SigLIP, with enhanced semantic understanding, localization, and dense feature extraction. |
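Unlike the `timm` encoders, the `google` checkpoints ship both towers and can do zero-shot image classification out of the box. A sketch using the Transformers `AutoModel`/`AutoProcessor` API (assumes a recent `transformers` with SigLIP 2 support; weights are downloaded on first run; the solid-color image is a stand-in for a real photo):

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip2-base-patch16-256"  # from the table above
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.new("RGB", (256, 256), color="red")  # placeholder input image
texts = ["a photo of a cat", "a photo of a solid red square"]

# SigLIP checkpoints expect max-length padding on the text side.
inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# SigLIP scores image-text pairs with an elementwise sigmoid, not a softmax,
# so the per-label probabilities are independent and need not sum to 1.
probs = torch.sigmoid(outputs.logits_per_image)
print(probs)
```

The sigmoid (rather than softmax) scoring is the defining trait of the SigLIP objective: each candidate caption is judged on its own, which also makes multi-label prompts straightforward.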
One additional encoder checkpoint outside the SigLIP 2 family:

| Model | Tags | Downloads | Likes | Description |
|---|---|---|---|---|
| `nghuyong/ernie-2.0-base-en` | Large Language Model, Transformers, English | 1,694 | 15 | ERNIE 2.0 is a continual pre-training framework proposed by Baidu in 2019 that incrementally constructs and optimizes pre-training tasks through continual multi-task learning; it outperformed BERT and XLNet on multiple benchmarks. |