# Vision-language alignment
| Model | License | Description | Task | Library | Org | Downloads | Likes |
|---|---|---|---|---|---|---|---|
| LLM2CLIP-Openai-L-14-224 | Apache-2.0 | LLM2CLIP is an innovative approach that leverages large language models (LLMs) to unlock the potential of CLIP, enhancing text discriminability through a contrastive learning framework and breaking the limitations of the original CLIP text encoder. | Text-to-Image | | microsoft | 108 | 5 |
| LLM2CLIP-Openai-B-16 | Apache-2.0 | LLM2CLIP is an innovative method that leverages large language models (LLMs) to extend CLIP's capabilities, enhancing text discriminability through a contrastive learning framework and significantly improving cross-modal task performance. | Text-to-Image | Safetensors | microsoft | 1,154 | 18 |
| LLM2CLIP-EVA02-L-14-336 | Apache-2.0 | LLM2CLIP is an innovative approach that enhances CLIP's visual representation capabilities through large language models (LLMs), significantly improving cross-modal task performance. | Text-to-Image | PyTorch | microsoft | 75 | 60 |
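A minimal sketch of pulling image embeddings from one of these checkpoints through `transformers` is shown below. The LLM2CLIP repos ship custom modeling code, so `trust_remote_code=True` is needed; the preprocessor choice and the `get_image_features` call are assumptions based on the standard CLIP API, not verified usage from these model cards.

```python
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

# Repo id taken from the listing above; LLM2CLIP ships custom modeling code,
# hence trust_remote_code=True.
repo = "microsoft/LLM2CLIP-Openai-L-14-224"

# Assumption: the preprocessor matches the underlying OpenAI ViT-L/14 (224px) tower.
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
model = AutoModel.from_pretrained(repo, trust_remote_code=True).eval()

image = Image.open("example.jpg")
pixels = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    # get_image_features is assumed to be exposed by the repo's remote code,
    # mirroring the standard CLIP API.
    image_embeds = model.get_image_features(pixels)
print(image_embeds.shape)
```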
| Model | License | Description | Task | Library | Org | Downloads | Likes |
|---|---|---|---|---|---|---|---|
| vit_large_patch14_clip_224.metaclip_400m | | Vision Transformer model trained on the MetaCLIP-400M dataset, supporting zero-shot image classification tasks. | Image Classification | | timm | 294 | 0 |
| vit_base_patch32_clip_224.metaclip_2pt5b | | Vision Transformer model trained on the MetaCLIP-2.5B dataset, compatible with both the open_clip and timm frameworks. | Image Classification | | timm | 5,571 | 0 |
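Since these checkpoints are published under the `timm` org, a feature-extraction sketch with the `timm` API might look like the following (the image path is a placeholder; zero-shot classification with text prompts would instead go through `open_clip`).

```python
import timm
import torch
from PIL import Image

# Load the MetaCLIP-2.5B ViT-B/32 image tower as a feature extractor (no classifier head).
model = timm.create_model("vit_base_patch32_clip_224.metaclip_2pt5b", pretrained=True, num_classes=0)
model.eval()

# Build the preprocessing pipeline the checkpoint was trained with.
config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**config, is_training=False)

image = Image.open("example.jpg").convert("RGB")  # placeholder path
with torch.no_grad():
    features = model(transform(image).unsqueeze(0))  # pooled image embedding, shape (1, 768)
print(features.shape)
```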
| Model | License | Description | Task | Library | Org | Downloads | Likes |
|---|---|---|---|---|---|---|---|
| clip-finetuned-csu-p14-336-e3l57-l | | A fine-tuned version of openai/clip-vit-large-patch14-336, primarily used for image-text matching tasks. | Text-to-Image | Transformers | kevinoli | 31 | 0 |
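Because this checkpoint is a fine-tune of `openai/clip-vit-large-patch14-336`, the standard `transformers` CLIP API should apply; the repo id below is inferred from the listing and is an assumption.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Repo id inferred from the listing above (assumption); the base model's API is assumed
# to apply because the checkpoint is a fine-tune of openai/clip-vit-large-patch14-336.
repo = "kevinoli/clip-finetuned-csu-p14-336-e3l57-l"

model = CLIPModel.from_pretrained(repo).eval()
processor = CLIPProcessor.from_pretrained(repo)

image = Image.open("example.jpg")  # placeholder path
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-text matching: higher logits mean a better match between the image and a caption.
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)
```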