# Vision-language alignment
| Model | License | Description | Task | Library | Org | Downloads | Likes |
|---|---|---|---|---|---|---|---|
| LLM2CLIP-Openai-L-14-224 | Apache-2.0 | LLM2CLIP is an innovative approach that leverages large language models (LLMs) to unlock the potential of CLIP, enhancing text discriminability through a contrastive learning framework and breaking the limitations of the original CLIP text encoder. | Text-to-Image | | microsoft | 108 | 5 |
| LLM2CLIP-Openai-B-16 | Apache-2.0 | LLM2CLIP is an innovative method that leverages large language models (LLMs) to extend CLIP's capabilities, enhancing text discriminability through a contrastive learning framework and significantly improving cross-modal task performance. | Text-to-Image | Safetensors | microsoft | 1,154 | 18 |
| LLM2CLIP-EVA02-L-14-336 | Apache-2.0 | LLM2CLIP is an innovative approach that enhances CLIP's visual representation capabilities through large language models (LLMs), significantly improving cross-modal task performance. | Text-to-Image | PyTorch | microsoft | 75 | 60 |
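A minimal sketch of pulling image embeddings from one of these checkpoints through `transformers` is shown below. The LLM2CLIP repos ship custom modeling code, so `trust_remote_code=True` is needed; the preprocessor choice and the `get_image_features` call are assumptions based on the standard CLIP API, not verified usage from these model cards.

```python
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

# Repo id taken from the listing above; LLM2CLIP ships custom modeling code,
# hence trust_remote_code=True.
repo = "microsoft/LLM2CLIP-Openai-L-14-224"

# Assumption: the preprocessor matches the underlying OpenAI ViT-L/14 (224px) tower.
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
model = AutoModel.from_pretrained(repo, trust_remote_code=True).eval()

image = Image.open("example.jpg")
pixels = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    # get_image_features is assumed to be exposed by the repo's remote code,
    # mirroring the standard CLIP API.
    image_embeds = model.get_image_features(pixels)
print(image_embeds.shape)
```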
| Model | License | Description | Task | Library | Org | Downloads | Likes |
|---|---|---|---|---|---|---|---|
| vit_large_patch14_clip_224.metaclip_400m | | Vision Transformer model trained on the MetaCLIP-400M dataset, supporting zero-shot image classification tasks. | Image Classification | | timm | 294 | 0 |
| vit_base_patch32_clip_224.metaclip_2pt5b | | Vision Transformer model trained on the MetaCLIP-2.5B dataset, compatible with both the open_clip and timm frameworks. | Image Classification | | timm | 5,571 | 0 |
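Since these checkpoints are published under the `timm` org, a feature-extraction sketch with the `timm` API might look like the following (the image path is a placeholder; zero-shot classification with text prompts would instead go through `open_clip`).

```python
import timm
import torch
from PIL import Image

# Load the MetaCLIP-2.5B ViT-B/32 image tower as a feature extractor (no classifier head).
model = timm.create_model("vit_base_patch32_clip_224.metaclip_2pt5b", pretrained=True, num_classes=0)
model.eval()

# Build the preprocessing pipeline the checkpoint was trained with.
config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**config, is_training=False)

image = Image.open("example.jpg").convert("RGB")  # placeholder path
with torch.no_grad():
    features = model(transform(image).unsqueeze(0))  # pooled image embedding, shape (1, 768)
print(features.shape)
```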
| Model | License | Description | Task | Library | Org | Downloads | Likes |
|---|---|---|---|---|---|---|---|
| clip-finetuned-csu-p14-336-e3l57-l | | A fine-tuned version of openai/clip-vit-large-patch14-336, primarily used for image-text matching tasks. | Text-to-Image | Transformers | kevinoli | 31 | 0 |
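Because this checkpoint is a fine-tune of `openai/clip-vit-large-patch14-336`, the standard `transformers` CLIP API should apply; the repo id below is inferred from the listing and is an assumption.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Repo id inferred from the listing above (assumption); the base model's API is assumed
# to apply because the checkpoint is a fine-tune of openai/clip-vit-large-patch14-336.
repo = "kevinoli/clip-finetuned-csu-p14-336-e3l57-l"

model = CLIPModel.from_pretrained(repo).eval()
processor = CLIPProcessor.from_pretrained(repo)

image = Image.open("example.jpg")  # placeholder path
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-text matching: higher logits mean a better match between the image and a caption.
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)
```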