
CLIP ViT L Rho50 K1 Constrained FARE2

Developed by LEAF-CLIP
A feature extraction model fine-tuned from openai/clip-vit-large-patch14, with both the image and text encoders optimized
Downloads 253
Release Time: 4/16/2025

Model Overview

This is a feature extraction model based on the CLIP architecture. Its image encoder is fine-tuned with FARE and its text encoder with LEAF, making it suitable for multimodal feature extraction tasks.
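Below is a minimal loading sketch. The Hub id is an assumption derived from the model name; check the LEAF-CLIP organization page for the exact repository. The model is assumed to load with the standard CLIPModel/CLIPProcessor classes from transformers.

```python
# Minimal usage sketch; the model id below is an assumption, not confirmed.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "LEAF-CLIP/CLIP-ViT-L-rho50-k1-constrained-FARE2"  # assumed Hub id
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")
inputs = processor(text=["a photo of a cat"], images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

image_features = outputs.image_embeds  # shape (1, projection_dim)
text_features = outputs.text_embeds    # shape (1, projection_dim)
```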

Model Features

Adversarial fine-tuning
The image encoder is fine-tuned using FARE at ε = 2/255, improving robustness against adversarial attacks
Semantic-constrained fine-tuning
The text encoder is fine-tuned using LEAF with k = 1, ρ = 50, and semantic constraints
Multimodal feature extraction
Supports both image and text feature extraction, retaining the multimodal capabilities of the original CLIP (see the sketch after this list)
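A sketch of extracting each modality separately, using transformers' standard get_image_features and get_text_features helpers; the model id is assumed as in the overview above:

```python
# Extract image and text features independently; model_id is an assumption.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "LEAF-CLIP/CLIP-ViT-L-rho50-k1-constrained-FARE2"  # assumed Hub id
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

# Image branch only: the FARE-fine-tuned ViT-L image encoder.
image_inputs = processor(images=Image.open("example.jpg"), return_tensors="pt")
with torch.no_grad():
    image_features = model.get_image_features(**image_inputs)

# Text branch only: the LEAF-fine-tuned text encoder.
text_inputs = processor(text=["a photo of a cat"],
                        return_tensors="pt", padding=True)
with torch.no_grad():
    text_features = model.get_text_features(**text_inputs)
```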

Model Capabilities

Image feature extraction
Text feature extraction
Multimodal feature alignment

Use Cases

Computer vision
Image retrieval
Use the extracted image features for similar image retrieval
Natural language processing
Cross-modal retrieval
Implement cross-modal retrieval from text to image or from image to text, as sketched below
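A minimal retrieval sketch: embeddings from both encoders are L2-normalized and ranked by cosine similarity, which is how CLIP-style models typically score image-text pairs. The feature tensors are assumed to come from the extraction sketches above.

```python
# Rank N candidate image embeddings against one text query embedding.
# image_features: (N, d); text_features: (1, d). Cosine similarity equals
# the dot product after L2 normalization.
import torch
import torch.nn.functional as F

def rank_images(text_features: torch.Tensor,
                image_features: torch.Tensor) -> torch.Tensor:
    text = F.normalize(text_features, dim=-1)
    images = F.normalize(image_features, dim=-1)
    scores = (images @ text.T).squeeze(-1)   # (N,) cosine similarities
    return scores.argsort(descending=True)   # indices, best match first
```

Image-to-text retrieval works the same way with the roles of the two tensors swapped.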