
FARE4 CLIP

Developed by chs20
A vision-language model initialized from OpenAI CLIP, with robustness enhanced through unsupervised adversarial fine-tuning
Downloads 45
Release Date: 2/23/2024

Model Overview

FARE4 CLIP is a vision-language model based on the CLIP architecture, enhanced for robustness through unsupervised adversarial fine-tuning on the ImageNet dataset under the L-infinity norm with a radius of 4/255.

Model Features

Unsupervised Adversarial Fine-tuning
Adversarial training on ImageNet under the L-infinity norm with a radius of 4/255 to enhance robustness
Based on CLIP Architecture
Inherits CLIP's powerful vision-language alignment capabilities
Enhanced Robustness
Specifically optimized for adversarial attack scenarios to improve model stability
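The fine-tuning described above constrains adversarial perturbations to an L-infinity ball of radius 4/255 around each clean image. The sketch below illustrates that inner attack step in pure Python: a projected-gradient ascent on the distance between perturbed and clean embeddings, kept inside the 4/255 ball. The `embed` function is a hypothetical stand-in for the CLIP image encoder (the real training operates on image tensors with autograd); this is a minimal illustration, not the authors' implementation.

```python
EPS = 4 / 255      # L-infinity radius used for the fine-tuning
STEP = 1 / 255     # per-iteration step size (illustrative choice)

def embed(x):
    # Hypothetical stand-in for a CLIP image encoder: maps a pixel
    # vector to an embedding. Purely illustrative.
    return [2.0 * v for v in x]

def attack_loss(x, target_emb):
    # Squared L2 distance between the embedding of x and the clean
    # embedding -- the quantity the inner attack maximizes.
    return sum((e - t) ** 2 for e, t in zip(embed(x), target_emb))

def pgd_linf(x_clean, target_emb, steps=10):
    """Maximize attack_loss subject to ||x - x_clean||_inf <= EPS."""
    x = list(x_clean)
    for _ in range(steps):
        # Finite-difference sign gradient (a real implementation
        # would use autograd); ascend the loss.
        grad = []
        for i in range(len(x)):
            x_hi = list(x)
            x_hi[i] += 1e-6
            g = (attack_loss(x_hi, target_emb)
                 - attack_loss(x, target_emb)) / 1e-6
            grad.append(1.0 if g >= 0 else -1.0)
        x = [v + STEP * g for v, g in zip(x, grad)]
        # Project back into the L-infinity ball around x_clean,
        # then into the valid pixel range [0, 1].
        x = [min(max(v, c - EPS), c + EPS) for v, c in zip(x, x_clean)]
        x = [min(max(v, 0.0), 1.0) for v in x]
    return x

x_clean = [0.5, 0.25, 0.75]
x_adv = pgd_linf(x_clean, embed(x_clean))
max_pert = max(abs(a - c) for a, c in zip(x_adv, x_clean))
print(max_pert)  # never exceeds 4/255
```

During fine-tuning, the model is then trained so that its embedding of such perturbed images stays close to the original (frozen) CLIP embedding of the clean image, which is what makes the procedure unsupervised: no labels are needed.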

Model Capabilities

Zero-shot Image Classification
Image-Text Matching
Cross-modal Retrieval
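All three capabilities rest on the same mechanism: comparing image and text embeddings by cosine similarity. The sketch below shows zero-shot classification with hypothetical pre-computed embeddings; in practice, these would come from the model's image encoder and from its text encoder applied to prompts such as "a photo of a {label}".

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, for illustration only.
image_emb = [0.9, 0.1, 0.2]
text_embs = {
    "cat": [0.88, 0.15, 0.18],
    "dog": [0.2, 0.9, 0.1],
    "car": [0.1, 0.2, 0.95],
}

scores = {label: cosine(image_emb, emb) for label, emb in text_embs.items()}
prediction = max(scores, key=scores.get)
print(prediction)  # the label whose text embedding is closest to the image
```

Image-text matching and cross-modal retrieval use the same score: rank a pool of texts against one image (or images against one text) by cosine similarity.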

Use Cases

Computer Vision
Robust Image Classification
Reliable image classification in adversarial attack environments
Exhibits stronger adversarial robustness compared to standard CLIP models
Cross-modal Retrieval
Bidirectional retrieval between images and text under adversarial conditions