
FARE2 CLIP

Developed by chs20
A vision-language model initialized from OpenAI CLIP and enhanced with unsupervised adversarial fine-tuning for improved robustness
Downloads: 543
Release Date: 2/23/2024

Model Overview

This model is a vision-language model based on the CLIP architecture, adversarially fine-tuned on the ImageNet dataset in an unsupervised manner, which specifically enhances its robustness against adversarial attacks.

Model Features

Adversarial Robustness
Uses unsupervised adversarial fine-tuning under the L∞ norm with radius 2/255, strengthening the model's resistance to adversarial attacks
Zero-shot Capability
Retains CLIP's zero-shot classification ability, applicable to new tasks without task-specific fine-tuning
Vision-Language Alignment
Maintains CLIP's alignment of visual and language representations, supporting cross-modal tasks
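To make the L∞ constraint above concrete, here is a minimal NumPy sketch of a PGD-style perturbation bounded to radius 2/255, the same budget used in the fine-tuning. It is illustrative only: the real training attacks the CLIP vision encoder's embedding loss, whereas `grad_fn` here is a hypothetical stand-in supplied by the caller, and the toy loss at the bottom is invented for the example.

```python
import numpy as np

EPS = 2 / 255  # L-infinity radius, matching the fine-tuning budget

def pgd_linf(x, grad_fn, steps=10, step_size=EPS / 4):
    """Sketch of an L-infinity-bounded PGD attack.

    x:       clean input (e.g. image pixels in [0, 1])
    grad_fn: returns the gradient of the attack loss w.r.t. the input
             (hypothetical; in practice this comes from the model)
    """
    delta = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x + delta)
        delta += step_size * np.sign(g)           # ascent along the gradient sign
        delta = np.clip(delta, -EPS, EPS)         # project back into the L-inf ball
        delta = np.clip(x + delta, 0.0, 1.0) - x  # keep pixels in the valid range
    return x + delta

# Toy quadratic loss so the example is self-contained
x = np.full(8, 0.5)
x_adv = pgd_linf(x, grad_fn=lambda z: 2 * (z - 1.0))
```

However the attack is driven, the projection step guarantees that no pixel of the adversarial input differs from the clean input by more than 2/255.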

Model Capabilities

Zero-shot image classification
Cross-modal retrieval
Adversarial robustness analysis
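The zero-shot classification capability works the same way as in standard CLIP: compare the image embedding against one text embedding per class prompt and pick the most similar. The sketch below shows that comparison with toy NumPy vectors; the embeddings and class prompts are invented for illustration, and in practice they would come from this model's image and text encoders.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs):
    """Pick the class whose text embedding is most similar to the image embedding.

    image_emb: (d,) image feature vector
    text_embs: (num_classes, d) text feature vectors, one per class prompt
    """
    # CLIP compares L2-normalized embeddings via cosine similarity
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                  # cosine similarity per class
    return int(np.argmax(sims)), sims

# Toy 4-dim embeddings standing in for encoder outputs (illustrative only)
image_emb = np.array([0.9, 0.1, 0.0, 0.1])
text_embs = np.array([
    [1.0, 0.0, 0.0, 0.0],  # e.g. "a photo of a dog"
    [0.0, 1.0, 0.0, 0.0],  # e.g. "a photo of a cat"
    [0.0, 0.0, 1.0, 0.0],  # e.g. "a photo of a car"
])
pred, sims = zero_shot_classify(image_emb, text_embs)  # pred == 0
```

Because classes are defined purely by text prompts, new label sets can be swapped in without any task-specific fine-tuning.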

Use Cases

Computer Vision
Robust Image Classification
Reliable image classification under adversarial attack conditions
Demonstrates stronger adversarial robustness on ImageNet
Cross-modal Retrieval
Cross-modal search between images and text
Security Applications
Adversarial Attack Detection
Identifying inputs that may contain adversarial perturbations