FARE4-ViT-B-16-laion2B-s34B-b88K
Developed by chs20
A robust perceptual metric model based on CLIP, enhanced through adversarial fine-tuning for improved performance in perceptual similarity tasks.
Downloads: 23
Release Date: 8/14/2024
Model Overview
This model is a vision-language model based on the CLIP architecture, adversarially fine-tuned with the FARE method to significantly improve robustness against adversarial attacks while maintaining performance on clean data. It is primarily used for zero-shot image classification and as a perceptual similarity metric.
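Below is a minimal loading and zero-shot classification sketch, assuming the checkpoint is published on the Hugging Face Hub as chs20/FARE4-ViT-B-16-laion2B-s34B-b88K and is loadable through the open_clip library; the image path and candidate captions are placeholders.

```python
# Zero-shot classification sketch; assumes open-clip-torch, torch, and Pillow
# are installed and that the repo id below exists on the Hugging Face Hub.
import torch
import open_clip
from PIL import Image

repo = "hf-hub:chs20/FARE4-ViT-B-16-laion2B-s34B-b88K"  # assumed repo id
model, _, preprocess = open_clip.create_model_and_transforms(repo)
tokenizer = open_clip.get_tokenizer(repo)
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # placeholder image
text = tokenizer(["a photo of a cat", "a photo of a dog"])  # candidate captions

with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(text)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_emb @ txt_emb.T).softmax(dim=-1)

print(probs)  # probability over the candidate captions
```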
Model Features
Adversarial Robustness
Adversarially fine-tuned on ImageNet with the FARE method under an L-infinity threat model of radius 4/255, significantly improving the model's resistance to adversarial attacks (a sketch of the training objective follows this feature list).
High-performance Perceptual Metric
Excels in the NIGHTS perceptual similarity task, achieving 90.6% accuracy on clean data and maintaining high performance under adversarial attacks.
CLIP-based Architecture
Built on the widely used CLIP ViT-B/16 architecture (the laion2B-s34B-b88K OpenCLIP checkpoint), inheriting CLIP's strong vision-language alignment capabilities.
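The Adversarial Robustness feature above refers to the FARE objective: the vision encoder is fine-tuned so that embeddings of adversarially perturbed images stay close to the embeddings the original, frozen encoder assigns to the clean images. The sketch below is an illustrative PyTorch rendering of that idea, not the authors' code; `encoder` and `frozen_encoder` are hypothetical callables, and the PGD hyperparameters are placeholders.

```python
import torch

def fare_loss(encoder, frozen_encoder, images, eps=4/255, steps=10, step_size=1/255):
    """Embedding distance between perturbed images (trainable encoder) and
    clean images (frozen original encoder); the perturbation is found by
    PGD under an L-infinity budget of eps. Images are assumed in [0, 1]."""
    with torch.no_grad():
        clean_emb = frozen_encoder(images)  # targets from the original CLIP encoder

    delta = torch.empty_like(images).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):  # inner PGD maximization of the embedding distance
        adv_emb = encoder((images + delta).clamp(0, 1))
        dist = (adv_emb - clean_emb).pow(2).sum(dim=-1).mean()
        grad, = torch.autograd.grad(dist, delta)
        delta = (delta + step_size * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)

    adv_emb = encoder((images + delta.detach()).clamp(0, 1))
    return (adv_emb - clean_emb).pow(2).sum(dim=-1).mean()  # minimized over encoder weights
```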
Model Capabilities
Zero-shot Image Classification
Perceptual Similarity Metric (see the sketch after this list)
Adversarially Robust Image Analysis
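The perceptual similarity capability can be exercised by comparing image embeddings directly. The sketch below illustrates a NIGHTS-style two-alternative choice under the same assumptions as the loading example (it reuses `model` and `preprocess` from there; the file names are placeholders): the candidate whose embedding has the higher cosine similarity to the reference is judged more perceptually similar.

```python
import torch
from PIL import Image

def embed(path):
    """Encode one image and L2-normalize its embedding."""
    x = preprocess(Image.open(path)).unsqueeze(0)
    with torch.no_grad():
        z = model.encode_image(x)
    return z / z.norm(dim=-1, keepdim=True)

ref = embed("reference.jpg")  # placeholder paths
a = embed("candidate_a.jpg")
b = embed("candidate_b.jpg")

sim_a = (ref @ a.T).item()  # cosine similarity (embeddings are normalized)
sim_b = (ref @ b.T).item()
print("candidate A" if sim_a > sim_b else "candidate B", "is closer to the reference")
```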
Use Cases
Computer Vision
Image Similarity Assessment
Used to evaluate the perceptual similarity between two images from a human perspective.
Achieves 90.6% accuracy on the NIGHTS dataset.
Robust Image Classification
Maintains good classification performance even under adversarial perturbations.
Retains 71.5% accuracy under an L-infinity attack (eps = 4/255); a sketch of such a robustness probe appears after the use cases.
Safety-critical Applications
Adversarial Attack Detection
Helps identify images that may have been manipulated by adversarial perturbations.
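As a rough illustration of the robust-classification use case, the sketch below probes the model with a single-step FGSM attack at eps = 4/255. This is only a weak lower bound on robustness (reported numbers for models like this typically come from stronger multi-step attacks such as APGD), and for simplicity the perturbation is applied in preprocessed space; it reuses `model`, `preprocess`, and `tokenizer` from the loading example, with placeholder paths and labels.

```python
import torch
import torch.nn.functional as F
from PIL import Image

eps = 4 / 255
image = preprocess(Image.open("example.jpg")).unsqueeze(0).requires_grad_(True)
text = tokenizer(["a photo of a cat", "a photo of a dog"])
label = torch.tensor([0])  # assumed ground-truth caption index

img_emb = model.encode_image(image)
txt_emb = model.encode_text(text)
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
logits = 100.0 * img_emb @ txt_emb.T

F.cross_entropy(logits, label).backward()

# One FGSM step inside the L-infinity ball. NOTE: a rigorous evaluation would
# perturb raw pixels in [0, 1] before normalization; this simplified version
# perturbs the already-preprocessed tensor.
adv = (image + eps * image.grad.sign()).detach()
with torch.no_grad():
    adv_emb = model.encode_image(adv)
    adv_emb = adv_emb / adv_emb.norm(dim=-1, keepdim=True)
    adv_logits = 100.0 * adv_emb @ txt_emb.T

print("clean prediction:", logits.argmax().item())
print("adversarial prediction:", adv_logits.argmax().item())
```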