
CLIP SAE ViT-L/14

Developed by zer0int
A CLIP model fine-tuned with a sparse autoencoder (SAE). It excels at zero-shot image classification and is particularly adept at recognizing adversarial typographic attacks.
Downloads: 32
Release date: 12/8/2024

Model Overview

This model is a fine-tuned version of OpenAI's CLIP ViT-L/14, enhanced with sparse autoencoder (SAE) techniques to improve adversarial robustness. It outperforms the original model on benchmarks such as ImageNet/ObjectNet.
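The zero-shot classification behind these benchmark numbers works by comparing image and text embeddings in CLIP's shared space. A minimal sketch of that scoring step, using toy NumPy vectors in place of real ViT-L/14 outputs (the embeddings and temperature here are illustrative, not taken from the model):

```python
import numpy as np

def zero_shot_scores(image_emb, text_embs, temperature=100.0):
    """Score one image embedding against one text embedding per class.

    Mirrors CLIP's scoring: L2-normalize both sides, take cosine
    similarities, scale by a temperature, and softmax into probabilities.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (txt @ img)      # one cosine similarity per class
    probs = np.exp(logits - logits.max())   # numerically stable softmax
    return probs / probs.sum()

# Toy 4-d embeddings standing in for CLIP's 768-d ViT-L/14 outputs.
image_emb = np.array([0.9, 0.1, 0.0, 0.1])
text_embs = np.array([
    [0.8, 0.2, 0.1, 0.0],   # e.g. "a photo of a cat"
    [0.0, 0.1, 0.9, 0.2],   # e.g. "a photo of a dog"
])
probs = zero_shot_scores(image_emb, text_embs)
print(probs.argmax())  # the class whose text embedding best matches the image
```

The predicted class is simply the caption with the highest cosine similarity to the image, which is what makes the classifier "zero-shot": new classes only require new text prompts, not retraining.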

Model Features

Enhanced Adversarial Robustness
Improves the model's ability to recognize adversarial typographic attacks through sparse autoencoder technology
High Performance
Achieves 89% accuracy on ImageNet/ObjectNet tests, surpassing the original CLIP model's 84.5%
Tencent Hunyuan Video Adaptation
Specially adapted as an optimal choice for the Tencent Hunyuan Video framework
Advantage in Linear Probing Tasks
Performs best in linear probing tasks on CLIP_benchmark
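Linear probing freezes the image encoder and trains only a linear classifier on its embeddings, so probe accuracy measures the quality of the frozen features. A self-contained sketch with synthetic features standing in for frozen CLIP embeddings (a plain logistic-regression probe trained by gradient descent, not the CLIP_benchmark harness itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for frozen CLIP image embeddings (2 classes, 64-d).
n, d = 200, 64
features = rng.normal(size=(n, d))
labels = (features[:, 0] + 0.5 * features[:, 1] > 0).astype(float)

# Linear probe: a single weight vector trained with logistic regression.
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(features @ w)))   # sigmoid predictions
    grad = features.T @ (p - labels) / n        # logistic-loss gradient
    w -= 0.5 * grad                             # gradient-descent step

preds = (features @ w > 0).astype(float)
accuracy = (preds == labels).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

Because the encoder stays frozen, better probe accuracy reflects more linearly separable embeddings, which is the property the CLIP_benchmark linear-probing result is claiming.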

Model Capabilities

Zero-shot Image Classification
Adversarial Sample Recognition
Multimodal Understanding
Text-Image Matching

Use Cases

Content Security
Adversarial Typographic Attack Detection
Identifies adversarial images that overlay misleading typographic text on the visual content
Correctly classifies adversarial samples such as black-and-white cats/dogs
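A typographic attack overlays misleading text (e.g. the word "dog" written on a photo of a cat) so that CLIP's prediction follows the written word rather than the visual content. The failure mode can be illustrated with toy embeddings: a brittle encoder's image embedding drifts toward the written word and flips the prediction, while a robust encoder's does not (all numbers below are illustrative, not measured from this model):

```python
import numpy as np

def classify(image_emb, class_embs):
    """Return the index of the most similar (normalized) text embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = class_embs / np.linalg.norm(class_embs, axis=1, keepdims=True)
    return int(np.argmax(txt @ img))

class_embs = np.array([
    [1.0, 0.0],   # e.g. "a photo of a cat"
    [0.0, 1.0],   # e.g. "a photo of a dog"
])

clean_cat = np.array([0.9, 0.2])         # clean cat photo: "cat" wins
# Writing "dog" on the image drags a brittle encoder's embedding toward
# the text's direction (hypothetical values for illustration).
attacked_brittle = np.array([0.4, 0.8])  # prediction flips to "dog"
attacked_robust = np.array([0.8, 0.4])   # robust encoder: still "cat"

print(classify(clean_cat, class_embs),
      classify(attacked_brittle, class_embs),
      classify(attacked_robust, class_embs))
```

Adversarial robustness, in these terms, means keeping the image embedding anchored to the visual content even when in-image text pulls toward another class.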
Video Processing
Tencent Hunyuan Video Integration
Serves as the visual encoder for video understanding modules
Best used with dedicated ComfyUI nodes for optimal performance