DFN Public
Developed by Apple
This is a CLIP-based ViT-B/32 model trained using a Data Filtering Network (DFN) on datasets including CC12M, CC3M, and Shutterstock 15M, suitable for zero-shot image classification tasks.
Downloads 3,822
Release Time: 7/8/2024
Model Overview
This model is a vision-language Transformer based on Contrastive Language-Image Pre-training (CLIP). Its training data is filtered automatically by a Data Filtering Network, and the model can perform zero-shot image classification and image-text matching.
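The filtering idea can be illustrated with a minimal sketch: score each image-text pair by the similarity of its two embeddings and keep only the best-scoring part of the pool. The ranking rule, the `keep_fraction` parameter, the function names, and the toy embeddings below are all assumptions made for illustration; this page does not state the exact selection criterion used for this model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def filter_pairs(pairs, keep_fraction):
    """Keep the top `keep_fraction` of pairs ranked by image-text similarity.

    `pairs` is a list of (pair_id, image_embedding, text_embedding) tuples.
    Hypothetical helper: in practice the embeddings would come from the
    filtering network's image and text encoders.
    """
    scored = [(cosine_similarity(img, txt), pid) for pid, img, txt in pairs]
    scored.sort(reverse=True)  # highest similarity first
    n_keep = max(1, int(len(scored) * keep_fraction))
    return [pid for _, pid in scored[:n_keep]]

# Toy 2-D embeddings, made up for illustration only.
pool = [
    ("good_caption", [1.0, 0.0], [0.9, 0.1]),
    ("weak_caption", [1.0, 0.0], [0.5, 0.5]),
    ("mismatched", [1.0, 0.0], [0.0, 1.0]),
    ("ok_caption", [0.0, 1.0], [0.2, 0.8]),
]
kept = filter_pairs(pool, keep_fraction=0.5)
print(kept)
```

Ranking by score and keeping a fixed fraction avoids choosing an absolute similarity threshold, which would depend on the scale of the particular encoder.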
Model Features
Data Filtering Network Training
Uses a small Data Filtering Network (DFN) to automatically filter large-scale uncurated datasets, improving training data quality
Multi-dataset Joint Training
Trained jointly on three datasets: Conceptual 12M (CC12M), Conceptual Captions 3M (CC3M), and Shutterstock 15M
Zero-shot Classification Capability
Can be directly applied to new image classification tasks without task-specific fine-tuning
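Zero-shot classification with a CLIP-style model usually works by embedding the image and one text prompt per candidate label, then applying a softmax over the scaled cosine similarities. The sketch below shows only that scoring step in plain Python. The embeddings are made-up toy values standing in for encoder outputs, and `logit_scale=100.0` is an assumed temperature, not a value taken from this model.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_embedding, label_embeddings, logit_scale=100.0):
    """Return a {label: probability} dict from image and label-prompt embeddings.

    `logit_scale` is an assumed temperature; a trained model supplies its own.
    """
    img = normalize(image_embedding)
    labels = list(label_embeddings)
    logits = []
    for label in labels:
        txt = normalize(label_embeddings[label])
        logits.append(logit_scale * sum(a * b for a, b in zip(img, txt)))
    return dict(zip(labels, softmax(logits)))

# Hypothetical embeddings; real ones would come from the text and image encoders.
label_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.2, 0.1]

probs = zero_shot_classify(image_embedding, label_embeddings)
best_label = max(probs, key=probs.get)
print(best_label)
```

Because the labels enter only as text prompts, changing the label set requires no retraining, only new prompt embeddings.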
Model Capabilities
Zero-shot image classification
Image-text matching
Cross-modal retrieval
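Cross-modal retrieval can be sketched the same way: embed a text query, compare it against precomputed image embeddings, and return the closest matches. The file names, the `retrieve` helper, and the toy vectors below are hypothetical and only illustrate the ranking step.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_embedding, gallery, k=2):
    """Return the k gallery keys most similar to the query embedding."""
    ranked = sorted(
        gallery,
        key=lambda name: cosine(query_embedding, gallery[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical image embeddings keyed by file name.
gallery = {
    "beach.jpg": [0.9, 0.1, 0.1],
    "forest.jpg": [0.1, 0.9, 0.2],
    "city.jpg": [0.2, 0.1, 0.9],
}
# Hypothetical embedding of a text query such as "a sunny beach".
text_query = [0.8, 0.3, 0.1]

top = retrieve(text_query, gallery, k=2)
print(top)
```

Swapping the roles (an image embedding as the query, text embeddings as the gallery) gives image-to-text retrieval with the same code.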
Use Cases
Content Management
Automatic Image Tagging
Automatically generates descriptive labels for unlabeled images
E-commerce
Product Image Classification
Automatically classifies product images based on descriptions