Vit B 16 Aion400m E32 1finetuned 1
Developed by Albe-njupt
A Vision Transformer model built on the OpenCLIP framework and fine-tuned for zero-shot image classification
Downloads 18
Release Time: 3/4/2024
Model Overview
This is a vision-language model based on the Vision Transformer (ViT) architecture. It was trained and fine-tuned on the LAION-400M dataset and is intended for zero-shot image classification.
Model Features
Zero-shot learning capability
Classifies images into categories it was not explicitly trained on, using text descriptions of the candidate labels
Large-scale pre-training
Pre-trained and fine-tuned on the LAION-400M dataset of roughly 400 million image-text pairs
Vision-language alignment
Embeds images and text in a shared feature space through contrastive learning
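The contrastive objective behind vision-language alignment can be illustrated with a small sketch. This is not the author's training code: the function names are hypothetical, the embeddings are toy vectors, and the `logit_scale` of 100 is an assumed temperature. It only shows the general idea of a symmetric image-to-text and text-to-image cross-entropy over a batch of matched pairs.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def clip_contrastive_loss(image_embs, text_embs, logit_scale=100.0):
    """Symmetric contrastive loss; image_embs[i] and text_embs[i] form a matched pair.

    logit_scale is an assumed value for illustration only.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [[logit_scale * sum(a * b for a, b in zip(i, t)) for t in txts]
              for i in imgs]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

# Toy batch: aligned pairs give a low loss, swapped pairs a high one.
matched = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing this loss pulls each image toward its own caption and pushes it away from the other captions in the batch, which is what makes the later zero-shot comparison between image and label embeddings meaningful.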
Model Capabilities
Zero-shot image classification
Image-text matching
Cross-modal retrieval
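As a rough illustration of how zero-shot classification works once image and text embeddings are available, the sketch below scores one image embedding against the embeddings of several label prompts. The embeddings here are made-up three-dimensional vectors, and the helper names and `logit_scale` value are assumptions; in practice the vectors would come from the model's image and text encoders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_emb, label_embs, logit_scale=100.0):
    """Return a probability per label prompt (logit_scale is an assumed temperature)."""
    labels = list(label_embs)
    sims = [logit_scale * cosine(image_emb, label_embs[label]) for label in labels]
    return dict(zip(labels, softmax(sims)))

# Toy embeddings standing in for encoder outputs (hypothetical values).
label_embs = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_emb = [0.8, 0.2, 0.1]

probs = zero_shot_classify(image_emb, label_embs)
best = max(probs, key=probs.get)
```

Because the labels are ordinary text prompts, adding a new category only requires embedding one more prompt; no retraining is involved. Image-text matching and cross-modal retrieval reuse the same similarity score, ranking candidates instead of applying a softmax.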
Use Cases
Content classification
Automatic social media content tagging
Automatically adds relevant tags to uploaded images
Makes content classification more efficient and reduces manual labeling costs
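Tagging differs from single-label classification in that an image may deserve several tags or none. One possible approach, sketched below under assumptions, is to keep every tag whose cosine similarity to the image embedding clears a threshold. The `suggest_tags` function, the toy embeddings, and the `min_similarity` value of 0.25 are all hypothetical; a real threshold would need tuning on actual model outputs.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def suggest_tags(image_emb, tag_embs, min_similarity=0.25, max_tags=3):
    """Return up to max_tags tags, best first, whose similarity clears the threshold.

    min_similarity is an illustrative value, not a recommended setting.
    """
    scored = [(cosine(image_emb, emb), tag) for tag, emb in tag_embs.items()]
    kept = sorted((pair for pair in scored if pair[0] >= min_similarity), reverse=True)
    return [tag for _, tag in kept[:max_tags]]

# Toy embeddings standing in for encoder outputs (hypothetical values).
tag_embs = {
    "beach": [1.0, 0.0, 0.0],
    "sunset": [0.6, 0.8, 0.0],
    "food": [0.0, 0.0, 1.0],
}
image_emb = [0.9, 0.4, 0.05]

tags = suggest_tags(image_emb, tag_embs)
```

With these toy vectors, "beach" and "sunset" clear the threshold and "food" does not, so the image receives two tags rather than being forced into a single category.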
E-commerce
Automatic product image categorization
Automatically classifies product images into their corresponding categories
Speeds up product listing and improves the search experience