vit_xsmall_patch16_clip_224.tinyclip_yfcc15m
Developed by timm
A compact vision-language model based on the CLIP architecture, designed for efficient zero-shot image classification.
Downloads: 444
Release date: 3/20/2024
Model Overview
This model is a lightweight TinyCLIP variant of the CLIP architecture, trained on the YFCC15M dataset and suited to zero-shot image classification tasks.
Model Features
Lightweight design
Uses an XSmall-scale ViT backbone, requiring fewer computational resources than standard CLIP models
Zero-shot learning
Performs image classification without task-specific training, matching images against natural-language class descriptions
Multimodal understanding
Jointly encodes visual and textual information, enabling cross-modal matching
Model Capabilities
Zero-shot image classification
Image-text matching
Cross-modal retrieval
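The capabilities above all rest on the same CLIP-style mechanism: the image and each candidate text prompt are embedded into a shared space, and class probabilities come from a softmax over scaled cosine similarities. The sketch below illustrates that scoring step with synthetic NumPy embeddings standing in for the model's real outputs (the 256-dim size and the temperature value are illustrative assumptions, not properties documented for this checkpoint).

```python
import numpy as np

def zero_shot_scores(image_emb, text_embs, temperature=100.0):
    # L2-normalize embeddings, as CLIP does before comparing them
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (txt @ img)   # scaled cosine similarities
    e = np.exp(logits - logits.max())    # numerically stable softmax
    return e / e.sum()

# Synthetic embeddings standing in for encoder outputs (illustrative only)
rng = np.random.default_rng(0)
image_emb = rng.normal(size=256)         # one image embedding
text_embs = rng.normal(size=(3, 256))    # one embedding per class prompt
probs = zero_shot_scores(image_emb, text_embs)
```

`probs` is a distribution over the three candidate prompts; the highest-probability prompt is the predicted label. The same scores can rank images against a query text for cross-modal retrieval.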
Use Cases
Content management
Automatic image tagging
Automatically generates descriptive tags for unlabeled images
Improves image library management efficiency
E-commerce
Product categorization
Classifies product images against natural-language category descriptions
Supports new product categories without additional training
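Adding a new product category in this setup only requires writing prompts for it, not retraining. A minimal sketch of that workflow, assuming hypothetical category names and prompt templates (common CLIP practice is to embed several template variants per category and average them):

```python
# Hypothetical category names and prompt templates for zero-shot tagging
categories = ["running shoes", "wristwatch", "backpack"]
templates = [
    "a photo of a {}",
    "a product photo of a {}, listed in an online store",
]

def build_prompts(categories, templates):
    # One prompt list per category; in practice each list would be
    # text-encoded and the embeddings averaged into one class vector
    return {c: [t.format(c) for t in templates] for c in categories}

prompts = build_prompts(categories, templates)
```

Supporting a new category is then a one-line change to `categories`, after which its averaged prompt embedding can be scored against product images like any existing class.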