Eva02 Large Patch14 Clip 224.merged2b S4b B131k
Developed by timm
EVA02 is a large-scale vision-language model based on the CLIP architecture, supporting zero-shot image classification tasks.
Downloads 5,696
Release Time: 4/10/2023
Model Overview
This CLIP-style vision-language model targets zero-shot image classification. Its image encoder and text encoder are trained jointly so that matching images and captions land close together in a shared embedding space, which lets the model score an image against arbitrary text labels.
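To make the zero-shot idea concrete, here is a minimal sketch of the scoring step that CLIP-style models use: compare one image embedding with one text embedding per candidate label, then turn the similarities into probabilities. The three-dimensional embeddings, the prompt strings, the helper names, and the logit_scale value of 100 are all illustrative assumptions; a real model would produce the embeddings with its own encoders.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_embedding, label_embeddings, logit_scale=100.0):
    """Return a probability per label from embedding similarities.

    logit_scale is an assumed temperature, not a value taken from this model.
    """
    labels = list(label_embeddings)
    logits = [
        logit_scale * cosine_similarity(image_embedding, label_embeddings[label])
        for label in labels
    ]
    return dict(zip(labels, softmax(logits)))

# Toy embeddings standing in for real encoder outputs (hypothetical values).
image_embedding = [0.9, 0.1, 0.0]
label_embeddings = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
    "a photo of a car": [0.0, 0.0, 1.0],
}

probs = zero_shot_classify(image_embedding, label_embeddings)
best_label = max(probs, key=probs.get)
print(best_label, round(probs[best_label], 4))
```

Because the labels are supplied as text at inference time, swapping in a different label set requires no retraining, only new text embeddings.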
Model Features
Zero-shot learning capability
Can perform image classification tasks without task-specific training
Cross-modal understanding
Capable of processing and understanding both visual and textual information
Large-scale pretraining
Pretrained on large-scale datasets with strong generalization capabilities
Model Capabilities
Zero-shot image classification
Image-text matching
Cross-modal retrieval
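Cross-modal retrieval and image-text matching reduce to the same operation: rank items from one modality by their similarity to a query from the other. The sketch below ranks a small gallery of image embeddings against a text query embedding. The file names, vectors, and the retrieve helper are hypothetical placeholders, not outputs or APIs of this model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_embedding, gallery, top_k=2):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in gallery.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical image embeddings, as if produced by an image encoder.
gallery = {
    "beach.jpg": [0.8, 0.2, 0.1],
    "city.jpg": [0.1, 0.9, 0.2],
    "forest.jpg": [0.2, 0.1, 0.9],
}

# Hypothetical text embedding for a query such as "sunny coastline".
text_query_embedding = [0.7, 0.3, 0.0]

results = retrieve(text_query_embedding, gallery, top_k=2)
for name, score in results:
    print(name, round(score, 3))
```

Swapping the roles (an image embedding as the query, text embeddings as the gallery) gives image-to-text retrieval with the same code.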
Use Cases
Computer vision
Image classification
Classify images without category-specific training
Performs well on various benchmarks
Content moderation
Identify inappropriate content in images
E-commerce
Product categorization
Automatically categorize product images on e-commerce platforms
© 2025 AIbase