Eva02 Large Patch14 Clip 336.merged2b S6b B61k
Developed by timm
EVA02 is a large-scale vision-language model based on the CLIP architecture that supports zero-shot image classification.
Downloads: 15.78k
Release Time: 4/10/2023
Model Overview
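This model follows the CLIP architecture: an image encoder and a text encoder map images and text into a shared embedding space, so an image can be compared directly with a text description. This makes it suitable for cross-modal tasks such as zero-shot image classification.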
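The shared embedding space can be illustrated with a minimal Python sketch. The vectors below are hypothetical toy values with only four dimensions; in practice the embeddings would come from the model's image and text encoders (for example via OpenCLIP's `encode_image` and `encode_text`). The sketch only shows how a matching image and caption end up with a higher cosine similarity than a non-matching pair.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm, as CLIP-style models do before comparing embeddings."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Dot product of the two vectors after unit-normalizing them."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy embeddings; real ones would be produced by the model's encoders.
image_embedding = [0.9, 0.1, 0.0, 0.2]  # stand-in for a photo of a dog
caption_dog = [0.8, 0.2, 0.1, 0.1]      # stand-in for the text "a dog"
caption_car = [0.0, 0.1, 0.9, 0.3]      # stand-in for the text "a car"

print(round(cosine_similarity(image_embedding, caption_dog), 3))
print(round(cosine_similarity(image_embedding, caption_car), 3))
```

Normalizing first means the score depends only on direction, so image and text vectors of different magnitudes remain comparable.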
Model Features
Zero-shot Learning
Classifies images into arbitrary categories without task-specific training.
Cross-modal Understanding
Encodes both images and text into the same embedding space, so the association between an image and a text description can be measured directly.
Large-scale Pre-training
Pre-trained on large-scale image-text data, which gives it strong generalization to categories it was not explicitly trained on.
Model Capabilities
Zero-shot Image Classification
Cross-modal Retrieval
Image-Text Matching
Use Cases
Image Classification
Zero-shot Image Classification
Classify images into new categories without category-specific training.
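Zero-shot classification with a CLIP-style model usually works by embedding one text prompt per candidate label, comparing each prompt embedding with the image embedding, and turning the scaled similarities into probabilities with a softmax. The sketch below assumes hypothetical toy embeddings and a logit scale of 100.0 (the value commonly used in CLIP zero-shot examples); the prompt template "a photo of a {label}" is also an illustrative choice, not something specified for this model.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_emb, label_embs, logit_scale=100.0):
    """Return one probability per candidate label.

    label_embs: one text embedding per label (e.g. from prompts like "a photo of a {label}").
    logit_scale: assumed temperature multiplier applied to the cosine similarities.
    """
    img = normalize(image_emb)
    logits = [logit_scale * sum(a * b for a, b in zip(img, normalize(t))) for t in label_embs]
    return softmax(logits)


labels = ["dog", "cat", "car"]
prompts = [f"a photo of a {label}" for label in labels]  # text that would go to the text encoder
# Hypothetical toy embeddings standing in for the encoded prompts and an encoded image.
text_embeddings = [[0.8, 0.2, 0.1], [0.6, 0.7, 0.0], [0.0, 0.1, 0.9]]
image_embedding = [0.9, 0.15, 0.05]

probs = zero_shot_classify(image_embedding, text_embeddings)
best = labels[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

Adding a new category only requires writing a new prompt and computing its text embedding; the model itself is not retrained.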
Cross-modal Retrieval
Image-Text Retrieval
Retrieve relevant images for a text description, or find the text descriptions that best match a given image.
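Text-to-image retrieval can be sketched as ranking precomputed image embeddings by cosine similarity to a query embedding. The file names, vectors, and the `top_k` helper below are hypothetical; a real index would hold embeddings produced by the model's image encoder, and the query would be the text encoder's output for the search phrase.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k(query_emb, gallery, k=2):
    """Rank gallery items by cosine similarity to the query; return the k best (name, score) pairs."""
    q = normalize(query_emb)
    scored = [(name, sum(a * b for a, b in zip(q, normalize(emb)))) for name, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical precomputed image embeddings (toy 3-d vectors keyed by file name).
image_index = {
    "beach.jpg": [0.1, 0.9, 0.2],
    "dog_park.jpg": [0.9, 0.2, 0.1],
    "city_night.jpg": [0.1, 0.1, 0.95],
}
# Toy vector standing in for the embedding of the text query "a dog playing outside".
query = [0.85, 0.3, 0.05]

for name, score in top_k(query, image_index):
    print(f"{name}: {score:.3f}")
```

Swapping the roles (an image embedding as the query and a gallery of caption embeddings) gives image-to-text retrieval with the same function.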
© 2025 AIbase