
Eva02 Large Patch14 Clip 224.merged2b S4b B131k

Developed by timm
EVA02 is a large-scale vision-language model based on the CLIP architecture, supporting zero-shot image classification tasks.
Downloads 5,696
Release Time: 4/10/2023

Model Overview

This model pairs an image encoder with a text encoder in the CLIP architecture and targets zero-shot image classification. The two encoders are trained jointly so that matching images and text map to nearby points in a shared embedding space, which is what gives the model its cross-modal understanding.
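The zero-shot classification idea described above can be sketched in a few lines. This is a minimal illustration, not the model's actual inference code: the embeddings are hypothetical toy vectors rather than outputs of the real encoders, and the function names and the `logit_scale` value of 100.0 are assumptions chosen for the example.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_embedding, label_embeddings, logit_scale=100.0):
    """Score one image embedding against text embeddings of candidate labels.

    label_embeddings maps a text prompt to its embedding. logit_scale is an
    assumed temperature; a real checkpoint carries its own learned value.
    """
    image = l2_normalize(image_embedding)
    labels = list(label_embeddings)
    logits = []
    for label in labels:
        text = l2_normalize(label_embeddings[label])
        cosine = sum(a * b for a, b in zip(image, text))
        logits.append(logit_scale * cosine)
    return dict(zip(labels, softmax(logits)))

# Hypothetical toy embeddings standing in for encoder outputs.
toy_image = [0.9, 0.1, 0.2]
toy_labels = {
    "a photo of a cat": [1.0, 0.0, 0.1],
    "a photo of a dog": [0.0, 1.0, 0.1],
    "a photo of a car": [0.1, 0.1, 1.0],
}

probabilities = zero_shot_classify(toy_image, toy_labels)
best_label = max(probabilities, key=probabilities.get)
print(best_label)
```

Because the candidate labels are ordinary text prompts, new categories can be added at inference time by embedding new prompts, with no retraining of either encoder.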

Model Features

Zero-shot learning capability: Can perform image classification tasks without task-specific training
Cross-modal understanding: Capable of processing and understanding both visual and textual information
Large-scale pretraining: Pretrained on large-scale datasets with strong generalization capabilities

Model Capabilities

Zero-shot image classification
Image-text matching
Cross-modal retrieval
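Image-text matching and cross-modal retrieval both reduce to ranking items by similarity in the shared embedding space. The sketch below ranks a small gallery of image embeddings against one text query embedding. All vectors and file names are hypothetical toy values, and `retrieve` is an illustrative helper, not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_embedding, gallery, top_k=2):
    """Return the top_k (name, score) pairs, highest similarity first."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in gallery.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical image embeddings; a real system would compute these
# once with the image encoder and store them in an index.
toy_gallery = {
    "beach.jpg": [0.8, 0.1, 0.1],
    "forest.jpg": [0.1, 0.9, 0.2],
    "city.jpg": [0.2, 0.1, 0.9],
}

# Hypothetical text embedding for a query such as "a photo of trees".
toy_query = [0.1, 1.0, 0.1]

results = retrieve(toy_query, toy_gallery, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

The same function works in the other direction (an image query against text embeddings), since both modalities live in one space.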

Use Cases

Computer vision
  Image classification: Classify images without category-specific training. Performs well on various benchmarks.
  Content moderation: Identify inappropriate content in images.
E-commerce
  Product categorization: Automatically categorize product images on e-commerce platforms.