Eva02 Large Patch14 Clip 336.merged2b S6b B61k

Developed by timm
EVA02 is a large-scale vision-language model based on the CLIP architecture, supporting zero-shot image classification tasks.
Downloads 15.78k
Release Time: 4/10/2023

Model Overview

This model follows the CLIP design: an image encoder and a text encoder map their inputs into a shared embedding space, so images and text can be compared directly. That makes it suitable for cross-modal tasks such as zero-shot image classification.

Model Features

Zero-shot Learning
Supports image classification tasks without the need for task-specific training.
Cross-modal Understanding
Capable of processing both visual and language information to establish associations between images and text.
Large-scale Pre-training
Pre-trained on large-scale image-text datasets, which helps it generalize to categories not seen during training.
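
As a rough illustration of how CLIP-style zero-shot classification scores an image, the sketch below compares one image embedding against one text embedding per candidate label and applies a softmax. It does not load the model. The embeddings are tiny made-up vectors standing in for encoder outputs, the helper names are hypothetical, and the `logit_scale` of 100 is an assumed temperature in the style of CLIP, not a value taken from this model card.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Return one probability per label from scaled cosine similarities.

    logit_scale is an assumed temperature; a real model supplies its own.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        logits.append(logit_scale * sum(a * b for a, b in zip(img, txt)))
    # Numerically stable softmax over the label logits.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical prompts and toy 3-dimensional embeddings (not real model output).
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
image_emb = [0.8, 0.2, 0.1]

probs = zero_shot_probs(image_emb, text_embs)
best = labels[probs.index(max(probs))]
print(best)
```

With a real checkpoint, the toy vectors would be replaced by the outputs of the model's image and text encoders. No task-specific training is involved, because the label set is defined only by the text prompts.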

Model Capabilities

Zero-shot Image Classification
Cross-modal Retrieval
Image-Text Matching

Use Cases

Image Classification
Zero-shot Image Classification
Classify images into new categories, described only by text labels, without task-specific training.
Cross-modal Retrieval
Image-Text Retrieval
Retrieve relevant images from a text description, or retrieve the best-matching text descriptions for a given image.
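
Text-to-image retrieval with a CLIP-style model usually reduces to ranking precomputed image embeddings by their similarity to a query text embedding. The sketch below shows only that ranking step, under the assumption that embeddings have already been computed. The file names, vectors, and the `retrieve` helper are hypothetical placeholders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_emb, gallery, top_k=2):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine(query_emb, emb)) for name, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy image embeddings keyed by hypothetical file names.
gallery = {
    "beach.jpg": [0.9, 0.1, 0.1],
    "forest.jpg": [0.1, 0.9, 0.2],
    "city.jpg": [0.2, 0.1, 0.9],
}
# Toy embedding standing in for an encoded text query.
query = [0.1, 0.8, 0.3]

results = retrieve(query, gallery)
for name, score in results:
    print(name, round(score, 3))
```

Image-to-text retrieval works the same way with the roles swapped: the query is an image embedding and the gallery holds text embeddings.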