
eva02_large_patch14_224.mim_m38m

Developed by timm
An EVA-02 feature/representation model, pretrained on the Merged-38M dataset via masked image modeling, suitable for image classification and feature extraction.
Downloads: 571
Released: 3/31/2023

Model Overview

The EVA-02 model uses a Vision Transformer architecture that incorporates mean pooling, the SwiGLU activation function, Rotary Position Embeddings (RoPE), and an additional layer normalization in the MLP. It is primarily used for image classification and feature extraction.

Model Features

Large-scale Pretraining
Pretrained on the Merged-38M dataset (including IN-22K, CC12M, CC3M, COCO, etc.) via masked image modeling.
Efficient Architecture
Uses a Vision Transformer architecture with mean pooling, the SwiGLU activation function, Rotary Position Embeddings (RoPE), and an additional layer normalization in the MLP.
High Performance
Strong performance on ImageNet-1k, achieving 89.57% top-1 accuracy.
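To make the top-1/top-5 metrics quoted on this card concrete, here is a small, framework-free sketch of how top-k accuracy is computed: a prediction counts as correct if the true label is among the k highest-scoring classes. The `topk_accuracy` helper and the toy logits below are illustrative, not part of the model's evaluation code.

```python
def topk_accuracy(logits, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    correct = 0
    for scores, label in zip(logits, labels):
        # Indices of the k largest scores, in descending order.
        topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        correct += label in topk
    return correct / len(labels)

# Toy example: 3 samples, 3 classes.
logits = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]

print(topk_accuracy(logits, labels, k=1))  # 1 of 3 correct at top-1
print(topk_accuracy(logits, labels, k=2))  # 2 of 3 correct at top-2
```

Top-5 accuracy is always at least top-1, which is why the card's 98.918% top-5 figure sits above its 89.57% top-1.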

Model Capabilities

Image Classification
Image Feature Extraction

Use Cases

Computer Vision
Image Classification
Classifies images, supporting recognition across many categories.
Top-1 accuracy: 89.57%; top-5 accuracy: 98.918%.
Feature Extraction
Extracts deep features from images, applicable to downstream tasks such as object detection and image segmentation.