
Eva02 Base Patch14 224.mim In22k

Developed by timm
EVA02 base-size visual representation model, pre-trained on ImageNet-22k with masked image modeling (MIM), suitable for image classification and feature extraction.
Downloads 2,834
Release Time: 3/31/2023

Model Overview

This model uses a modified Vision Transformer (ViT) architecture that adds mean pooling, the SwiGLU activation function, and rotary position embeddings (RoPE). It is designed for efficient image feature extraction.
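Mean pooling means the image-level representation is the average of the output tokens rather than the class token alone. The sketch below is minimal and written in plain Python. It assumes tokens arrive as equal-length vectors with the class token first, and that only the patch tokens are averaged. That is a common convention for ViT models with a class token, but this card does not state it. `mean_pool` is a hypothetical helper name.

```python
from typing import List

Vector = List[float]


def mean_pool(tokens: List[Vector], num_prefix_tokens: int = 1) -> Vector:
    """Average the patch tokens, skipping the leading prefix (class) tokens.

    Assumption: prefix tokens come first and are excluded from the average.
    """
    patches = tokens[num_prefix_tokens:]
    if not patches:
        raise ValueError("no patch tokens to pool")
    dim = len(patches[0])
    # Sum each feature dimension across patch tokens, then divide by the count.
    sums = [0.0] * dim
    for tok in patches:
        for i, value in enumerate(tok):
            sums[i] += value
    return [s / len(patches) for s in sums]


# Toy example: one class token followed by three 2-dim patch tokens.
tokens = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(mean_pool(tokens))  # [3.0, 4.0]
```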

Model Features

Improved Transformer Architecture
Uses rotary position embeddings (RoPE) to encode patch positions and the SwiGLU activation function in the MLP blocks. Together these improve positional awareness and the expressiveness of the nonlinear layers.
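Both techniques can be sketched in plain Python. This is an illustrative sketch, not timm's implementation. It applies 1-D RoPE to a single vector, whereas image models typically apply rotary embeddings over the 2-D patch grid. It also assumes the common base of 10000 and the standard SwiGLU formulation: SiLU(x·W_gate) multiplied elementwise by (x·W_up), then projected down. All helper names (`rope`, `silu`, `swiglu_mlp`) are hypothetical. The first check shows why RoPE is useful: the dot product of a rotated query and key depends only on their relative offset.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row per output feature


def rope(x: Vector, pos: int, base: float = 10000.0) -> Vector:
    """1-D RoPE: rotate each pair (x[i], x[i+1]), i even, by pos * base**(-i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even feature dimension")
    out: Vector = []
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)  # the first pair rotates fastest
        c, s = math.cos(angle), math.sin(angle)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def silu(z: float) -> float:
    """Swish/SiLU: z * sigmoid(z), written to avoid overflow for large |z|."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def matvec(w: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in w]


def swiglu_mlp(x: Vector, w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> Vector:
    """SwiGLU MLP (assumed standard form): W_down(SiLU(W_gate x) * (W_up x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]
# Same offset (2) at different absolute positions gives the same attention score.
print(math.isclose(dot(rope(q, 3), rope(k, 1)), dot(rope(q, 10), rope(k, 8)), abs_tol=1e-9))  # True
print(swiglu_mlp([1.0, 0.0], w_gate=[[1.0, 0.0]], w_up=[[2.0, 0.0]], w_down=[[1.0]]))
```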
Efficient Pre-training Strategy
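Pre-trained with masked image modeling (MIM) using EVA-CLIP as the teacher. The model learns to predict EVA-CLIP's features for masked image patches, which is a form of knowledge distillation.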
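A teacher-guided MIM objective can be pictured as a loss computed only at masked patch positions. The sketch below assumes a cosine-similarity loss between the student's predicted features and the teacher's (here, EVA-CLIP's) features. The exact loss, masking ratio, and feature normalization used for EVA02 are not given in this card. `masked_feature_loss` is a hypothetical name, and the toy 2-dimensional features are placeholders.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def masked_feature_loss(student: List[Vector], teacher: List[Vector], mask: List[bool]) -> float:
    """Average (1 - cosine) between student predictions and teacher features,
    counted only at masked patch positions (assumed objective)."""
    terms = [1.0 - cosine(s, t) for s, t, m in zip(student, teacher, mask) if m]
    if not terms:
        raise ValueError("mask selects no positions")
    return sum(terms) / len(terms)


# Toy example: 3 patches with 2-dim features; only the first two are masked.
student = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
teacher = [[1.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
mask = [True, True, False]
print(masked_feature_loss(student, teacher, mask))  # 0.5
```

The unmasked third patch does not affect the loss, even though its student and teacher features point in the same direction.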
Unpooled Feature Output
The forward_features method returns unpooled token features from the final layer: a 257×768 tensor per image (one class token plus 256 patch tokens, each 768-dimensional)
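The 257×768 shape follows from patch arithmetic. A 224×224 image cut into 14×14 patches gives a 16×16 grid of 256 patch tokens. Adding one class token gives 257 tokens, each with 768 dimensions. The hypothetical helper below makes the calculation explicit. It assumes square images and patches and a single prefix (class) token, as the 257 figure implies.

```python
from typing import Tuple


def vit_token_shape(batch: int, img_size: int = 224, patch_size: int = 14,
                    embed_dim: int = 768, num_prefix_tokens: int = 1) -> Tuple[int, int, int]:
    """Shape of the unpooled token tensor for a square image split into square patches."""
    if img_size % patch_size:
        raise ValueError("image size must be a multiple of the patch size")
    grid = img_size // patch_size                 # 224 // 14 = 16 patches per side
    num_tokens = grid * grid + num_prefix_tokens  # 16 * 16 + 1 = 257 tokens
    return (batch, num_tokens, embed_dim)


print(vit_token_shape(1))  # (1, 257, 768)
```

With a batch of images, the leading dimension is the batch size. Pooling the token dimension yields the 768-dimensional feature vector used for downstream tasks.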

Model Capabilities

Image Feature Extraction
Image Classification
Visual Representation Learning

Use Cases

Computer Vision
Image Classification System
Serves as a backbone for accurate image classifiers, taking 224×224 input images
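After fine-tuning on ImageNet-1k, this backbone is reported to reach 88.23% top-1 accuracy (the MIM pre-trained checkpoint on its own is not a classifier)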
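To turn the backbone into a classifier, a classification head is added on top of the pooled 768-dimensional features. The sketch below assumes a single linear layer followed by softmax, which is a common choice but is not confirmed by this card. In practice the head's weights are learned during fine-tuning. The toy 3-dimensional features and two classes stand in for 768 dimensions and the 1,000 ImageNet-1k classes. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def linear_head(features: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """One logit per class: dot(weights[c], features) + bias[c]."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities; subtracting the max keeps exp() stable."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs: List[float], k: int) -> List[Tuple[int, float]]:
    """(class index, probability) pairs for the k most likely classes."""
    return sorted(enumerate(probs), key=lambda p: p[1], reverse=True)[:k]


# Toy example: 3-dim "pooled features" and 2 classes.
features = [1.0, 0.0, 2.0]
weights = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
bias = [0.0, 0.0]
probs = softmax(linear_head(features, weights, bias))
print(top_k(probs, 1)[0][0])  # 1
```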
Feature Extraction Service
Serves as a visual feature extractor for downstream tasks (e.g., object detection, image retrieval)
Outputs 768-dimensional feature vectors
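For image retrieval, the 768-dimensional vectors are typically compared with cosine similarity, and gallery images are ranked by score. The sketch below assumes features have already been extracted. The short 3-dimensional vectors and file names are made-up placeholders for real embeddings, and `rank_by_cosine` is a hypothetical helper. A production service would usually use a vector index rather than this linear scan.

```python
import math
from typing import Dict, List, Tuple


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def rank_by_cosine(query: List[float], gallery: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k gallery items most similar to the query, best first."""
    q = normalize(query)
    scores = [(name, sum(a * b for a, b in zip(q, normalize(feat)))) for name, feat in gallery.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


gallery = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.7, 0.7, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}
print([name for name, _ in rank_by_cosine([1.0, 0.0, 0.0], gallery, k=2)])  # ['cat.jpg', 'dog.jpg']
```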