EVA02 Open-source Image Feature Model - Free for Image Classification and Feature Extraction Tasks

Eva02 Small Patch14 224.mim In22k

Developed by timm

EVA02 feature/representation model, pretrained on ImageNet-22k via masked image modeling, suitable for image classification and feature extraction tasks.

Open Source License: MIT #Image Feature Extraction #Masked Image Modeling #Lightweight Vision Transformer

Release Time: 3/31/2023

Model Overview

The EVA-02 model is a Vision Transformer that uses mean pooling, SwiGLU, and Rotary Position Embedding (RoPE). It is suitable for image classification and feature extraction tasks.

Model Features

Masked Image Modeling Pretraining

Pretrained on ImageNet-22k using EVA-CLIP as the MIM teacher, enhancing the model's representation capability.

Advanced Transformer Architecture

Combines mean pooling, SwiGLU feed-forward layers, and Rotary Position Embedding (RoPE). The Base and Large variants also add an extra LayerNorm in the MLP.
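
As a rough illustration of the SwiGLU feed-forward idea mentioned above, here is a minimal pure-Python sketch. It is not timm's implementation (which is built from PyTorch modules); the weight names `w_gate`, `w_value`, and `w_out` and the toy sizes are hypothetical, and biases and normalization are left out.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(weights, x):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def swiglu_mlp(x, w_gate, w_value, w_out):
    """SwiGLU feed-forward: w_out @ (silu(w_gate @ x) * (w_value @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]   # gating branch
    value = matvec(w_value, x)                    # linear branch
    hidden = [g * v for g, v in zip(gate, value)] # elementwise product
    return matvec(w_out, hidden)                  # project back to model width


# Toy example: model width 2, hidden width 3 (hypothetical sizes).
x = [1.0, -2.0]
w_gate = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]
w_value = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
w_out = [[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]]
y = swiglu_mlp(x, w_gate, w_value, w_out)
print(y)
```

The gating branch lets the layer scale each hidden unit by a smooth, input-dependent factor instead of applying a fixed activation to a single projection.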

Efficient Computation

With 21.6 million parameters and 6.1 GMACs, it is suitable for deployment in resource-constrained environments.

Model Capabilities

Image Feature Extraction

Image Classification

Use Cases

Computer Vision

Image Classification

Classifies images into multiple categories.

Pretrained on ImageNet-22k with masked image modeling. The comparison table lists a top-1 score of 85.74 for the fine-tuned eva02_small_patch14_336.mim_in22k_ft_in1k variant.

Feature Extraction

Extracts image feature representations for downstream tasks such as object detection and image retrieval.

With num_classes=0, the model returns one pooled feature vector per image, a (batch_size, num_features) shaped tensor.
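
For the image retrieval use case, one common approach is to compare pooled feature vectors with cosine similarity. The sketch below assumes that setup; the 3-dimensional vectors and the gallery names are made-up stand-ins for real model outputs, which would be much longer.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_gallery(query, gallery):
    """Return gallery keys sorted from most to least similar to the query."""
    return sorted(
        gallery,
        key=lambda name: cosine_similarity(query, gallery[name]),
        reverse=True,
    )


# Hypothetical embeddings; in practice these would come from the model
# created with num_classes=0.
gallery = {
    "beignets": [0.9, 0.1, 0.0],
    "donut": [0.7, 0.3, 0.1],
    "car": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
ranking = rank_gallery(query, gallery)
print(ranking)
```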

🚀 eva02_small_patch14_224.mim_in22k

An EVA02 feature / representation model, pretrained on ImageNet-22k with masked image modeling (using EVA-CLIP as a MIM teacher) by the paper authors. EVA-02 models are vision transformers with mean pooling, SwiGLU, Rotary Position Embeddings (RoPE), and an extra LN in the MLP (for Base & Large).
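
To make the rotary position embedding idea more concrete, here is a simplified 1D sketch in pure Python. It is an illustration only: EVA-02 applies a 2D variant over the patch grid, and the frequency base of 10000 is an assumption taken from the common RoPE formulation, not from this model card. The point it shows is that attention scores between rotated vectors depend only on the relative offset between positions.

```python
import math


def rotate(vec, position, base=10000.0):
    """Apply a 1D rotary position embedding to an even-length vector.

    Each consecutive pair (vec[i], vec[i + 1]) is rotated by an angle
    proportional to the position, with a lower frequency for later pairs.
    """
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = vec[i], vec[i + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Both pairs of positions are 3 apart, so the scores should match
# up to floating-point error.
score_a = dot(rotate(q, 5), rotate(k, 2))
score_b = dot(rotate(q, 13), rotate(k, 10))
print(score_a, score_b)
```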

🚀 Quick Start

This section provides a quick guide on how to use the eva02_small_patch14_224.mim_in22k model for image classification and feature extraction.

✨ Features

Feature Representation: EVA02 models are designed to extract high-quality features from images.
Advanced Architecture: Incorporates mean pooling, SwiGLU, Rotary Position Embeddings (RoPE), and extra LN in MLP (for Base & Large).
Pretrained on Large Dataset: Pretrained on ImageNet-22k using masked image modeling with EVA-CLIP as a MIM teacher.

📦 Installation

To use this model, you need to install the timm library. You can install it using pip:

pip install timm
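
After installing, it can be useful to confirm that the package is visible to the current Python environment. The helper below is a small sketch using only the standard library; `installed_version` is a hypothetical name, not part of timm.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("timm")
if version is None:
    print("timm not found; run: pip install timm")
else:
    print("timm", version)
```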

💻 Usage Examples

Basic Usage

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('eva02_small_patch14_224.mim_in22k', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

Advanced Usage

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'eva02_small_patch14_224.mim_in22k',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 257, 384) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
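
The 257 in the unpooled shape above can be checked with simple arithmetic: a 224 x 224 image cut into 14 x 14 patches gives a 16 x 16 grid, plus one extra token. The sketch below assumes a single prefix (class) token, which is consistent with the (1, 257, 384) shape; `token_count` is a hypothetical helper, not a timm function.

```python
def token_count(img_size: int = 224, patch_size: int = 14,
                num_prefix_tokens: int = 1) -> int:
    """Number of tokens for a square image split into square patches."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be a multiple of patch_size")
    grid = img_size // patch_size          # patches per side
    return grid * grid + num_prefix_tokens # patch tokens + prefix tokens


print(token_count())      # 224 px input, 14 px patches
print(token_count(336))   # the 336 px variants in the comparison table
```

Because the token count grows with the square of the image side, the higher-resolution variants process several times more tokens per image.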

📚 Documentation

Model Details

Property	Details
Model Type	Image classification / feature backbone
Params (M)	21.6
GMACs	6.1
Activations (M)	18.3
Image size	224 x 224
Papers	EVA-02: A Visual Representation for Neon Genesis: https://arxiv.org/abs/2303.11331; EVA-CLIP: Improved Training Techniques for CLIP at Scale: https://arxiv.org/abs/2303.15389
Original	https://github.com/baaivision/EVA; https://huggingface.co/Yuxin-CV/EVA-02
Pretrain Dataset	ImageNet-22k

Model Comparison

Explore the dataset and runtime metrics of this model in timm model results.

model	top1 (%)	top5 (%)	param_count (M)	img_size (px)
eva02_large_patch14_448.mim_m38m_ft_in22k_in1k	90.054	99.042	305.08	448
eva02_large_patch14_448.mim_in22k_ft_in22k_in1k	89.946	99.01	305.08	448
eva_giant_patch14_560.m30m_ft_in22k_in1k	89.792	98.992	1014.45	560
eva02_large_patch14_448.mim_in22k_ft_in1k	89.626	98.954	305.08	448
eva02_large_patch14_448.mim_m38m_ft_in1k	89.57	98.918	305.08	448
eva_giant_patch14_336.m30m_ft_in22k_in1k	89.56	98.956	1013.01	336
eva_giant_patch14_336.clip_ft_in1k	89.466	98.82	1013.01	336
eva_large_patch14_336.in22k_ft_in22k_in1k	89.214	98.854	304.53	336
eva_giant_patch14_224.clip_ft_in1k	88.882	98.678	1012.56	224
eva02_base_patch14_448.mim_in22k_ft_in22k_in1k	88.692	98.722	87.12	448
eva_large_patch14_336.in22k_ft_in1k	88.652	98.722	304.53	336
eva_large_patch14_196.in22k_ft_in22k_in1k	88.592	98.656	304.14	196
eva02_base_patch14_448.mim_in22k_ft_in1k	88.23	98.564	87.12	448
eva_large_patch14_196.in22k_ft_in1k	87.934	98.504	304.14	196
eva02_small_patch14_336.mim_in22k_ft_in1k	85.74	97.614	22.13	336
eva02_tiny_patch14_336.mim_in22k_ft_in1k	80.658	95.524	5.76	336

🔧 Technical Details

timm checkpoints are float32 for consistency with other models. Some original checkpoints are float16 or bfloat16; see the originals if that's preferred.
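
The practical effect of the dtype choice can be estimated from the parameter count listed under Model Details. The sketch below is a back-of-the-envelope calculation only: it counts weights alone (no buffers or file-format overhead) and takes 1 MB as 10^6 bytes; `weight_megabytes` is a hypothetical helper.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_megabytes(params_millions: float, dtype: str) -> float:
    """Approximate weight storage in MB (10^6 bytes) for a given dtype."""
    return params_millions * BYTES_PER_PARAM[dtype]


PARAMS_M = 21.6  # from the Model Details table
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_megabytes(PARAMS_M, dtype):.1f} MB")
```

Under these assumptions the float32 weights take about twice the storage of a float16 or bfloat16 copy.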

📄 License

This project is licensed under the MIT License.

📚 Citation

@article{EVA02,
  title={EVA-02: A Visual Representation for Neon Genesis},
  author={Fang, Yuxin and Sun, Quan and Wang, Xinggang and Huang, Tiejun and Wang, Xinlong and Cao, Yue},
  journal={arXiv preprint arXiv:2303.11331},
  year={2023}
}

@article{EVA-CLIP,
  title={EVA-CLIP: Improved Training Techniques for CLIP at Scale},
  author={Sun, Quan and Fang, Yuxin and Wu, Ledell and Wang, Xinlong and Cao, Yue},
  journal={arXiv preprint arXiv:2303.15389},
  year={2023}
}

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}