đ Model Card for eva02_tiny_patch14_224.mim_in22k
This is an EVA02 feature / representation model. It was pretrained on ImageNet-22k with masked image modeling (using EVA-CLIP as a MIM teacher) by the paper authors.
EVA-02 models are vision transformers that incorporate mean pooling, SwiGLU, Rotary Position Embeddings (ROPE), and an extra LN in MLP (for Base & Large).
â ī¸ Important Note
timm
checkpoints are in float32 for consistency with other models. In some cases, the original checkpoints are in float16 or bfloat16. Refer to the originals if that's your preference.
đ Quick Start
Image Classification
from urllib.request import urlopen
from PIL import Image
import timm
img = Image.open(urlopen(
'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
model = timm.create_model('eva02_tiny_patch14_224.mim_in22k', pretrained=True)
model = model.eval()
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)
output = model(transforms(img).unsqueeze(0))
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
Image Embeddings
from urllib.request import urlopen
from PIL import Image
import timm
img = Image.open(urlopen(
'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
model = timm.create_model(
'eva02_tiny_patch14_224.mim_in22k',
pretrained=True,
num_classes=0,
)
model = model.eval()
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)
output = model(transforms(img).unsqueeze(0))
output = model.forward_features(transforms(img).unsqueeze(0))
output = model.forward_head(output, pre_logits=True)
⨠Features
- An EVA02 feature / representation model.
- Pretrained on ImageNet-22k with masked image modeling.
- Vision transformers with mean pooling, SwiGLU, Rotary Position Embeddings (ROPE), and extra LN in MLP (for Base & Large).
đ Documentation
Model Details
Property |
Details |
Model Type |
Image classification / feature backbone |
Params (M) |
5.5 |
GMACs |
1.7 |
Activations (M) |
9.1 |
Image size |
224 x 224 |
Papers |
- EVA-02: A Visual Representation for Neon Genesis: https://arxiv.org/abs/2303.11331 - EVA-CLIP: Improved Training Techniques for CLIP at Scale: https://arxiv.org/abs/2303.15389 |
Original |
- https://github.com/baaivision/EVA - https://huggingface.co/Yuxin-CV/EVA-02 |
Pretrain Dataset |
ImageNet-22k |
Model Comparison
Explore the dataset and runtime metrics of this model in timm model results.
model |
top1 |
top5 |
param_count |
img_size |
eva02_large_patch14_448.mim_m38m_ft_in22k_in1k |
90.054 |
99.042 |
305.08 |
448 |
eva02_large_patch14_448.mim_in22k_ft_in22k_in1k |
89.946 |
99.01 |
305.08 |
448 |
eva_giant_patch14_560.m30m_ft_in22k_in1k |
89.792 |
98.992 |
1014.45 |
560 |
eva02_large_patch14_448.mim_in22k_ft_in1k |
89.626 |
98.954 |
305.08 |
448 |
eva02_large_patch14_448.mim_m38m_ft_in1k |
89.57 |
98.918 |
305.08 |
448 |
eva_giant_patch14_336.m30m_ft_in22k_in1k |
89.56 |
98.956 |
1013.01 |
336 |
eva_giant_patch14_336.clip_ft_in1k |
89.466 |
98.82 |
1013.01 |
336 |
eva_large_patch14_336.in22k_ft_in22k_in1k |
89.214 |
98.854 |
304.53 |
336 |
eva_giant_patch14_224.clip_ft_in1k |
88.882 |
98.678 |
1012.56 |
224 |
eva02_base_patch14_448.mim_in22k_ft_in22k_in1k |
88.692 |
98.722 |
87.12 |
448 |
eva_large_patch14_336.in22k_ft_in1k |
88.652 |
98.722 |
304.53 |
336 |
eva_large_patch14_196.in22k_ft_in22k_in1k |
88.592 |
98.656 |
304.14 |
196 |
eva02_base_patch14_448.mim_in22k_ft_in1k |
88.23 |
98.564 |
87.12 |
448 |
eva_large_patch14_196.in22k_ft_in1k |
87.934 |
98.504 |
304.14 |
196 |
eva02_small_patch14_336.mim_in22k_ft_in1k |
85.74 |
97.614 |
22.13 |
336 |
eva02_tiny_patch14_336.mim_in22k_ft_in1k |
80.658 |
95.524 |
5.76 |
336 |
đ License
This project is licensed under the MIT license.