ViT Base Patch32 224 In21k
A Vision Transformer (ViT) image model pre-trained on ImageNet-21k, suitable for feature extraction and fine-tuning.
Downloads 438
Release Time : 11/17/2023
Model Overview
This model is an image classification model based on the Vision Transformer architecture, pre-trained by the paper authors on the ImageNet-21k dataset using JAX and later ported to PyTorch. The model does not include a classification head, making it suitable for feature extraction and fine-tuning for downstream tasks.
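Since the model ships without a classification head, the typical use is extracting hidden-state features. A minimal sketch, assuming the Hugging Face `transformers` API and the `google/vit-base-patch32-224-in21k` checkpoint this card appears to describe (weights are downloaded on first use):

```python
# Feature extraction with the pre-trained ViT backbone (no classification head).
# Assumes the Hugging Face checkpoint google/vit-base-patch32-224-in21k.
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch32-224-in21k")
model = ViTModel.from_pretrained("google/vit-base-patch32-224-in21k")

image = Image.new("RGB", (224, 224))  # placeholder image; substitute your own
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# One sequence of 50 tokens (49 patches + [CLS]), each a 768-dim feature vector.
print(outputs.last_hidden_state.shape)
```

The `[CLS]` token embedding (`outputs.last_hidden_state[:, 0]`) is a common choice as a single per-image feature vector for downstream tasks.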
Model Features
Transformer-based architecture
Uses the Vision Transformer architecture, splitting each image into 32x32 pixel patches that are processed as a token sequence, which suits large-scale image recognition tasks.
Pre-trained weights
Pre-trained on the large-scale ImageNet-21k dataset (roughly 14 million images spanning about 21k classes), offering robust feature extraction capabilities.
Flexible feature extraction
The model does not include a classification head, allowing direct use for feature extraction or fine-tuning for downstream tasks.
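The patch geometry above fixes the model's token budget. A short sketch of the arithmetic for ViT-Base/32 at 224x224 input (the hidden size of 768 is the ViT-Base default, an assumption not stated in this card):

```python
# Token-count arithmetic for a ViT with 32x32 patches on a 224x224 RGB image.
image_size = 224
patch_size = 32
channels = 3

patches_per_side = image_size // patch_size      # 224 / 32 = 7
num_patches = patches_per_side ** 2              # 7 * 7 = 49 patches
patch_dim = patch_size * patch_size * channels   # raw values per patch
num_tokens = num_patches + 1                     # +1 for the [CLS] token

print(patches_per_side, num_patches, patch_dim, num_tokens)  # 7 49 3072 50
```

Each 3072-value patch is linearly projected to the hidden size before entering the Transformer, so sequence length stays at 50 regardless of that projection.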
Model Capabilities
Image feature extraction
Image classification
Transfer learning
Use Cases
Computer vision
Image classification
Fine-tune the pre-trained backbone with a task-specific classification head to build domain-specific image classifiers.
Feature extraction
Extract high-level image features for downstream tasks such as object detection and image retrieval.
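A common pattern combining these use cases is a "linear probe": freeze the backbone, extract per-image features, and fit a small linear classifier on top. A minimal sketch using synthetic 768-dim vectors as stand-ins for real ViT features (all names illustrative; the fit is ridge-regularized least squares via numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 768  # ViT-Base [CLS] feature size

# Synthetic stand-ins for extracted [CLS] features of two classes.
class0 = rng.normal(0.0, 1.0, size=(50, hidden))
class1 = rng.normal(0.5, 1.0, size=(50, hidden))
X = np.vstack([class0, class1])
y = np.array([0] * 50 + [1] * 50)

# Linear probe: ridge-regularized least squares onto one-hot targets.
Y = np.eye(2)[y]
W = np.linalg.solve(X.T @ X + 1e-2 * np.eye(hidden), X.T @ Y)
pred = (X @ W).argmax(axis=1)
accuracy = (pred == y).mean()
print(accuracy)
```

With real ViT features the same recipe applies: replace the synthetic `X` with stacked `[CLS]` embeddings and `y` with your labels.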
© 2025 AIbase