
ViT Large Patch16 224 In21k

Developed by Google
A Vision Transformer (ViT) model pretrained on the ImageNet-21k dataset, suitable for image feature extraction and fine-tuning on downstream tasks.
Downloads 92.63k
Release Time: 3/2/2022

Model Overview

This model is a BERT-like Transformer encoder, pretrained on the ImageNet-21k dataset with supervised learning, and is primarily used for image feature extraction and image classification.

Model Features

Pretrained on ImageNet-21k
Pretrained on the ImageNet-21k dataset of 14 million images spanning 21,843 categories, yielding broadly transferable visual features.
16x16 image patch segmentation
Divides each image into fixed-size 16x16-pixel patches, which are linearly embedded and fed to the Transformer encoder as a token sequence.
Includes pretrained pooler
The model includes a pretrained pooler that can be directly used for feature extraction in downstream tasks without training from scratch.
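The patch-segmentation step above can be sketched in a few lines of numpy. This is a minimal illustration, not the model's actual implementation: the dimensions follow ViT-Large (hidden size 1024), but the projection weights and the image are random stand-ins for the pretrained parameters and real input.

```python
import numpy as np

# Dimensions for ViT-Large at 224x224 input with 16x16 patches.
IMAGE_SIZE, PATCH_SIZE, CHANNELS, HIDDEN = 224, 16, 3, 1024

image = np.random.rand(IMAGE_SIZE, IMAGE_SIZE, CHANNELS)  # stand-in input

# Split the image into non-overlapping 16x16 patches and flatten each one.
n = IMAGE_SIZE // PATCH_SIZE                    # 14 patches per side
patches = image.reshape(n, PATCH_SIZE, n, PATCH_SIZE, CHANNELS)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(n * n, -1)  # (196, 768)

# Linear embedding of each flattened patch into the hidden dimension.
W = np.random.randn(PATCH_SIZE * PATCH_SIZE * CHANNELS, HIDDEN)  # random stand-in
embeddings = patches @ W                        # (196, 1024)

# A learnable [CLS] token is prepended, so the encoder sees 197 tokens.
cls_token = np.zeros((1, HIDDEN))
tokens = np.concatenate([cls_token, embeddings])  # (197, 1024)
```

With these settings an image becomes 196 patch tokens of 768 raw values each, projected to 1024 dimensions, plus one [CLS] token.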

Model Capabilities

Image feature extraction
Image classification
Downstream task fine-tuning

Use Cases

Computer vision
Image classification
Add a linear layer on top of the pretrained model for specific image classification tasks.
Performs well on benchmark datasets such as ImageNet.
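The "linear layer on top" recipe can be sketched as follows, assuming the pretrained pooler output is a single 1024-dim vector per image (ViT-Large's hidden size). The feature vector and head weights here are placeholders; in real fine-tuning only the head's weights would be trained while the backbone stays frozen or is tuned at a low learning rate.

```python
import numpy as np

HIDDEN, NUM_CLASSES = 1024, 1000  # ViT-Large hidden size; example class count

pooled = np.random.randn(HIDDEN)  # stand-in for the pretrained pooler output

# The classification head: the only new parameters added for the task.
W = np.random.randn(HIDDEN, NUM_CLASSES) * 0.01
b = np.zeros(NUM_CLASSES)

logits = pooled @ W + b                               # (1000,) class scores
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                  # softmax over classes
predicted_class = int(np.argmax(probs))
```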
Feature extraction
Extracts image feature representations for other vision tasks such as object detection and image segmentation.
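One common way to use such extracted features downstream is similarity search: rank a gallery of images by cosine similarity to a query feature. The 1024-dim vectors below are random stand-ins for real pooler outputs; the query is deliberately constructed as a near-duplicate of one gallery entry to show the ranking behavior.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 1024

gallery = rng.normal(size=(5, HIDDEN))   # stand-in features for 5 images
query = gallery[2] + 0.01 * rng.normal(size=HIDDEN)  # near-duplicate of image 2

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine(query, g) for g in gallery]
best_match = int(np.argmax(scores))      # index of the most similar image
```

Because high-dimensional random vectors are nearly orthogonal, the near-duplicate entry dominates the ranking, which is exactly the property retrieval systems rely on.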