vit_large_patch16_224.orig_in21k
A Vision Transformer (ViT) based image classification model, pretrained on ImageNet-21k by Google Research using the JAX framework and later ported to PyTorch. Suitable for feature extraction and fine-tuning.
Downloads: 584
Release date: 11/17/2023
Model Overview
This is a large Vision Transformer model designed for image classification and feature extraction. Pretrained on the ImageNet-21k dataset without a classification head, it is well suited as a backbone for fine-tuning on downstream tasks.
Model Features
Large-scale pretraining
Pretrained on the large-scale ImageNet-21k dataset, offering powerful feature extraction capabilities
Pure Transformer architecture
Fully Transformer-based with no convolutional operations, making it well suited to capturing global image information
Flexible feature extraction
Capable of outputting feature representations at different levels, including both pooled and unpooled sequence features
Efficient computation
Maintains a reasonable computational load (59.7 GMACs) despite its relatively large size
Model Capabilities
Image feature extraction
Image classification
Transfer learning
Computer vision tasks
Use Cases
Computer vision
Image classification
Used as a backbone network for image classification tasks, adaptable to specific classification needs through fine-tuning
Feature extraction
Extracts high-level image features for downstream tasks such as object detection and image segmentation
Transfer learning
Utilizes pretrained weights as a starting point for fine-tuning on smaller datasets