ViT Base Patch32 224 In21k
A Vision Transformer (ViT) image model pre-trained on ImageNet-21k, suitable for feature extraction and fine-tuning.
Downloads 438
Release Time : 11/17/2023
Model Overview
This model is an image classification model based on the Vision Transformer architecture, pre-trained by the paper authors on the ImageNet-21k dataset using JAX and later ported to PyTorch. The model does not include a classification head, making it suitable for feature extraction and fine-tuning for downstream tasks.
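Since the model ships without a classification head, the typical use is extracting hidden-state features. A minimal sketch, assuming the Hugging Face `transformers` API and the `google/vit-base-patch32-224-in21k` checkpoint this card appears to describe (weights are downloaded on first use):

```python
# Feature extraction with the pre-trained ViT backbone (no classification head).
# Assumes the Hugging Face checkpoint google/vit-base-patch32-224-in21k.
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch32-224-in21k")
model = ViTModel.from_pretrained("google/vit-base-patch32-224-in21k")

image = Image.new("RGB", (224, 224))  # placeholder image; substitute your own
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# One sequence of 50 tokens (49 patches + [CLS]), each a 768-dim feature vector.
print(outputs.last_hidden_state.shape)
```

The `[CLS]` token embedding (`outputs.last_hidden_state[:, 0]`) is a common choice as a single per-image feature vector for downstream tasks.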
Model Features
Transformer-based architecture
Uses the Vision Transformer architecture, splitting each image into 32x32 pixel patches that are processed as a token sequence, which suits large-scale image recognition tasks.
Pre-trained weights
Pre-trained on the large-scale ImageNet-21k dataset (roughly 14 million images spanning about 21k classes), offering robust feature extraction capabilities.
Flexible feature extraction
The model does not include a classification head, allowing direct use for feature extraction or fine-tuning for downstream tasks.
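The patch geometry above fixes the model's token budget. A short sketch of the arithmetic for ViT-Base/32 at 224x224 input (the hidden size of 768 is the ViT-Base default, an assumption not stated in this card):

```python
# Token-count arithmetic for a ViT with 32x32 patches on a 224x224 RGB image.
image_size = 224
patch_size = 32
channels = 3

patches_per_side = image_size // patch_size      # 224 / 32 = 7
num_patches = patches_per_side ** 2              # 7 * 7 = 49 patches
patch_dim = patch_size * patch_size * channels   # raw values per patch
num_tokens = num_patches + 1                     # +1 for the [CLS] token

print(patches_per_side, num_patches, patch_dim, num_tokens)  # 7 49 3072 50
```

Each 3072-value patch is linearly projected to the hidden size before entering the Transformer, so sequence length stays at 50 regardless of that projection.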
Model Capabilities
Image feature extraction
Image classification
Transfer learning
Use Cases
Computer vision
Image classification
Fine-tune the pre-trained backbone with a task-specific classification head to build domain-specific image classifiers.
Feature extraction
Extract high-level image features for downstream tasks such as object detection and image retrieval.
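A common pattern combining these use cases is a "linear probe": freeze the backbone, extract per-image features, and fit a small linear classifier on top. A minimal sketch using synthetic 768-dim vectors as stand-ins for real ViT features (all names illustrative; the fit is ridge-regularized least squares via numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 768  # ViT-Base [CLS] feature size

# Synthetic stand-ins for extracted [CLS] features of two classes.
class0 = rng.normal(0.0, 1.0, size=(50, hidden))
class1 = rng.normal(0.5, 1.0, size=(50, hidden))
X = np.vstack([class0, class1])
y = np.array([0] * 50 + [1] * 50)

# Linear probe: ridge-regularized least squares onto one-hot targets.
Y = np.eye(2)[y]
W = np.linalg.solve(X.T @ X + 1e-2 * np.eye(hidden), X.T @ Y)
pred = (X @ W).argmax(axis=1)
accuracy = (pred == y).mean()
print(accuracy)
```

With real ViT features the same recipe applies: replace the synthetic `X` with stacked `[CLS]` embeddings and `y` with your labels.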
© 2025 AIbase