
vit_base_patch16_224.orig_in21k

Developed by timm
An image classification model based on Vision Transformer, pretrained on ImageNet-21k, suitable for feature extraction and fine-tuning
Downloads 23.07k
Release Time: 11/16/2023

Model Overview

This is an image classification model based on the Vision Transformer (ViT) architecture. The original weights were trained by Google Research on the ImageNet-21k dataset and are distributed through the timm library. The checkpoint does not include a classification head, so it is intended for use as a feature extraction backbone or as a starting point for fine-tuning on downstream tasks.

Model Features

Large-Scale Pretraining
Pretrained on ImageNet-21k, a dataset of roughly 14 million images spanning about 21,000 classes, which gives the model general-purpose visual features that transfer well to other tasks
Transformer Architecture
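Uses a pure Transformer architecture to process images: each 224x224 input is divided into non-overlapping 16x16-pixel patches (a 14x14 grid, 196 patches in total), and the patches are processed as a sequence of tokens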
Flexible Application
Can be used as a feature extraction backbone or fine-tuned for downstream tasks; because the checkpoint ships without a classification head, a task-specific head is added only when needed
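
The patch arithmetic implied by the model name (patch size 16, input size 224) can be checked with a small sketch. This is illustrative plain Python, not the library's implementation: the helper names patch_grid and patchify are hypothetical, and the image is assumed to be a single-channel nested list rather than a real tensor.

```python
from typing import List, Tuple


def patch_grid(image_size: int = 224, patch_size: int = 16) -> Tuple[int, int]:
    """Return (patches per side, total patches) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


def patchify(image: List[List[float]], patch_size: int = 16) -> List[List[float]]:
    """Split a single-channel H x W image into flattened, non-overlapping patches.

    Patches are returned in row-major order (left to right, then top to bottom).
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


if __name__ == "__main__":
    per_side, total = patch_grid()
    print(per_side, total)  # 14 196
    # A ViT of this kind prepends one class token to the patch sequence.
    print(total + 1)        # 197
```

In the real model each flattened patch (across all three colour channels) is additionally projected to an embedding vector before entering the Transformer; the sketch only covers the splitting step.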

Model Capabilities

Image Feature Extraction
Image Classification
Transfer Learning

Use Cases

Computer Vision
Image Classification
Fine-tune the model with a newly added classification head to adapt it to a specific set of labels
Feature Extraction
Used as a backbone network to extract image features for downstream tasks such as object detection and image segmentation