
vit_base_patch32_224.orig_in21k

Developed by timm
An image classification model based on Vision Transformer (ViT), pre-trained on ImageNet-21k, suitable for feature extraction and fine-tuning scenarios.
Downloads 438
Release date: 11/17/2023

Model Overview

This model is an image classification model based on the Vision Transformer architecture, pre-trained by the paper authors on the ImageNet-21k dataset using JAX and later ported to PyTorch. The model does not include a classification head, making it suitable for feature extraction and fine-tuning for downstream tasks.

Model Features

Transformer-based architecture
Utilizes the Vision Transformer architecture, dividing images into 32x32 patches for processing, suitable for large-scale image recognition tasks.
Pre-trained weights
Pre-trained on the large-scale ImageNet-21k dataset, offering robust feature extraction capabilities.
Flexible feature extraction
The model does not include a classification head, allowing direct use for feature extraction or fine-tuning for downstream tasks.
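The patch size in the features above fixes the token geometry. A back-of-envelope check of the sequence length, using only the numbers stated on this card (224x224 input, 32x32 patches, ViT-Base hidden size 768):

```python
# Token-count arithmetic for ViT-Base/32 at 224x224 input; no weights needed.
image_size = 224
patch_size = 32
embed_dim = 768  # ViT-Base hidden size

patches_per_side = image_size // patch_size   # 224 / 32 = 7
num_patches = patches_per_side ** 2           # 7 * 7 = 49 patches
num_tokens = num_patches + 1                  # +1 class token

print(patches_per_side, num_patches, num_tokens)  # 7 49 50
```

So each image becomes a sequence of 50 tokens of dimension 768, and the class token (or pooled output) serves as the image-level feature.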

Model Capabilities

Image feature extraction
Image classification
Transfer learning

Use Cases

Computer vision
Image classification
Use the pre-trained model for image classification tasks, or fine-tune it to build a domain-specific classifier.
Feature extraction
Extract high-level image features for downstream tasks such as object detection and image retrieval.
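For the retrieval use case, a common pattern is to rank a gallery of pre-extracted embeddings by cosine similarity to a query embedding. A minimal sketch with random stand-in vectors (the 768-dim size matches ViT-Base features; real vectors would come from the model):

```python
import numpy as np

rng = np.random.default_rng(0)
gallery = rng.normal(size=(5, 768))               # 5 gallery image embeddings
query = gallery[2] + 0.01 * rng.normal(size=768)  # query close to gallery item 2

def cosine_sim(a, b):
    # Normalize rows of a and the query vector b, then take dot products.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b)
    return a @ b

scores = cosine_sim(gallery, query)
best = int(np.argmax(scores))
print(best)  # gallery item 2 ranks first
```

For large galleries the same idea scales with an approximate nearest-neighbor index instead of a brute-force dot product.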