
vit_large_patch32_224.orig_in21k

Developed by timm
An image classification model based on the Vision Transformer (ViT) architecture, pretrained on the ImageNet-21k dataset and suited to feature extraction and fine-tuning.
Downloads 771
Release Time: 12/22/2022

Model Overview

This is a ViT-Large model (32×32 patches, 224×224 input) developed by Google Research, used primarily for image classification and feature extraction. The pretrained weights do not include a classification head, so the model is best used as a backbone for fine-tuning or for extracting features.
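The snippet below is a minimal sketch of loading these weights through timm and inspecting the backbone output; the dummy input tensor is a placeholder, and the printed shape assumes the ViT-Large embedding size of 1024.

```python
# Minimal sketch: load the backbone via timm and inspect its output.
# The dummy input below is a placeholder, not real data.
import timm
import torch

# num_classes=0 keeps only the backbone, since these weights ship
# without a classification head.
model = timm.create_model(
    "vit_large_patch32_224.orig_in21k",
    pretrained=True,
    num_classes=0,
)
model.eval()

# 224x224 RGB input, as indicated by the model name.
dummy = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    features = model(dummy)

print(features.shape)  # expected: torch.Size([1, 1024]) for ViT-Large
```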

Model Features

Large-Scale Pretraining
Pretrained on ImageNet-21k (roughly 14 million images across about 21,000 classes), giving the backbone strong general-purpose feature extraction
Transformer Architecture
Processes images with a pure Transformer architecture operating on patch embeddings, with no convolutional backbone
High Compatibility
Weights ported from the original JAX implementation to PyTorch, so the model drops directly into the PyTorch ecosystem
Flexible Application
Can serve as a feature extractor or as a base model for fine-tuning; the classification head can be omitted entirely (see the fine-tuning sketch after this list)
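Below is a minimal fine-tuning sketch under assumed conditions: NUM_CLASSES, the random placeholder batch, and the choice to freeze the backbone are illustrative, not part of this card.

```python
# Minimal fine-tuning sketch; dataset loading is omitted and the batch
# below is a random placeholder.
import timm
import torch
import torch.nn.functional as F

NUM_CLASSES = 10  # hypothetical number of target classes

# Passing num_classes attaches a freshly initialised linear head on top
# of the pretrained ViT backbone.
model = timm.create_model(
    "vit_large_patch32_224.orig_in21k",
    pretrained=True,
    num_classes=NUM_CLASSES,
)

# Optionally freeze the backbone and train only the new head
# (timm's ViT names its classifier "head").
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("head")

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

images = torch.randn(4, 3, 224, 224)          # placeholder batch
labels = torch.randint(0, NUM_CLASSES, (4,))  # placeholder labels

logits = model(images)
loss = F.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
```

Freezing the backbone keeps training cheap for small datasets; with more data, unfreezing all parameters and using a lower learning rate is a common alternative.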

Model Capabilities

Image Feature Extraction
Image Classification
Transfer Learning
Computer Vision Tasks

Use Cases

Image Classification
General Image Classification
Classifies images across a broad range of general-purpose categories
ImageNet-21k pretraining gives broad category coverage; a classification head must be added and fine-tuned for a specific label set
Feature Extraction
Downstream Task Feature Extraction
Provides high-quality image features for other computer vision tasks
Generates 1024-dimensional feature vectors (the ViT-Large embedding size) suitable for various downstream tasks, as shown in the sketch below
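As a sketch of the downstream feature-extraction use case, the code below builds the matching preprocessing pipeline and extracts a 1024-dimensional vector; the image path is a placeholder, and resolve_model_data_config / create_transform assume a recent timm release.

```python
# Minimal feature-extraction sketch; "example.jpg" is a placeholder path.
import timm
import torch
from PIL import Image

model = timm.create_model(
    "vit_large_patch32_224.orig_in21k", pretrained=True, num_classes=0
)
model.eval()

# Build the preprocessing pipeline matching the pretrained weights
# (available in recent timm releases).
config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**config, is_training=False)

img = Image.open("example.jpg").convert("RGB")
x = transform(img).unsqueeze(0)       # shape: (1, 3, 224, 224)

with torch.no_grad():
    feats = model(x)                  # pooled backbone features

print(feats.shape)                    # torch.Size([1, 1024])
```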