vit_large_patch16_224.mae
Developed by timm
Large-scale image feature extraction model based on the Vision Transformer (ViT), pre-trained on the ImageNet-1k dataset with the self-supervised Masked Autoencoder (MAE) method
Downloads 960
Release Time: 5/9/2023
Model Overview
This model is a large-scale image feature extraction model based on the Vision Transformer architecture, primarily used for image classification and feature extraction tasks. It was pre-trained on the ImageNet-1k dataset using the self-supervised Masked Autoencoder (MAE) method.
Model Features
Self-supervised pre-training
Uses Masked Autoencoder (MAE) method for self-supervised pre-training, enabling effective feature representation learning without extensive labeled data
Large-scale Vision Transformer
Based on ViT-Large architecture with 303.3M parameters, capable of capturing rich visual features
Efficient feature extraction
Supports extraction of global image features or local patch features, suitable for various downstream vision tasks
Model Capabilities
Image classification
Image feature extraction
Visual representation learning
Use Cases
Computer vision
Image classification
Can be fine-tuned for image classification, e.g. the 1000-class ImageNet task
Feature extraction
Can serve as a feature extractor for downstream vision tasks such as object detection, image segmentation, etc.