
vit_large_patch16_224.mae

Developed by timm
Large-scale image feature extraction model based on the Vision Transformer (ViT), pre-trained on the ImageNet-1k dataset with the self-supervised Masked Autoencoder (MAE) method
Downloads 960
Release Time: 5/9/2023

Model Overview

This model is a large-scale image feature extraction model based on the Vision Transformer architecture, primarily used for image classification and feature extraction tasks. It was pre-trained on the ImageNet-1k dataset using the self-supervised Masked Autoencoder (MAE) method, in which a large fraction of image patches is masked out and the encoder learns to produce representations from which the missing patches can be reconstructed.
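To illustrate the MAE pre-training idea described above, here is a minimal, hypothetical sketch of random patch masking with a 75% mask ratio (the ratio used in the original MAE paper). This is an illustration only, not timm's internal implementation:

```python
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """Keep a random subset of patch tokens, as in MAE pre-training.

    patches: (batch, num_patches, dim) tensor of patch embeddings.
    Returns the visible (unmasked) patches and their indices.
    """
    B, N, D = patches.shape
    keep = int(N * (1 - mask_ratio))             # patches left visible
    noise = torch.rand(B, N)                     # one random score per patch
    ids = noise.argsort(dim=1)[:, :keep]         # indices of kept patches
    visible = torch.gather(patches, 1, ids.unsqueeze(-1).expand(B, keep, D))
    return visible, ids

# A 224x224 image with 16x16 patches gives 14*14 = 196 patch tokens;
# ViT-Large uses a 1024-dim embedding.
patches = torch.randn(2, 196, 1024)
visible, ids = random_masking(patches)
print(visible.shape)  # torch.Size([2, 49, 1024]) -- only 25% of patches kept
```

The MAE encoder only processes these visible tokens, which is what makes pre-training at this scale affordable; a lightweight decoder then reconstructs the masked patches.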

Model Features

Self-supervised pre-training
Uses Masked Autoencoder (MAE) method for self-supervised pre-training, enabling effective feature representation learning without extensive labeled data
Large-scale Vision Transformer
Based on ViT-Large architecture with 303.3M parameters, capable of capturing rich visual features
Efficient feature extraction
Supports extraction of global image features or local patch features, suitable for various downstream vision tasks

Model Capabilities

Image classification
Image feature extraction
Visual representation learning

Use Cases

Computer vision
Image classification
Can be fine-tuned for image classification, e.g. the 1000-class ImageNet task (the MAE checkpoint covers only the encoder, so a classification head must be trained on top)
Feature extraction
Can serve as a feature extractor for downstream vision tasks such as object detection, image segmentation, etc.