
vit_huge_patch14_224.mae

Developed by timm
A large-scale image feature-extraction model based on the Vision Transformer (ViT) architecture, pre-trained on the ImageNet-1k dataset with the self-supervised masked autoencoder (MAE) method.
Downloads 104
Release Time: 5/9/2023

Model Overview

This is an image feature-extraction model based on the Vision Transformer architecture, used primarily for image classification and feature-extraction tasks. The model is pre-trained with the self-supervised masked autoencoder (MAE) method, which lets it learn high-level feature representations of images without labels.
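The core MAE pre-training idea can be sketched in a few lines (a minimal illustration, assuming the 75% mask ratio used in the original MAE paper): the image is split into patches, most patches are hidden, and the encoder sees only the visible ones while a lightweight decoder learns to reconstruct the rest.

```python
import random

# ViT-Huge/14 at 224x224: the image is split into (224 // 14) ** 2 = 256 patches
num_patches = (224 // 14) ** 2
mask_ratio = 0.75  # MAE hides 75% of patches during pre-training

# Randomly choose which patches remain visible to the encoder
indices = list(range(num_patches))
random.shuffle(indices)
num_visible = int(num_patches * (1 - mask_ratio))
visible = sorted(indices[:num_visible])   # fed to the encoder
masked = sorted(indices[num_visible:])    # reconstructed by the decoder

print(num_patches, num_visible, len(masked))  # 256 64 192
```

Because the encoder only processes the 64 visible patches, pre-training is far cheaper per image than running the full 256-token sequence.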

Model Features

Large-scale Vision Transformer
Utilizes the ViT-Huge architecture with roughly 630 million parameters, capable of modeling complex visual features
Self-supervised Pre-training
Pre-trained using the masked autoencoder (MAE) method, eliminating the need for extensive labeled data
High-resolution Processing
Supports 224×224 pixel image input, enabling the capture of finer visual features

Model Capabilities

Image Feature Extraction
Image Classification
Visual Representation Learning

Use Cases

Computer Vision
Image Classification
Can be used to classify image content, such as identifying objects and scenes
Feature Extraction
Can serve as a feature extractor to provide high-quality image representations for downstream visual tasks
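Used as a feature extractor, the model maps each image to a 1280-dimensional vector (with timm, typically by loading it via `timm.create_model('vit_huge_patch14_224.mae', pretrained=True, num_classes=0)`). A downstream task such as nearest-neighbour image retrieval then reduces to comparing those vectors. A minimal sketch, with random placeholder vectors standing in for real model outputs:

```python
import math
import random

random.seed(0)
DIM = 1280  # ViT-Huge feature dimension

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Placeholder feature vectors; in practice these come from the model's pooled output
gallery = {name: [random.gauss(0, 1) for _ in range(DIM)]
           for name in ("cat", "dog", "car")}
query = gallery["cat"]  # an identical image is its own nearest neighbour

best = max(gallery, key=lambda name: cosine(query, gallery[name]))
print(best)  # cat
```

The same pattern (extract once, compare cheaply) underlies linear probing, clustering, and retrieval on top of frozen MAE features.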