
vit_base_patch16_224.mae

Developed by timm
Vision Transformer (ViT) based image feature extraction model, pre-trained on the ImageNet-1k dataset using the self-supervised masked autoencoder (MAE) method
Downloads 23.63k
Release Time: 5/9/2023

Model Overview

This is an image feature extraction model based on the Vision Transformer (ViT) architecture, primarily used for image classification and feature extraction tasks. The model is pre-trained with the self-supervised masked autoencoder (MAE) method, allowing it to capture image features effectively.
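As a concrete illustration, here is a minimal sketch of loading this checkpoint through timm and extracting a pooled image embedding. It assumes timm (0.9 or newer) and PyTorch are installed; the image path example.jpg is a placeholder.

```python
import timm
import torch
from PIL import Image

# num_classes=0 drops the classification head so the model returns features.
model = timm.create_model("vit_base_patch16_224.mae", pretrained=True, num_classes=0)
model.eval()

# Build the preprocessing pipeline matching the model's pretraining configuration.
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

img = Image.open("example.jpg").convert("RGB")  # placeholder image path
x = transform(img).unsqueeze(0)                 # shape: (1, 3, 224, 224)

with torch.no_grad():
    features = model(x)                         # shape: (1, 768) pooled embedding

print(features.shape)
```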

Model Features

Self-supervised pre-training
Uses the masked autoencoder (MAE) method for self-supervised pre-training, eliminating the need for large amounts of labeled data
Efficient feature extraction
Built on the Vision Transformer architecture, capable of extracting image features effectively
Medium-sized model
85.8 million parameters, striking a balance between computational efficiency and performance (a quick way to verify this count is sketched below)
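The parameter count can be checked with a short sketch, assuming only that timm is installed; pretrained=False avoids downloading the weights just to count them.

```python
import timm

# Encoder only (num_classes=0), matching how the MAE checkpoint is distributed.
model = timm.create_model("vit_base_patch16_224.mae", pretrained=False, num_classes=0)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # roughly 85.8M for ViT-Base/16
```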

Model Capabilities

Image feature extraction
Image classification
Visual representation learning (a token-level sketch follows this list)
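The sketch below shows how the learned representations can be accessed with timm's standard model methods: forward_features returns the full token sequence, while forward_head pools it into a single embedding. It assumes timm and PyTorch are installed and uses a random tensor in place of a preprocessed image.

```python
import timm
import torch

model = timm.create_model("vit_base_patch16_224.mae", pretrained=True, num_classes=0)
model.eval()

x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image batch

with torch.no_grad():
    tokens = model.forward_features(x)   # (1, 197, 768): class token + 196 patch tokens
    pooled = model.forward_head(tokens)  # (1, 768): pooled embedding (head removed)
```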

Use Cases

Computer vision
Image classification
Can be used to classify images, such as identifying object categories (a fine-tuning sketch appears after this list)
Feature extraction
Can serve as a feature extractor for other vision tasks
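Since the MAE checkpoint is a self-supervised encoder, using it for classification typically means attaching a new head and fine-tuning. The sketch below is illustrative only: the 10-class head, the random batch, and the optimizer settings are assumptions, not values from the model card.

```python
import timm
import torch

# Loads the self-supervised encoder weights and attaches a fresh, randomly
# initialized 10-class head for fine-tuning (assumed class count).
model = timm.create_model("vit_base_patch16_224.mae", pretrained=True, num_classes=10)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
criterion = torch.nn.CrossEntropyLoss()

# One illustrative training step on random tensors standing in for real data.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 10, (8,))

logits = model(images)           # shape: (8, 10)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```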