
BEiT Large Patch16 224 Pt22k

Developed by Microsoft
BEiT is a self-supervised learning model based on the Vision Transformer (ViT), pretrained on the ImageNet-21k dataset for image classification tasks.
Downloads 237
Release Time: 3/2/2022

Model Overview

The BEiT model uses a BERT-like Transformer encoder and is pretrained in a self-supervised manner on the ImageNet-21k dataset to learn internal representations of images, which can then be used to extract features for downstream tasks.
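As a minimal sketch of feature extraction with this checkpoint, the following assumes the Hugging Face transformers library and the Hub identifier microsoft/beit-large-patch16-224-pt22k; class names and output shapes may differ across library versions.

```python
import torch
import requests
from PIL import Image
from transformers import BeitImageProcessor, BeitModel

# Load an example image (any RGB image works; this URL is just a placeholder).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Assumed checkpoint name on the Hugging Face Hub.
ckpt = "microsoft/beit-large-patch16-224-pt22k"
processor = BeitImageProcessor.from_pretrained(ckpt)
model = BeitModel.from_pretrained(ckpt)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state holds one embedding per token ([CLS] + 14x14 patches),
# e.g. shape (1, 197, 1024) for the large model at 224x224 resolution.
features = outputs.last_hidden_state
print(features.shape)
```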

Model Features

Self-supervised pretraining
Pretrained by predicting the visual tokens of masked image patches, requiring no labeled data (see the sketch after this list).
Relative position embeddings
Uses T5-style relative position embeddings instead of absolute ones, enhancing model flexibility.
Patch average pooling
Classifies by averaging the final hidden states of the image patches instead of relying on the [CLS] token.
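To illustrate the masked-patch pretraining objective described above, here is a rough sketch using the transformers BeitForMaskedImageModeling class; the random mask and the example image URL are placeholders, and exact logit shapes may vary by library version.

```python
import torch
import requests
from PIL import Image
from transformers import BeitImageProcessor, BeitForMaskedImageModeling

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

ckpt = "microsoft/beit-large-patch16-224-pt22k"  # assumed Hub identifier
processor = BeitImageProcessor.from_pretrained(ckpt)
model = BeitForMaskedImageModeling.from_pretrained(ckpt)

pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Randomly mask roughly half of the 14x14 = 196 patches.
num_patches = (model.config.image_size // model.config.patch_size) ** 2
bool_masked_pos = torch.randint(0, 2, (1, num_patches)).bool()

with torch.no_grad():
    outputs = model(pixel_values, bool_masked_pos=bool_masked_pos)

# Logits are scores over the visual-token vocabulary for each patch position,
# e.g. shape (1, 196, 8192); during pretraining the model learns to predict
# the visual tokens of the masked patches.
print(outputs.logits.shape)
```

In the transformers implementation, the relative-position and pooling behaviour listed above is exposed through configuration flags (for example, use_relative_position_bias / use_shared_relative_position_bias and use_mean_pooling on BeitConfig), though attribute names may differ between versions.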

Model Capabilities

Image classification
Feature extraction

Use Cases

Computer vision
Image classification
Can be used to classify images, identifying objects or scenes within them.
Achieves strong results on multiple image classification benchmarks (see the original paper for specific numbers).
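As a hedged usage sketch for image classification: the pt22k checkpoint described here is pretrained only, so the example below assumes a fine-tuned sibling checkpoint such as microsoft/beit-large-patch16-224 (ImageNet-1k classification head); swap in whichever fine-tuned BEiT checkpoint matches your label set.

```python
import torch
import requests
from PIL import Image
from transformers import BeitImageProcessor, BeitForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Assumed fine-tuned sibling checkpoint with a classification head.
ckpt = "microsoft/beit-large-patch16-224"
processor = BeitImageProcessor.from_pretrained(ckpt)
model = BeitForImageClassification.from_pretrained(ckpt)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the most likely class and map it to a human-readable label.
predicted_class_idx = logits.argmax(-1).item()
print(model.config.id2label[predicted_class_idx])
```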