
BEiT Base Patch16 224

Developed by Microsoft
BEiT is a Vision Transformer-based model pre-trained on ImageNet-21k through self-supervised learning and fine-tuned on ImageNet-1k for image classification tasks.
Downloads 58.34k
Release date: 3/2/2022

Model Overview

The BEiT model adopts a BERT-like Transformer encoder architecture, pre-trained with a masked image modeling objective (predicting the visual tokens of masked image patches) to learn intrinsic image representations, making it suitable for downstream tasks such as image classification.
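As a sketch of how this checkpoint is typically used for classification with the Hugging Face transformers library: the example below builds a randomly initialized model from `BeitConfig` so it runs offline; in real use you would load the pretrained weights with `BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")` and preprocess images with `BeitImageProcessor`.

```python
import torch
from transformers import BeitConfig, BeitForImageClassification

# Randomly initialized model matching the base-patch16-224 geometry, so this
# sketch runs without downloading weights. For real inference use:
#   model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")
config = BeitConfig(image_size=224, patch_size=16, num_labels=1000)
model = BeitForImageClassification(config)
model.eval()

# A dummy 224x224 RGB batch; normally produced by BeitImageProcessor.
pixel_values = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    logits = model(pixel_values=pixel_values).logits  # one score per class

predicted_class = logits.argmax(-1).item()  # index into the 1,000 ImageNet labels
```

With pretrained weights, `model.config.id2label[predicted_class]` maps the index back to a human-readable ImageNet category name.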

Model Features

Self-supervised pre-training
Pre-trained on ImageNet-21k via masked image patch prediction tasks to learn general image representations.
Relative position encoding
Uses relative position encoding (similar to T5) instead of absolute position encoding to enhance the model's understanding of image structures.
Average pooling classification
Classifies by averaging the final hidden states of all image patches rather than relying on a single [CLS] token.
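The average-pooling classification described above can be sketched in plain PyTorch. Shapes are assumptions matching the base model (196 patch tokens plus one [CLS] token, hidden size 768), and the linear head here is illustrative rather than the actual trained classifier:

```python
import torch

batch, num_patches, hidden_size, num_labels = 1, 196, 768, 1000

# Final hidden states as a BEiT encoder would emit them:
# position 0 is the [CLS] token, positions 1..196 are patch tokens.
last_hidden_state = torch.randn(batch, 1 + num_patches, hidden_size)

# Average-pool the patch tokens, skipping the [CLS] token.
pooled = last_hidden_state[:, 1:, :].mean(dim=1)  # (batch, hidden_size)

# Illustrative classification head on the pooled representation.
classifier = torch.nn.Linear(hidden_size, num_labels)
logits = classifier(pooled)  # (batch, num_labels)
```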

Model Capabilities

Image classification
Feature extraction
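For feature extraction, the bare encoder is exposed as `BeitModel`. As before, this is a sketch with random weights built from `BeitConfig` so it runs offline; in practice you would call `BeitModel.from_pretrained("microsoft/beit-base-patch16-224")`:

```python
import torch
from transformers import BeitConfig, BeitModel

# Bare BEiT encoder with base-model geometry; random weights keep this offline.
# For real features: model = BeitModel.from_pretrained("microsoft/beit-base-patch16-224")
config = BeitConfig(image_size=224, patch_size=16)
model = BeitModel(config)
model.eval()

pixel_values = torch.randn(1, 3, 224, 224)  # dummy preprocessed image
with torch.no_grad():
    outputs = model(pixel_values=pixel_values)

# 196 patch tokens ((224/16)^2) plus one [CLS] token, hidden size 768.
features = outputs.last_hidden_state
```

The per-patch hidden states in `features` can then be pooled or fed to a downstream head.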

Use Cases

Computer vision
Image classification
Classify images into one of 1,000 ImageNet categories.
Achieves competitive top-1 accuracy on the ImageNet-1k benchmark.