
Beit Base Patch16 224 Pt22k Ft22k

Developed by Microsoft
BEiT is a Vision Transformer (ViT)-based image classification model, pre-trained in a self-supervised manner on ImageNet-22k and fine-tuned on the same dataset.
Downloads: 546.85k
Release Time: 3/2/2022

Model Overview

The BEiT model is a Vision Transformer pre-trained in a self-supervised manner on ImageNet-22k (also called ImageNet-21k) at a resolution of 224x224. It is then fine-tuned on the same dataset, at the same resolution, for image classification.

Model Features

Self-supervised Pre-training
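Pre-trained with masked image modeling: a portion of the image patches is masked, and the model learns to predict the discrete visual tokens of the masked patches, which gives it an internal representation of images.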
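The masked-patch objective can be illustrated with a small sketch. It assumes a 224x224 input cut into 16x16 patches (both read from the model name), so the image becomes a 14x14 grid of 196 patches. The visual tokens here are random stand-ins drawn from an assumed vocabulary of 8192 ids, and plain uniform random masking at a 40% ratio replaces the blockwise masking described for BEiT. `random_patch_mask` and `mim_targets` are hypothetical helpers, not part of any library.

```python
import random

IMAGE_SIZE = 224                    # input resolution implied by the model name
PATCH_SIZE = 16                     # patch size implied by "patch16"
GRID = IMAGE_SIZE // PATCH_SIZE     # 14 patches per side
NUM_PATCHES = GRID * GRID           # 196 patches in total

def random_patch_mask(num_patches, mask_ratio, rng):
    """Return one boolean per patch, True where the patch is masked.

    Simplification: uniform random masking, not BEiT's blockwise masking.
    """
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]

def mim_targets(visual_tokens, mask):
    """Pair each masked position with the visual token the model must predict there."""
    return [(i, tok) for i, (tok, m) in enumerate(zip(visual_tokens, mask)) if m]

rng = random.Random(0)
# Hypothetical visual tokens: random ids from an assumed vocabulary of 8192,
# standing in for the output of a separate image tokenizer.
visual_tokens = [rng.randrange(8192) for _ in range(NUM_PATCHES)]
mask = random_patch_mask(NUM_PATCHES, 0.4, rng)
targets = mim_targets(visual_tokens, mask)
print(NUM_PATCHES, sum(mask), len(targets))
```

During pre-training, the loss is computed only at the masked positions: the transformer sees the corrupted patch sequence and is scored on how well it predicts each target token.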
Relative Position Embeddings
Uses relative position embeddings (similar to T5) instead of absolute position embeddings, so attention depends on the offset between patches rather than on their absolute location.
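A minimal sketch of a 2D relative position bias shows how this works on a patch grid. Every pair of patches that shares the same offset (dy, dx) looks up the same entry in a bias table, and that entry is added to the raw attention score before the softmax. This is an illustrative assumption: the model's exact parameterization, including per-head tables and how the [CLS] token is handled, may differ. The bias values below are stand-ins for learned parameters, and both function names are hypothetical.

```python
def relative_position_index(h, w):
    """Map every (query patch, key patch) pair on an h x w grid to a bias-table slot.

    Pairs with the same 2D offset share a slot, so the bias depends only on
    relative position, not on absolute location.
    """
    coords = [(y, x) for y in range(h) for x in range(w)]
    index = []
    for qy, qx in coords:
        row = []
        for ky, kx in coords:
            dy = qy - ky + (h - 1)   # shift row offset into [0, 2h-2]
            dx = qx - kx + (w - 1)   # shift column offset into [0, 2w-2]
            row.append(dy * (2 * w - 1) + dx)
        index.append(row)
    return index

def add_relative_bias(scores, bias_table, index):
    """Add the per-offset bias to raw attention scores (applied before softmax)."""
    return [[s + bias_table[index[i][j]] for j, s in enumerate(row)]
            for i, row in enumerate(scores)]

h, w = 3, 3
idx = relative_position_index(h, w)
table_size = (2 * h - 1) * (2 * w - 1)              # 25 distinct offsets on a 3x3 grid
bias_table = [0.1 * k for k in range(table_size)]   # stand-in for learned parameters
scores = [[0.0] * (h * w) for _ in range(h * w)]    # toy all-zero attention scores
biased = add_relative_bias(scores, bias_table, idx)
print(idx[0][0], idx[4][4], table_size)
```

Because the table size depends only on the grid shape, the number of position parameters grows with the number of possible offsets rather than with the number of patch pairs.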
Average Pooling Classification
Classifies images by mean-pooling the final hidden states of the patch tokens instead of using the final hidden state of the [CLS] token.
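Average pooling classification can be sketched in a few lines: drop the leading [CLS] token, average the remaining patch vectors, and feed the result to a linear layer that produces one logit per class. The toy hidden size, class count, and weights are invented for illustration, and the helper names are hypothetical. The released implementation may also normalize the pooled vector before the classifier; this sketch omits that step.

```python
def mean_pool_patches(hidden_states, has_cls_token=True):
    """Average the final hidden states of the patch tokens, skipping a leading [CLS] token."""
    patches = hidden_states[1:] if has_cls_token else hidden_states
    dim = len(patches[0])
    return [sum(p[d] for p in patches) / len(patches) for d in range(dim)]

def linear_classifier(pooled, weights, biases):
    """Compute one logit per class: logits[c] = weights[c] . pooled + biases[c]."""
    return [sum(wt * x for wt, x in zip(row, pooled)) + b
            for row, b in zip(weights, biases)]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

# Toy example: a [CLS] token plus 4 patch tokens with hidden size 2, and 3 classes.
hidden_states = [
    [9.0, 9.0],   # [CLS] token, ignored by mean pooling
    [1.0, 0.0],
    [3.0, 0.0],
    [1.0, 2.0],
    [3.0, 2.0],
]
pooled = mean_pool_patches(hidden_states)              # [2.0, 1.0]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]
biases = [0.0, 0.0, 0.0]
logits = linear_classifier(pooled, weights, biases)    # [2.0, 1.0, -1.0]
print(pooled, logits, argmax(logits))                  # [2.0, 1.0] [2.0, 1.0, -1.0] 0
```

Note that the large values in the [CLS] row have no effect on the prediction, which is the point of pooling over patches instead.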

Model Capabilities

Image Classification
Feature Extraction

Use Cases

Image Classification
ImageNet Classification
Classifies images into one of the 21,841 categories in ImageNet-22k.
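To turn the classifier's output into a prediction, the 21,841 logits are typically converted to probabilities with a softmax and the top-k classes are reported. The sketch below assumes hypothetical logits and a hypothetical label map (`id2label`); a real model would supply both, and unknown ids fall back to a placeholder name.

```python
import math

NUM_CLASSES = 21841   # ImageNet-22k label count

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, k, id2label):
    """Return the k most probable (label, probability) pairs, highest first."""
    best = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label.get(i, f"class_{i}"), probs[i]) for i in best]

# Hypothetical logits and label map, for illustration only.
logits = [0.0] * NUM_CLASSES
logits[42] = 5.0
logits[7] = 3.0
id2label = {42: "label_42", 7: "label_7"}
probs = softmax(logits)
for label, p in top_k(probs, 3, id2label):
    print(f"{label}: {p:.4f}")
```

Subtracting the maximum logit does not change the resulting probabilities but keeps `math.exp` from overflowing when logits are large.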