Beit Large Patch16 224 Pt22k Ft22k

Developed by Microsoft
BEiT is a Vision Transformer (ViT)-based image classification model, pre-trained in a self-supervised manner on ImageNet-22k and fine-tuned on the same dataset.
Downloads 1,880
Release Time: 3/2/2022

Model Overview

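BEiT is a Vision Transformer (ViT) that is pre-trained in a self-supervised manner on ImageNet-22k and then fine-tuned on the same dataset. As the model name indicates, this variant is the large configuration, working on 224x224 images split into 16x16 patches. It is used primarily for image classification.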

Model Features

Self-supervised Pre-training
The model undergoes self-supervised pre-training by predicting visual tokens from masked image patches, learning intrinsic representations of images.
Relative Position Embedding
Uses relative position embeddings instead of absolute position embeddings to enhance the model's understanding of image structures.
Large-scale Dataset Training
Pre-trained and fine-tuned on ImageNet-22k (14 million images, 21,841 categories).
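
The first two features above can be illustrated with a small, dependency-free sketch. It assumes the 224x224 input and 16x16 patch size implied by the model name, and it uses uniform random masking with a 40% ratio purely for illustration; the actual pre-training recipe (masking strategy, ratio, and the visual tokenizer that produces the prediction targets) is not described on this page and is not reproduced here. All function names are hypothetical.

```python
import random

IMAGE_SIZE = 224   # from the model name ("224")
PATCH_SIZE = 16    # from the model name ("Patch16")
MASK_RATIO = 0.4   # illustrative assumption, not taken from this page


def patch_grid(image_size=IMAGE_SIZE, patch_size=PATCH_SIZE):
    """Return (patches per side, total patches) for a square image."""
    side = image_size // patch_size
    return side, side * side


def random_patch_mask(num_patches, mask_ratio=MASK_RATIO, seed=0):
    """Return a list of booleans; True marks a patch hidden from the model.

    Uniform random sampling is a simplification chosen for this sketch.
    """
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def relative_offset(i, j, side):
    """Return the (row, column) offset between patches i and j.

    A relative position embedding is indexed by this offset, so two patch
    pairs with the same offset share the same learned value regardless of
    where they sit in the image.
    """
    return (i // side - j // side, i % side - j % side)


side, num_patches = patch_grid()
mask = random_patch_mask(num_patches)

# Collect every distinct patch-to-patch offset on the grid.
offsets = {
    relative_offset(i, j, side)
    for i in range(num_patches)
    for j in range(num_patches)
}

print(f"grid: {side}x{side} = {num_patches} patches")
print(f"masked patches: {sum(mask)}")
print(f"distinct relative offsets: {len(offsets)}")
```

During pre-training the model would see only the unmasked patches and be asked to predict a visual token for each masked one. The offset count shows why relative embeddings are compact: the table is sized by the number of distinct offsets, not by the number of patch pairs.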

Model Capabilities

Image Classification
Feature Extraction

Use Cases

Image Classification
ImageNet Classification
Classify images into one of the 21,841 ImageNet-22k categories.
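
A minimal sketch of the classification use case with the Hugging Face transformers library follows. The Hub identifier is an assumption inferred from the model name and developer shown on this page, and `classify_image` is a hypothetical helper; the sketch also assumes that `torch`, `transformers`, and `Pillow` are installed and that the weights can be downloaded on first use.

```python
# Assumed Hub identifier, inferred from the model name and developer above.
MODEL_ID = "microsoft/beit-large-patch16-224-pt22k-ft22k"


def classify_image(image_path, top_k=5):
    """Return the top_k (label, probability) pairs for one image file."""
    # Imported here so the module can be loaded without the heavy dependencies.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForImageClassification.from_pretrained(MODEL_ID)
    model.eval()

    # The processor resizes and normalizes the image to the model's input format.
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, number of categories)

    probs = logits.softmax(dim=-1)[0]
    top = torch.topk(probs, k=top_k)
    return [
        (model.config.id2label[idx.item()], prob.item())
        for prob, idx in zip(top.values, top.indices)
    ]
```

Calling `classify_image("photo.jpg")` (the file name is a placeholder) would return the five most probable ImageNet-22k labels with their probabilities. For feature extraction rather than classification, the same checkpoint could likely be loaded with `AutoModel` instead, which omits the classification head.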