Beit Base Patch16 224 Pt22k

Developed by Microsoft
BEiT is a vision Transformer-based model pre-trained on the ImageNet-21k dataset through self-supervised learning for image classification tasks.
Downloads 2,647
Release Time: 3/2/2022

Model Overview

This BEiT checkpoint is a base-sized Vision Transformer (ViT) pre-trained in a self-supervised manner on the ImageNet-21k dataset, at 224x224 resolution with 16x16 patches. It is primarily intended for image classification and feature extraction.

Model Features

Self-supervised Pre-training
The model is pre-trained on the ImageNet-21k dataset in a self-supervised manner to learn intrinsic representations of images.
Vision Transformer Architecture
It adopts a BERT-like Transformer encoder model, using relative position embeddings instead of absolute position embeddings.
Masked Image Patch Prediction
The pre-training objective is to predict, for each masked image patch, the visual token that the encoder of OpenAI's DALL-E VQ-VAE assigns to that patch.
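
The masked patch objective can be sketched in a few lines. This is a minimal illustration, not the actual BEiT training code. It assumes uniform random masking with a 40% ratio (an illustrative choice) and a toy vocabulary of 8 visual tokens, and it uses random numbers in place of the tokenizer's output and the model's predictions. Only the 224-pixel image size and 16-pixel patch size come from the model name.

```python
import math
import random

IMAGE_SIZE = 224   # taken from the model name
PATCH_SIZE = 16    # taken from the model name
MASK_RATIO = 0.4   # assumption, chosen for illustration only
VOCAB_SIZE = 8     # toy vocabulary; a real visual tokenizer has far more tokens


def num_patches(image_size=IMAGE_SIZE, patch_size=PATCH_SIZE):
    """Number of non-overlapping square patches in a square image."""
    per_side = image_size // patch_size
    return per_side * per_side


def random_mask(n, ratio, rng):
    """Pick int(n * ratio) patch positions to mask, uniformly at random."""
    return set(rng.sample(range(n), int(n * ratio)))


def masked_cross_entropy(logits, targets, masked):
    """Mean cross-entropy over masked positions only."""
    total = 0.0
    for i in masked:
        row = logits[i]
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[targets[i]]
    return total / len(masked)


rng = random.Random(0)
n = num_patches()
masked = random_mask(n, MASK_RATIO, rng)
# Stand-ins: in real pre-training, targets come from the image tokenizer
# and logits come from the Transformer encoder.
targets = [rng.randrange(VOCAB_SIZE) for _ in range(n)]
logits = [[rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)] for _ in range(n)]
loss = masked_cross_entropy(logits, targets, masked)
print(n, len(masked), loss)
```

Unmasked patches contribute nothing to the loss. If every logit is equal, the loss is log(VOCAB_SIZE), which is the chance-level baseline for this setup.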

Model Capabilities

Image Classification
Feature Extraction

Use Cases

Computer Vision
Image Classification
Fine-tune the pre-trained model on labeled data for image classification tasks.
Feature Extraction
Extract image features for downstream tasks.
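
As a rough sketch of feature extraction, the snippet below uses the Hugging Face `transformers` library. The Hub identifier `microsoft/beit-base-patch16-224-pt22k` is an assumption inferred from the model name and developer shown on this page, and `extract_features` is a hypothetical helper name, not part of any library.

```python
MODEL_ID = "microsoft/beit-base-patch16-224-pt22k"  # assumed Hub identifier


def extract_features(image, model_id=MODEL_ID):
    """Return (per-token features, pooled feature) for one PIL image."""
    # Imported lazily so that defining this helper does not require
    # torch or transformers to be installed.
    import torch
    from transformers import BeitImageProcessor, BeitModel

    processor = BeitImageProcessor.from_pretrained(model_id)
    model = BeitModel.from_pretrained(model_id)
    model.eval()

    # The processor resizes and normalizes the image into a tensor batch.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state, outputs.pooler_output


# Example call (requires network access and a local image; the path is hypothetical):
# from PIL import Image
# tokens, pooled = extract_features(Image.open("example.jpg").convert("RGB"))
```

The pooled vector can be fed to a lightweight classifier, while the per-token features suit downstream tasks that need spatial detail.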