Beit Large Patch16 512

Developed by Microsoft
BEiT is a vision Transformer-based image classification model, pre-trained in a self-supervised manner on ImageNet-21k and fine-tuned on ImageNet-1k.
Downloads 683
Release Time: 3/2/2022

Model Overview

The BEiT model adopts a BERT-like Transformer encoder architecture, pre-trained via masked image modeling, and supports high-resolution image classification tasks.
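
As a rough illustration of what masked image modeling operates on, the sketch below splits a square image into 16x16 patches and marks a random subset as masked. The function names, the uniform random sampling, and the 0.4 mask ratio are assumptions chosen for illustration; they are not taken from this page and may differ from the actual pre-training recipe.

```python
from __future__ import annotations

import random


def patch_grid(image_size: int, patch_size: int = 16) -> tuple[int, int]:
    """Return (patches per side, total patches) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


def random_patch_mask(num_patches: int, mask_ratio: float, seed: int = 0) -> list[bool]:
    """Mark a random subset of patches as masked (True = hidden from the encoder).

    Uniform sampling and the ratio are illustrative assumptions.
    """
    rng = random.Random(seed)
    num_masked = round(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


print(patch_grid(224))  # (14, 196)
print(patch_grid(512))  # (32, 1024)

mask = random_patch_mask(num_patches=196, mask_ratio=0.4)
print(sum(mask))  # 78 patches hidden out of 196
```

During pre-training the encoder sees only the unmasked patches and is trained to predict the content of the hidden ones. The patch counts also show why 512x512 input is costlier: the sequence grows from 196 to 1024 patches.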

Model Features

Self-supervised pre-training
Pre-trained on the ImageNet-21k dataset via masked image modeling to learn general image representations.
High-resolution support
Supports 512x512 input resolution, capturing finer detail than the standard 224x224 resolution.
Relative position embeddings
Uses T5-like relative position embeddings instead of absolute position embeddings, potentially improving model generalization.
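
The idea behind relative position embeddings on a patch grid can be sketched as follows: every (query, key) patch pair is mapped to an index that depends only on the row and column offset between the two patches, so pairs with the same offset share one learned bias. This is a simplified sketch with hypothetical function names, not the model's actual code; real implementations may add extra table entries, for example for a class token.

```python
from __future__ import annotations


def bias_table_size(height: int, width: int) -> int:
    """Number of distinct (row offset, column offset) pairs on the grid."""
    return (2 * height - 1) * (2 * width - 1)


def relative_position_index(height: int, width: int) -> list[list[int]]:
    """For each (query, key) patch pair, return an index into a bias table."""
    coords = [(r, c) for r in range(height) for c in range(width)]
    index = []
    for qr, qc in coords:
        row = []
        for kr, kc in coords:
            dr = qr - kr + (height - 1)  # shifted into 0 .. 2*height-2
            dc = qc - kc + (width - 1)   # shifted into 0 .. 2*width-2
            row.append(dr * (2 * width - 1) + dc)
        index.append(row)
    return index


idx = relative_position_index(2, 2)
print(idx[0])                   # [4, 3, 1, 0]
print(bias_table_size(32, 32))  # 3969 offsets for a 512x512 input with 16x16 patches
```

Because the table is keyed by offset, not by absolute location, the same biases apply wherever a pair of patches sits in the image, which is one reason this scheme may generalize better than absolute position embeddings.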

Model Capabilities

Image classification
Feature extraction

Use Cases

Computer vision
General image classification
Classifies images into 1000 ImageNet categories.
Reported to achieve high accuracy on the ImageNet validation set; no specific figure is provided.
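
A minimal classification sketch using the Hugging Face transformers library is shown below. It assumes the checkpoint is published on the Hugging Face Hub under the ID microsoft/beit-large-patch16-512 and that torch, Pillow, and transformers are installed; the helper names top_k and classify are hypothetical.

```python
def top_k(scores, id2label, k=5):
    """Return the k highest-scoring (label, score) pairs, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(id2label[i], scores[i]) for i in ranked[:k]]


def classify(image_path, model_id="microsoft/beit-large-patch16-512", k=5):
    """Classify one image file; model_id is an assumed Hub identifier."""
    # Imported lazily so top_k stays usable without these packages.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = AutoModelForImageClassification.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")  # resizes and normalizes
    with torch.no_grad():
        logits = model(**inputs).logits  # one score per ImageNet-1k class
    probs = logits.softmax(dim=-1)[0].tolist()
    return top_k(probs, model.config.id2label, k)


if __name__ == "__main__":
    # "example.jpg" is a placeholder path.
    for label, prob in classify("example.jpg"):
        print(f"{prob:.3f}  {label}")
```

For feature extraction, the same checkpoint could presumably be loaded with AutoModel instead, taking the encoder's hidden states as image features in place of the classification logits.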