Beit Base Patch16 384

Developed by Microsoft
BEiT is a vision Transformer-based image classification model pretrained in a self-supervised manner on ImageNet-21k and fine-tuned on ImageNet-1k.
Downloads 146
Release Time: 3/2/2022

Model Overview

The BEiT model uses a BERT-like Transformer encoder architecture. It is pretrained on a large-scale image dataset through self-supervised learning, and the image features it learns are then used for classification tasks.

Model Features

Self-supervised Pretraining
Pretrained on the ImageNet-21k dataset in a self-supervised manner to learn general image representations.
High-resolution Fine-tuning
Fine-tuned on the ImageNet-1k dataset at 384x384 resolution to enhance classification performance.
Relative Position Encoding
Uses T5-style relative position encoding instead of absolute position encoding, so attention between patches depends on their relative offsets within the image rather than on fixed absolute positions.
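The numbers in the model name fix the size of the token grid. As a rough sketch, assuming square inputs and non-overlapping square patches (the helper names `patch_grid` and `relative_offsets` are illustrative, not part of any library), the arithmetic looks like this:

```python
def patch_grid(image_size, patch_size):
    """Return (patches per side, total patches) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side, side * side


def relative_offsets(side):
    """Count the distinct 2D (row, column) offsets between patches on a side x side grid."""
    # Each axis offset ranges from -(side - 1) to +(side - 1).
    return (2 * side - 1) ** 2


side, num_patches = patch_grid(384, 16)
print(side, num_patches)       # 24 576
print(relative_offsets(side))  # 2209
```

At 384x384 resolution with 16x16 patches, the encoder therefore sees 576 patch tokens per image. The offset count only illustrates why a relative scheme needs a table indexed by offset pairs; a real implementation may reserve additional entries, for example for a class token.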

Model Capabilities

Image classification
Feature extraction
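Feature extraction could be sketched as follows with the Hugging Face `transformers` library. This assumes the checkpoint is published on the Hugging Face Hub as `microsoft/beit-base-patch16-384` and that the first output token is a class token followed by the patch tokens; `mean_pool_patches` and `extract_features` are hypothetical helper names, and averaging the patch tokens is one possible pooling choice, not necessarily the one a given pipeline uses.

```python
def mean_pool_patches(tokens):
    """Average the patch token vectors, skipping the first (class) token."""
    patches = tokens[1:]
    n = len(patches)
    return [sum(column) / n for column in zip(*patches)]


def extract_features(image_path, checkpoint="microsoft/beit-base-patch16-384"):
    """Return one pooled feature vector for the image at image_path."""
    # Heavy imports stay local so mean_pool_patches works without them.
    import torch
    from PIL import Image
    from transformers import BeitImageProcessor, BeitModel

    processor = BeitImageProcessor.from_pretrained(checkpoint)
    model = BeitModel.from_pretrained(checkpoint)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (batch, tokens, hidden size)
    return mean_pool_patches(hidden[0].tolist())
```

Calling `extract_features("photo.jpg")` (a placeholder path) downloads the weights on first use and returns a plain list of floats that can be fed to a downstream classifier or a similarity search.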

Use Cases

Computer Vision
Image classification
Classifies input images into one of 1000 ImageNet categories.
Demonstrates excellent performance on ImageNet benchmarks.
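A minimal classification sketch, again assuming the Hugging Face `transformers` library and the Hub checkpoint name `microsoft/beit-base-patch16-384`; `top_k_predictions` and `classify` are hypothetical helpers written for this example:

```python
def top_k_predictions(probs, labels, k=5):
    """Return the k most probable (label, probability) pairs, highest first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in order[:k]]


def classify(image_path, checkpoint="microsoft/beit-base-patch16-384", k=5):
    """Classify one image into the ImageNet-1k categories."""
    # Heavy imports stay local so top_k_predictions works without them.
    import torch
    from PIL import Image
    from transformers import BeitForImageClassification, BeitImageProcessor

    processor = BeitImageProcessor.from_pretrained(checkpoint)
    model = BeitForImageClassification.from_pretrained(checkpoint)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    # The processor resizes and normalizes the image for the model.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # one score per category
    probs = logits.softmax(dim=-1)[0].tolist()
    # model.config.id2label maps class indices to category names.
    return top_k_predictions(probs, model.config.id2label, k)
```

For example, `classify("photo.jpg", k=3)` (a placeholder path) would return the three most likely category names with their probabilities.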