
Beit Large Finetuned Ade 640 640

Developed by Microsoft
BEiT is an image model built on the Vision Transformer (ViT) architecture. This checkpoint was pre-trained in a self-supervised manner and then fine-tuned on the ADE20k dataset for semantic segmentation.
Downloads 14.97k
Release Time: 3/2/2022

Model Overview

This model uses a BERT-like Transformer encoder that treats image patches as tokens. This checkpoint is fine-tuned for image semantic segmentation and targets benchmark datasets such as ADE20k.
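To make the "BERT-like" idea concrete, the sketch below splits an image into a sequence of flattened patches and marks a random subset as masked, the way masked tokens are chosen in BERT. The 16x16 patch size and the 40% mask ratio are assumptions not stated on this page, the function names are hypothetical, and uniform random masking is a simplification used for illustration only.

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) -> (gh * gw, p * p * c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

def random_patch_mask(num_patches: int, mask_ratio: float = 0.4, seed: int = 0) -> np.ndarray:
    """Return a boolean vector with True for patches hidden from the encoder."""
    rng = np.random.default_rng(seed)
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

# A blank 640x640 RGB image stands in for real data.
image = np.zeros((640, 640, 3), dtype=np.float32)
tokens = patchify(image)             # one row per patch
mask = random_patch_mask(len(tokens))
print(tokens.shape, int(mask.sum()))
```

Under these assumptions a 640x640 input becomes a 40x40 grid of patches, so the encoder sees 1600 patch tokens per image.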

Model Features

Self-supervised pre-training
Pre-trained on ImageNet-21k via masked image patch prediction tasks to learn general visual representations
High-resolution fine-tuning
Fine-tuned on the ADE20k dataset at 640x640 resolution to adapt to semantic segmentation tasks
Relative position encoding
Uses T5-style relative position encoding instead of absolute position encoding to enhance positional awareness
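One common way to realize a relative position encoding on a 2D patch grid is a learned bias that depends only on the offset between two patches and is added to the attention logits. The sketch below builds the lookup index for such a bias table. It is an illustration under assumptions, not the model's actual code: the grid size, head count, and random table values are placeholders, the function name is hypothetical, and special tokens are omitted.

```python
import numpy as np

def relative_position_index(grid_h: int, grid_w: int) -> np.ndarray:
    """Map every (query, key) patch pair to a row of a relative bias table."""
    ys, xs = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()])        # (2, N)
    rel = coords[:, :, None] - coords[:, None, :]      # (2, N, N) offsets (dy, dx)
    rel[0] += grid_h - 1                               # shift dy into [0, 2*grid_h - 2]
    rel[1] += grid_w - 1                               # shift dx into [0, 2*grid_w - 2]
    return rel[0] * (2 * grid_w - 1) + rel[1]          # (N, N) flat table index

grid_h, grid_w, num_heads = 4, 4, 2                    # toy sizes for illustration
index = relative_position_index(grid_h, grid_w)
num_offsets = (2 * grid_h - 1) * (2 * grid_w - 1)

# Random values stand in for parameters that would be learned during training.
rng = np.random.default_rng(0)
bias_table = rng.normal(size=(num_offsets, num_heads))

# Gather one bias per head and patch pair; this is added to attention logits.
bias = bias_table[index].transpose(2, 0, 1)            # (heads, N, N)
print(index.shape, bias.shape)
```

Because the index depends only on the offset between two patches, any two patch pairs with the same offset share one table entry, which is what distinguishes this scheme from an absolute position embedding.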

Model Capabilities

Image semantic segmentation
Scene understanding
Visual feature extraction
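A minimal sketch of running semantic segmentation with the Hugging Face transformers library follows. The Hub identifier is assumed from the model name and developer shown above, and `segment` and `class_coverage` are hypothetical helper names. The heavy imports are placed inside `segment` so that the small summary helper can be used on its own.

```python
import numpy as np

# Assumed Hub identifier, inferred from the model name and developer.
MODEL_ID = "microsoft/beit-large-finetuned-ade-640-640"

def segment(image):
    """Return an (H, W) array of predicted class ids for a PIL image."""
    # Imported lazily so class_coverage works without these packages.
    import torch
    from transformers import BeitForSemanticSegmentation, BeitImageProcessor

    processor = BeitImageProcessor.from_pretrained(MODEL_ID)
    model = BeitForSemanticSegmentation.from_pretrained(MODEL_ID)
    model.eval()

    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Resize the logits to the original (height, width) and take the argmax.
    seg = processor.post_process_semantic_segmentation(
        outputs, target_sizes=[image.size[::-1]]
    )[0]
    return seg.cpu().numpy()

def class_coverage(seg_map: np.ndarray, id2label: dict) -> dict:
    """Summarize a segmentation map as the fraction of pixels per class."""
    ids, counts = np.unique(seg_map, return_counts=True)
    total = seg_map.size
    return {
        id2label.get(int(i), str(int(i))): float(c) / total
        for i, c in zip(ids, counts)
    }
```

A call such as `segment(Image.open("scene.jpg"))` (with a hypothetical file name and `PIL.Image` imported) would download the weights on first use. The resulting map can then be passed to `class_coverage` together with a mapping from class ids to label names.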

Use Cases

Computer vision
Architectural scene parsing
Performs pixel-level semantic segmentation of architectural scenes like houses and castles
Achieves state-of-the-art (SOTA) performance on the ADE20k dataset
Urban landscape analysis
Identifies urban elements such as roads, buildings, and vegetation
Demonstrates excellent performance on benchmarks like Cityscapes
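Results on segmentation benchmarks such as ADE20k and Cityscapes are usually reported as mean intersection-over-union (mIoU). As a rough sketch of how that metric is computed from a predicted and a ground-truth label map, assuming the common convention of an ignore label of 255 (the function name is hypothetical):

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU over the classes that occur in the prediction or the target."""
    valid = target != ignore_index
    pred = pred[valid].astype(np.int64)
    target = target[valid].astype(np.int64)

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    present = union > 0                      # skip classes absent from both maps
    return float((intersection[present] / union[present]).mean())

pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 255]])        # the 255 pixel is ignored
print(mean_iou(pred, target, num_classes=3))
```

Benchmark suites typically accumulate the confusion matrix over the whole validation set before dividing, rather than averaging per-image scores as this single-image sketch would.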