SegFormer-b2 Open-Source Image Segmentation Model - Free Deployment for Image Segmentation Tasks with 512x512 Resolution

Segformer B2 Finetuned Ade 512 512

Developed by nvidia

SegFormer is a Transformer-based semantic segmentation model fine-tuned on the ADE20k dataset, suitable for image segmentation tasks at 512x512 resolution.

Image Segmentation

Transformers

Open Source License: Other #Semantic Segmentation #Transformer Architecture #Scene Parsing

Downloads 44.07k

Release Time: 3/2/2022

Model Overview

This model employs a hierarchical Transformer encoder and a lightweight all-MLP decoder head, specifically designed for semantic segmentation tasks, achieving excellent performance on benchmarks like ADE20K.

Model Features

Efficient Architecture Design

Utilizes a hierarchical Transformer encoder and lightweight MLP decoder head to achieve high performance with efficient computation.

ADE20k Optimization

Fine-tuned specifically for the ADE20k dataset, optimizing semantic segmentation performance at 512x512 resolution.

Transformer Advantages

Leverages the Transformer architecture to capture long-range dependencies, improving segmentation accuracy.

Model Capabilities

Image Semantic Segmentation

Scene Understanding

Object Boundary Recognition

Use Cases

Scene Parsing

Architectural Scene Segmentation

Performs semantic segmentation of architectural scenes such as houses and castles

Accurately identifies building structures and environmental elements

Urban Landscape Analysis

Analyzes various elements in urban landscapes

Distinguishes between different categories like roads, buildings, and vegetation
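
As a small illustration of this kind of analysis, the sketch below computes the share of pixels assigned to each category in a predicted label map. The tiny label map and the three-entry id-to-name mapping are made up for the example, and `class_shares` is a hypothetical helper; with the real model, the mapping would normally be read from `model.config.id2label`.

```python
from collections import Counter

# Illustrative subset of an id-to-name mapping (an assumption for this sketch;
# with the real model, read the full mapping from `model.config.id2label`).
ID2LABEL = {1: "building", 4: "tree", 6: "road"}


def class_shares(label_map, id2label):
    """Return the fraction of pixels per class name, largest share first."""
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts.values())
    return {
        id2label.get(class_id, f"class_{class_id}"): n / total
        for class_id, n in counts.most_common()
    }


# A made-up 4x4 label map standing in for a per-pixel prediction.
label_map = [
    [1, 1, 4, 4],
    [1, 1, 4, 4],
    [6, 6, 6, 4],
    [6, 6, 6, 6],
]

shares = class_shares(label_map, ID2LABEL)
print(shares)  # {'road': 0.4375, 'tree': 0.3125, 'building': 0.25}
```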

🚀 SegFormer (b2-sized) model fine-tuned on ADE20k

A SegFormer model fine-tuned on ADE20k at 512x512 resolution, designed for efficient semantic segmentation.

🚀 Quick Start

This SegFormer model is fine-tuned on ADE20k at a resolution of 512x512. It was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository.

Disclaimer: The team releasing SegFormer did not write a model card for this model, so this model card has been written by the Hugging Face team.

✨ Features

Hierarchical Transformer Encoder: SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head to achieve great results on semantic segmentation benchmarks such as ADE20K and Cityscapes.
Pre-training and Fine-tuning: The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and the whole model is fine-tuned on a downstream dataset.
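
As a rough illustration of the hierarchical design described above, the sketch below computes the feature-map shapes the four encoder stages would produce for a 512x512 input, and the shape of the logits the decode head emits. The strides (4, 8, 16, 32) and the b2 channel widths (64, 128, 320, 512) are taken from the SegFormer paper rather than from this page, and the helper names are hypothetical; nothing here calls the actual model.

```python
# Assumed MiT-b2 encoder settings (from the SegFormer paper, not from this page).
STRIDES = (4, 8, 16, 32)
CHANNELS = (64, 128, 320, 512)
NUM_LABELS = 150  # ADE20k scene parsing uses 150 classes


def encoder_feature_shapes(height, width):
    """Return (channels, h, w) for each of the four encoder stages."""
    return [(c, height // s, width // s) for c, s in zip(CHANNELS, STRIDES)]


def logits_shape(height, width, num_labels=NUM_LABELS):
    """The decode head fuses all stages on the stride-4 grid, so the logits
    come out at one quarter of the input resolution."""
    return (num_labels, height // STRIDES[0], width // STRIDES[0])


for shape in encoder_feature_shapes(512, 512):
    print(shape)
# (64, 128, 128)
# (128, 64, 64)
# (320, 32, 32)
# (512, 16, 16)

print(logits_shape(512, 512))  # (150, 128, 128)
```

Each stage halves the spatial resolution of the previous one, which is why the decode head can stay lightweight: it only has to project, upsample, and merge four maps.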

📦 Installation

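The original model card gives no installation steps. The usage example below imports transformers, PIL (Pillow), and requests, and returns PyTorch tensors, so those packages need to be available in your environment.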

💻 Usage Examples

Basic Usage

Here is how to use this model to segment an image from the COCO 2017 dataset into the 150 ADE20k classes:

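from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import requests
import torch

# Load the preprocessing pipeline and the fine-tuned model from the Hub.
# SegformerImageProcessor replaces the deprecated SegformerFeatureExtractor.
processor = SegformerImageProcessor.from_pretrained("nvidia/segformer-b2-finetuned-ade-512-512")
model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b2-finetuned-ade-512-512")

# Download a sample image from the COCO 2017 validation set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image, then run inference without tracking gradients.
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# height and width below refer to the preprocessed input, not the original image
logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)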
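
The logits come out at a quarter of the input resolution, so they are usually upsampled before a label is read off each pixel. The following is a minimal sketch of that step, assuming PyTorch is installed; `logits_to_label_map` is a hypothetical helper, and the random tensor only stands in for `outputs.logits`.

```python
import torch
import torch.nn.functional as F


def logits_to_label_map(logits, target_size):
    """Upsample (batch, num_labels, h, w) logits to target_size = (height, width)
    and return the per-pixel argmax as a (batch, height, width) tensor."""
    upsampled = F.interpolate(
        logits,
        size=target_size,
        mode="bilinear",
        align_corners=False,
    )
    # The highest-scoring class at each pixel becomes that pixel's label.
    return upsampled.argmax(dim=1)


# Stand-in for `outputs.logits`: one image, 150 classes, a 128x128 grid.
fake_logits = torch.randn(1, 150, 128, 128)
label_map = logits_to_label_map(fake_logits, target_size=(512, 512))
print(label_map.shape)  # torch.Size([1, 512, 512])
```

With the example above, the call would be `logits_to_label_map(outputs.logits, image.size[::-1])`, since PIL reports sizes as (width, height). Recent versions of transformers also provide a `post_process_semantic_segmentation` method on the SegFormer image processor that performs a similar resize-and-argmax.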

For more code examples, see the documentation.

📚 Documentation

Intended uses & limitations

You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you.

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2105-15203,
  author    = {Enze Xie and
               Wenhai Wang and
               Zhiding Yu and
               Anima Anandkumar and
               Jose M. Alvarez and
               Ping Luo},
  title     = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
               Transformers},
  journal   = {CoRR},
  volume    = {abs/2105.15203},
  year      = {2021},
  url       = {https://arxiv.org/abs/2105.15203},
  eprinttype = {arXiv},
  eprint    = {2105.15203},
  timestamp = {Wed, 02 Jun 2021 11:46:42 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

📄 License

The license for this model is listed as "other".

Property	Details
Tags	vision, image-segmentation
Datasets	scene_parse_150
Widget Example 1	House
Widget Example 2	Castle
