SegFormer (b4-sized) model fine-tuned on ADE20k
A SegFormer model fine-tuned on ADE20k at 512x512 resolution, designed for efficient semantic segmentation.
Quick Start
The SegFormer model presented here is fine-tuned on the ADE20k dataset at a resolution of 512x512. It was first introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and initially released in this repository.
Disclaimer: The team that released SegFormer did not create a model card for this model. This model card was written by the Hugging Face team.
Features
SegFormer combines a hierarchical Transformer encoder with a lightweight all-MLP decode head, achieving excellent results on semantic segmentation benchmarks like ADE20K and Cityscapes. The hierarchical Transformer is pre-trained on ImageNet-1k, and then a decode head is added and fine-tuned on a downstream dataset.
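To make the "all-MLP decode head" idea concrete, here is a hypothetical PyTorch sketch, not the exact SegFormer implementation: each encoder stage's feature map is projected to a shared embedding dimension with a linear layer, upsampled to the resolution of the finest stage, concatenated, fused, and classified per pixel. The class name and the default channel dimensions below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AllMLPDecodeHeadSketch(nn.Module):
    """Illustrative sketch of an all-MLP decode head (assumed structure,
    not the reference SegFormer code): linear projection per stage,
    upsampling to the finest stage, concatenation, fusion, classification."""

    def __init__(self, in_dims=(32, 64, 160, 256), embed_dim=256, num_classes=150):
        super().__init__()
        self.projs = nn.ModuleList(nn.Linear(d, embed_dim) for d in in_dims)
        self.fuse = nn.Linear(embed_dim * len(in_dims), embed_dim)
        self.classify = nn.Linear(embed_dim, num_classes)

    def forward(self, feats):
        # feats: one (B, C_i, H_i, W_i) tensor per encoder stage, finest first
        batch = feats[0].size(0)
        height, width = feats[0].shape[2:]  # finest stage sets the output size
        upsampled = []
        for f, proj in zip(feats, self.projs):
            x = proj(f.flatten(2).transpose(1, 2))          # (B, H_i*W_i, E)
            x = x.transpose(1, 2).reshape(batch, -1, *f.shape[2:])
            x = F.interpolate(x, size=(height, width),
                              mode="bilinear", align_corners=False)
            upsampled.append(x)
        x = torch.cat(upsampled, dim=1)                      # (B, 4*E, H, W)
        x = torch.relu(self.fuse(x.flatten(2).transpose(1, 2)))
        x = self.classify(x)                                 # (B, H*W, classes)
        return x.transpose(1, 2).reshape(batch, -1, height, width)
```

With four feature maps of shapes (1, 32, 16, 16), (1, 64, 8, 8), (1, 160, 4, 4), and (1, 256, 2, 2), the head returns a (1, 150, 16, 16) map of per-class scores at the finest stage's resolution.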
Usage Examples
Basic Usage
Here is an example of using this model to segment an image from the COCO 2017 dataset into the 150 ADE20k classes:
from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation
from PIL import Image
import requests

# Load the feature extractor and the fine-tuned model from the Hugging Face Hub
feature_extractor = SegformerFeatureExtractor.from_pretrained("nvidia/segformer-b4-finetuned-ade-512-512")
model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b4-finetuned-ade-512-512")

# Download a test image from the COCO 2017 validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image and run a forward pass
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)
For more code examples, refer to the documentation.
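The model's logits come out at a quarter of the input resolution, so producing a per-pixel label map typically involves upsampling them back to the image size and taking the argmax over the class dimension. Below is a minimal sketch of that post-processing step using dummy logits in place of `outputs.logits` (the shapes assume a 512x512 input):

```python
import torch
import torch.nn.functional as F

# Dummy logits standing in for outputs.logits:
# (batch_size, 150 ADE20k classes, height/4, width/4) for a 512x512 input
logits = torch.randn(1, 150, 128, 128)
original_size = (512, 512)  # (height, width) of the input image

# Upsample the logits to the input resolution
upsampled = F.interpolate(logits, size=original_size,
                          mode="bilinear", align_corners=False)

# Per-pixel argmax gives a (512, 512) map of ADE20k class indices
seg_map = upsampled.argmax(dim=1)[0]
```

The resulting `seg_map` tensor can be colorized with an ADE20k palette or overlaid on the original image for visualization.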
Documentation
You can use the raw model for semantic segmentation. Check the model hub to find fine-tuned versions for tasks that interest you.
License
The license for this model can be found here.
BibTeX entry and citation info
@article{DBLP:journals/corr/abs-2105-15203,
author = {Enze Xie and
Wenhai Wang and
Zhiding Yu and
Anima Anandkumar and
Jose M. Alvarez and
Ping Luo},
title = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
Transformers},
journal = {CoRR},
volume = {abs/2105.15203},
year = {2021},
url = {https://arxiv.org/abs/2105.15203},
eprinttype = {arXiv},
eprint = {2105.15203},
timestamp = {Wed, 02 Jun 2021 11:46:42 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
| Property | Details |
| --- | --- |
| Model Type | Vision, Image-Segmentation |
| Training Data | scene_parse_150 |
Usage Tip
You can find fine-tuned versions of the model on the model hub according to your specific needs.