mit-b5-finetuned-sidewalk-semantic Open Source Model - A Free Tool for Semantic Segmentation Tasks

Mit B5 Finetuned Sidewalk Semantic

Developed by zoheb

This SegFormer model has been fine-tuned on the SegmentsAI sidewalk-semantic dataset for semantic segmentation tasks.

Image Segmentation

Transformers

#Sidewalk Semantic Segmentation #Transformer Architecture #High-Precision Segmentation

Downloads 14

Release Time: 10/8/2022

Model Overview

SegFormer is a Transformer-based semantic segmentation model featuring a hierarchical Transformer encoder and a lightweight all-MLP decoder head. This checkpoint is fine-tuned for semantic segmentation of sidewalk scenes.

Model Features

Hierarchical Transformer Encoder

Employs a hierarchical Transformer structure to effectively capture multi-scale feature information.

Lightweight MLP Decoder Head

Uses an all-MLP decoder head to maintain efficiency while achieving accurate semantic segmentation.

Sidewalk Scene Optimization

Fine-tuned on the sidewalk-semantic dataset, specifically optimized for sidewalk scene semantic segmentation tasks.

Model Capabilities

Image Semantic Segmentation

Sidewalk Scene Recognition

Multi-Class Classification

Use Cases

Urban Infrastructure

Sidewalk Analysis

Used to identify and analyze different components of urban sidewalks.

Segments images into 35 categories, including sidewalks, roads, and obstacles.

Autonomous Driving

Road Scene Understanding

Assists autonomous driving systems in understanding sidewalk and road environments.

🚀 SegFormer (b5-sized) Model

A SegFormer (b5-sized) model fine-tuned on the sidewalk-semantic dataset for image segmentation.

🚀 Quick Start

This SegFormer model is fine-tuned on SegmentsAI's sidewalk-semantic dataset. The SegFormer architecture was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository.

✨ Features

Hierarchical Transformer Encoder: SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, which achieves strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes.
Pre-training and Fine-tuning: The hierarchical Transformer is first pre-trained on ImageNet-1k; a decode head is then added, and the whole model is fine-tuned on a downstream dataset.
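
To make "hierarchical" concrete, the sketch below computes the feature-map shape produced by each of the four encoder stages. It assumes the stage strides (4, 8, 16, 32) and MiT-b5 channel widths (64, 128, 320, 512) from the SegFormer paper, and an input whose height and width are divisible by 32. The helper name `encoder_feature_shapes` is hypothetical and not part of any library.

```python
# Assumed values: per-stage downsampling strides and MiT-b5 channel widths
# as reported in the SegFormer paper.
STAGE_STRIDES = (4, 8, 16, 32)
MIT_B5_CHANNELS = (64, 128, 320, 512)


def encoder_feature_shapes(height, width):
    """Return (channels, height, width) for each encoder stage's feature map.

    Assumes height and width are divisible by 32.
    """
    return [
        (channels, height // stride, width // stride)
        for channels, stride in zip(MIT_B5_CHANNELS, STAGE_STRIDES)
    ]


for shape in encoder_feature_shapes(512, 512):
    print(shape)
```

Each stage halves the spatial resolution of the previous one while widening the channels, which is what gives the decode head both fine and coarse features to combine.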

💻 Usage Examples

Basic Usage

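from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import requests
import torch

url = "https://segmentsai-prod.s3.eu-west-2.amazonaws.com/assets/admin-tobias/439f6843-80c5-47ce-9b17-0b2a1d54dbeb.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# SegformerImageProcessor replaces the deprecated SegformerFeatureExtractor
processor = SegformerImageProcessor.from_pretrained("zoheb/mit-b5-finetuned-sidewalk-semantic")
model = SegformerForSemanticSegmentation.from_pretrained("zoheb/mit-b5-finetuned-sidewalk-semantic")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)

# upsample the logits to the original image size (PIL size is (width, height))
upsampled = torch.nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)

# the model predicts one of the 35 sidewalk classes for every pixel
predicted_map = upsampled.argmax(dim=1)[0]
present = predicted_map.unique().tolist()
print("Predicted classes:", [model.config.id2label[i] for i in present])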
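
For sidewalk analysis, a per-pixel class map is often reduced to the share of the image each class covers. The following is a minimal sketch, assuming the class map is available as a nested list of integer class ids (for example via a tensor's `.tolist()`). The helper `class_pixel_fractions` and the toy label names are hypothetical and used only for illustration.

```python
from collections import Counter


def class_pixel_fractions(class_map, id2label):
    """Map each label present in a 2D grid of class ids to its share of pixels."""
    counts = Counter(idx for row in class_map for idx in row)
    total = sum(counts.values())
    # most_common() orders the result from the largest class to the smallest
    return {id2label[idx]: n / total for idx, n in counts.most_common()}


# Hypothetical 2x4 class map and label names, for illustration only.
toy_map = [
    [0, 0, 1, 1],
    [0, 0, 0, 2],
]
toy_labels = {0: "class-a", 1: "class-b", 2: "class-c"}
print(class_pixel_fractions(toy_map, toy_labels))
```

With a real prediction, the second argument would be the model's `config.id2label` mapping.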

Advanced Usage

You can go through its detailed notebook here. For more code examples, refer to the documentation.

📚 Documentation

Model description

SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head to achieve great results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and fine-tuned altogether on a downstream dataset.
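
The all-MLP decode head can be pictured with a short PyTorch sketch. This is an illustrative simplification, not the implementation shipped with this model: it projects each encoder stage to a shared width, upsamples everything to the 1/4-scale resolution, concatenates, fuses, and classifies. The class name `AllMLPDecodeHead`, the `embed_dim=256` default, and the omission of normalization and dropout are assumptions made to keep the example small.

```python
import torch
from torch import nn
import torch.nn.functional as F


class AllMLPDecodeHead(nn.Module):
    """Illustrative all-MLP head: project, upsample, concatenate, fuse, classify."""

    def __init__(self, in_channels=(64, 128, 320, 512), embed_dim=256, num_labels=35):
        super().__init__()
        # One linear projection per encoder stage, mapping to a shared width.
        self.proj = nn.ModuleList([nn.Linear(c, embed_dim) for c in in_channels])
        # 1x1 convolutions act as per-pixel linear layers.
        self.fuse = nn.Conv2d(embed_dim * len(in_channels), embed_dim, kernel_size=1)
        self.classifier = nn.Conv2d(embed_dim, num_labels, kernel_size=1)

    def forward(self, features):
        target_size = features[0].shape[2:]  # resolution of the 1/4-scale map
        upsampled = []
        for feat, proj in zip(features, self.proj):
            b, _, h, w = feat.shape
            x = proj(feat.flatten(2).transpose(1, 2))   # (b, h*w, embed_dim)
            x = x.transpose(1, 2).reshape(b, -1, h, w)  # (b, embed_dim, h, w)
            upsampled.append(
                F.interpolate(x, size=target_size, mode="bilinear", align_corners=False)
            )
        fused = F.relu(self.fuse(torch.cat(upsampled, dim=1)))
        return self.classifier(fused)


# Random tensors stand in for the four encoder outputs of a 512x512 image.
features = [
    torch.randn(1, c, 512 // s, 512 // s)
    for c, s in zip((64, 128, 320, 512), (4, 8, 16, 32))
]
logits = AllMLPDecodeHead()(features)
print(logits.shape)  # torch.Size([1, 35, 128, 128])
```

The head contains no attention or large convolutions, which is why it stays lightweight relative to the encoder.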

📄 License

The license for this model can be found here.

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2105-15203,
  author    = {Enze Xie and
               Wenhai Wang and
               Zhiding Yu and
               Anima Anandkumar and
               Jose M. Alvarez and
               Ping Luo},
  title     = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
               Transformers},
  journal   = {CoRR},
  volume    = {abs/2105.15203},
  year      = {2021},
  url       = {https://arxiv.org/abs/2105.15203},
  eprinttype = {arXiv},
  eprint    = {2105.15203},
  timestamp = {Wed, 02 Jun 2021 11:46:42 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Information Table

Property	Details
Tags	vision, image-segmentation
Datasets	segments/sidewalk-semantic
Finetuned From	nvidia/mit-b5
Widget Example	Brugge
