# SegFormer (b3-sized) model fine-tuned on CityScapes
A SegFormer model fine-tuned on the CityScapes dataset at a resolution of 1024x1024, offering high-performance semantic segmentation.
## Quick Start
SegFormer is a model fine-tuned on CityScapes at a resolution of 1024x1024. It was introduced in the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Xie et al. and first released in this repository.
Disclaimer: The team releasing SegFormer did not write a model card for this model, so this model card has been written by the Hugging Face team.
## Features
- Powerful Architecture: SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, achieving strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes.
- Pre-training and Fine-tuning: The hierarchical Transformer encoder is first pre-trained on ImageNet-1k; a decode head is then added and the whole model is fine-tuned on a downstream dataset (see the sketch below).
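As a concrete illustration of that workflow, here is a minimal, hypothetical sketch of starting a downstream fine-tune from a pre-trained encoder. The `nvidia/mit-b3` checkpoint id and the label count are assumptions for illustration, not details taken from this card:

```python
from transformers import SegformerForSemanticSegmentation

# Hypothetical fine-tuning setup: load an ImageNet-1k pre-trained hierarchical
# encoder (checkpoint id assumed) and attach a randomly initialized decode head
# sized for the downstream label set.
model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b3",  # assumed id of the pre-trained b3 encoder
    num_labels=19,    # e.g. the 19 Cityscapes evaluation classes
)
# From here, train as usual (e.g. with the Trainer API) on the downstream dataset.
```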
## Documentation
### Model description
SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, which together achieve strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer encoder is first pre-trained on ImageNet-1k, after which a decode head is added and the whole model is fine-tuned on a downstream dataset.
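To see how this hierarchy is reflected in the checkpoint, you can inspect its configuration. This is a minimal sketch assuming the `transformers` library is installed; the exact values printed depend on the b3 variant:

```python
from transformers import SegformerConfig

# Inspect the architecture of this checkpoint.
config = SegformerConfig.from_pretrained(
    "nvidia/segformer-b3-finetuned-cityscapes-1024-1024"
)

print(config.num_encoder_blocks)   # number of hierarchical encoder stages
print(config.depths)               # Transformer layers per stage
print(config.hidden_sizes)         # channel width of each stage
print(config.decoder_hidden_size)  # width of the all-MLP decode head
print(config.num_labels)           # size of the Cityscapes label set
```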
### Intended uses & limitations
You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you.
## Usage Examples
### Basic Usage
Here is how to use this model to segment an image from the COCO 2017 dataset into the Cityscapes classes:
```python
from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation
from PIL import Image
import requests

# Load the feature extractor and the fine-tuned model from the Hub.
feature_extractor = SegformerFeatureExtractor.from_pretrained("nvidia/segformer-b3-finetuned-cityscapes-1024-1024")
model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b3-finetuned-cityscapes-1024-1024")

# Download a test image from the COCO 2017 validation set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image and run a forward pass.
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)
```
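The logits come out at one quarter of the input resolution. A common follow-up step, not shown in the snippet above, is to upsample them to the original image size and take a per-pixel argmax to get a segmentation map; a minimal sketch assuming `torch` is installed:

```python
import torch

# Upsample the logits to the original image size and convert to a label map.
upsampled_logits = torch.nn.functional.interpolate(
    logits,
    size=image.size[::-1],  # PIL gives (width, height); interpolate wants (height, width)
    mode="bilinear",
    align_corners=False,
)
predicted_map = upsampled_logits.argmax(dim=1)[0]  # (height, width) tensor of class ids
```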
For more code examples, we refer to the documentation.
## BibTeX entry and citation info
```bibtex
@article{DBLP:journals/corr/abs-2105-15203,
  author     = {Enze Xie and
                Wenhai Wang and
                Zhiding Yu and
                Anima Anandkumar and
                Jose M. Alvarez and
                Ping Luo},
  title      = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
                Transformers},
  journal    = {CoRR},
  volume     = {abs/2105.15203},
  year       = {2021},
  url        = {https://arxiv.org/abs/2105.15203},
  eprinttype = {arXiv},
  eprint     = {2105.15203},
  timestamp  = {Wed, 02 Jun 2021 11:46:42 +0200},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```
## License
License: other

| Property | Details |
|----------|---------|
| Model Type | Vision, Image-Segmentation |
| Training Data | Cityscapes |