# OneFormer
OneFormer is a model trained on the ADE20k dataset (tiny-sized version, Swin backbone). It addresses the challenge of universal image segmentation, offering a single unified model for semantic, instance, and panoptic segmentation.
## Quick Start
OneFormer is trained on the ADE20k dataset, specifically the tiny-sized version with a Swin backbone. It was first introduced in the paper [OneFormer: One Transformer to Rule Universal Image Segmentation](https://arxiv.org/abs/2211.06220) by Jain et al. and initially released in [this repository](https://github.com/SHI-Labs/OneFormer).

## Features

### Model description
OneFormer stands out as the first multi-task universal image segmentation framework. With a single universal architecture, a single model, and training on a single dataset just once, it outperforms existing specialized models across semantic, instance, and panoptic segmentation tasks. It uses a task token to condition the model on the task at hand, making the architecture task-guided during training and task-dynamic during inference, all within a single model.
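As a minimal sketch of the task-token mechanism (using the same checkpoint and demo image as the usage example below), the loop runs one image through the same weights under all three task conditionings; only the task string passed to the processor changes:

```python
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation
from PIL import Image
import requests

url = "https://huggingface.co/datasets/shi-labs/oneformer_demo/resolve/main/ade20k.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_ade20k_swin_tiny")
model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_ade20k_swin_tiny")

# One set of weights, three tasks: only the task token changes.
for task in ["semantic", "instance", "panoptic"]:
    inputs = processor(images=image, task_inputs=[task], return_tensors="pt")
    outputs = model(**inputs)
    print(task, outputs.class_queries_logits.shape)
```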

### Intended uses & limitations
You can use this checkpoint for semantic, instance, and panoptic segmentation. To find versions fine-tuned on other datasets, check out the model hub.
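One way to browse those checkpoints programmatically is a Hub search (a sketch using the `huggingface_hub` client; the `"oneformer"` search string is just an illustration):

```python
from huggingface_hub import HfApi

api = HfApi()
# List public checkpoints whose name or metadata matches "oneformer".
for model_info in api.list_models(search="oneformer", limit=10):
    print(model_info.id)
```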
## Usage Examples

### Basic Usage
```python
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation
from PIL import Image
import requests

# Use a "resolve" URL so the raw image bytes are downloaded ("blob" URLs return an HTML page).
url = "https://huggingface.co/datasets/shi-labs/oneformer_demo/resolve/main/ade20k.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the processor and model trained on ADE20k (tiny Swin backbone).
processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_ade20k_swin_tiny")
model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_ade20k_swin_tiny")

# Semantic segmentation: condition the model with the "semantic" task token.
semantic_inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
semantic_outputs = model(**semantic_inputs)
predicted_semantic_map = processor.post_process_semantic_segmentation(semantic_outputs, target_sizes=[image.size[::-1]])[0]

# Instance segmentation: same weights, different task token.
instance_inputs = processor(images=image, task_inputs=["instance"], return_tensors="pt")
instance_outputs = model(**instance_inputs)
predicted_instance_map = processor.post_process_instance_segmentation(instance_outputs, target_sizes=[image.size[::-1]])[0]["segmentation"]

# Panoptic segmentation.
panoptic_inputs = processor(images=image, task_inputs=["panoptic"], return_tensors="pt")
panoptic_outputs = model(**panoptic_inputs)
predicted_panoptic_map = processor.post_process_panoptic_segmentation(panoptic_outputs, target_sizes=[image.size[::-1]])[0]["segmentation"]
```
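Beyond the segmentation maps, the panoptic post-processor also returns per-segment metadata. A short sketch continuing from the variables above (`segments_info` and the model config's `id2label` mapping are standard fields in Transformers):

```python
# Continuing from the example above: the post-processed panoptic output
# carries a "segments_info" list alongside the segmentation map.
panoptic_result = processor.post_process_panoptic_segmentation(
    panoptic_outputs, target_sizes=[image.size[::-1]]
)[0]

# Each entry describes one predicted segment: its id in the map and its class.
for segment in panoptic_result["segments_info"]:
    print(segment["id"], model.config.id2label[segment["label_id"]])
```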
For more examples, please refer to the documentation.
## Documentation

### Citation
```bibtex
@article{jain2022oneformer,
  title={{OneFormer: One Transformer to Rule Universal Image Segmentation}},
  author={Jitesh Jain and Jiachen Li and MangTik Chiu and Ali Hassani and Nikita Orlov and Humphrey Shi},
  journal={arXiv},
  year={2022}
}
```
## License
This project is licensed under the MIT license.
| Property | Details |
|----------|---------|
| Model Type | OneFormer model trained on the ADE20k dataset (tiny-sized version, Swin backbone) |
| Training Data | scene_parse_150 |