Open-source OneFormer Model - A Single Model for Semantic, Instance, and Panoptic Image Segmentation Tasks

Oneformer Ade20k Swin Large

Developed by shi-labs

OneFormer is the first multi-task universal image segmentation framework that supports semantic segmentation, instance segmentation, and panoptic segmentation tasks with a single model.

Image Segmentation

Transformers

Open Source License: MIT #Multi-task segmentation #Unified Transformer architecture #ADE20k dataset

Downloads 141.57k

Release Time: 11/15/2022

Model Overview

A universal image segmentation model built on a Swin backbone and trained on the ADE20k dataset. A task token selects the segmentation task (semantic, instance, or panoptic) at inference time.

Model Features

Unified multi-task architecture

A single model simultaneously supports three tasks: semantic segmentation, instance segmentation, and panoptic segmentation

Task-conditioned processing

Implements task guidance during training and dynamic task switching during inference through task tokens

Outperforms specialized models

According to the OneFormer paper, the single model achieves better results than specialized single-task models on semantic, instance, and panoptic segmentation

Model Capabilities

Semantic segmentation

Instance segmentation

Panoptic segmentation

Universal image analysis

Use Cases

Scene understanding

Indoor scene parsing

Identify elements such as walls, furniture, and appliances in house images

Example images demonstrate complete scene segmentation effects

Outdoor scene analysis

Parse objects like buildings, vehicles, and pedestrians in street scenes

Object recognition

Vehicle identification

Precisely segment vehicles such as airplanes and cars in images

Example images demonstrate instance segmentation effects of airplanes

Person segmentation

Separate human figures from complex backgrounds

Example images demonstrate person segmentation effects

🚀 OneFormer

OneFormer is a model trained on the ADE20k dataset (large-sized version, Swin backbone). It was introduced in the paper OneFormer: One Transformer to Rule Universal Image Segmentation by Jain et al. and first released in this repository. This model offers a unified solution for various image segmentation tasks.

✨ Features

Universal Segmentation: OneFormer is the first multi-task universal image segmentation framework. It can outperform existing specialized models across semantic, instance, and panoptic segmentation with a single universal architecture and a single model trained once on one dataset.
Task-Guided Architecture: It uses a task token to condition the model on the task in focus, making the architecture task-guided for training and task-dynamic for inference with a single model.
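
As a rough illustration of the task token: the OneFormer paper describes the task input as a short text prompt of the form "the task is {task}", which is tokenized and used to condition the model. The helper below is a hypothetical sketch of that template (the names `build_task_prompt` and `VALID_TASKS` are not part of any library). In the Transformers library the processor builds this prompt internally from `task_inputs`, so you would not normally call something like this yourself.

```python
# Hypothetical helper illustrating the task-prompt template; not a library API.
VALID_TASKS = ("semantic", "instance", "panoptic")


def build_task_prompt(task: str) -> str:
    """Return the text prompt that conditions the model on the given task."""
    if task not in VALID_TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {VALID_TASKS}")
    return f"the task is {task}"


# The same model weights are used for every prompt; only the prompt changes.
for task in VALID_TASKS:
    print(build_task_prompt(task))
```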

📚 Documentation

Intended uses & limitations

You can use this particular checkpoint for semantic, instance, and panoptic segmentation. See the model hub for versions fine-tuned on other datasets.
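
Because one checkpoint covers all three tasks, the task can be treated as an ordinary argument. The following is a minimal sketch of that idea, assuming `processor` and `model` are an `OneFormerProcessor` and `OneFormerForUniversalSegmentation` loaded from this checkpoint and `image` is a PIL image. The function name `run_task` and the `POST_PROCESS_METHODS` table are hypothetical; the `post_process_*` methods are the processor's own.

```python
# Sketch: a single entry point that switches tasks at inference time.
# `run_task` and `POST_PROCESS_METHODS` are illustrative names, not library APIs.
POST_PROCESS_METHODS = {
    "semantic": "post_process_semantic_segmentation",
    "instance": "post_process_instance_segmentation",
    "panoptic": "post_process_panoptic_segmentation",
}


def run_task(processor, model, image, task):
    """Run one segmentation task and return the post-processed result for `image`."""
    if task not in POST_PROCESS_METHODS:
        raise ValueError(f"unsupported task: {task!r}")
    # The task name is passed to the processor, which turns it into the task token.
    inputs = processor(images=image, task_inputs=[task], return_tensors="pt")
    outputs = model(**inputs)
    post_process = getattr(processor, POST_PROCESS_METHODS[task])
    # PIL reports (width, height); post-processing expects (height, width).
    return post_process(outputs, target_sizes=[image.size[::-1]])[0]
```

For the semantic task the result is a label map; for the instance and panoptic tasks it is a dictionary containing a `"segmentation"` map and a `"segments_info"` list. For inference you may also want to wrap the model call in `torch.no_grad()`.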

How to use

Here is how to use this model:

from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation
from PIL import Image
import requests

# Use the "resolve" URL so the request returns the image file rather than an HTML page
url = "https://huggingface.co/datasets/shi-labs/oneformer_demo/resolve/main/ade20k.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

# Load a single model for all three tasks
processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_ade20k_swin_large")
model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_ade20k_swin_large")

# Semantic Segmentation
semantic_inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
semantic_outputs = model(**semantic_inputs)
# Post-process at the original image size; image.size is (width, height), so reverse it
predicted_semantic_map = processor.post_process_semantic_segmentation(semantic_outputs, target_sizes=[image.size[::-1]])[0]

# Instance Segmentation
instance_inputs = processor(images=image, task_inputs=["instance"], return_tensors="pt")
instance_outputs = model(**instance_inputs)
# Post-process the instance outputs and keep the per-pixel segment map
predicted_instance_map = processor.post_process_instance_segmentation(instance_outputs, target_sizes=[image.size[::-1]])[0]["segmentation"]

# Panoptic Segmentation
panoptic_inputs = processor(images=image, task_inputs=["panoptic"], return_tensors="pt")
panoptic_outputs = model(**panoptic_inputs)
# Post-process the panoptic outputs and keep the per-pixel segment map
predicted_panoptic_map = processor.post_process_panoptic_segmentation(panoptic_outputs, target_sizes=[image.size[::-1]])[0]["segmentation"]

For more examples, please refer to the documentation.
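
To make the post-processed output easier to read, the segment map can be combined with the segment metadata. The sketch below assumes the dictionary returned by `post_process_panoptic_segmentation`, which holds a `"segmentation"` map of segment ids and a `"segments_info"` list whose entries include `"id"` and `"label_id"`. The helper name `summarize_segments` is hypothetical, and the toy data merely stands in for a real prediction.

```python
from collections import Counter


def summarize_segments(segmentation, segments_info, id2label):
    """Return (segment id, class name, pixel count) for each predicted segment."""
    # Accept a tensor or array (anything with .tolist()) or plain nested lists.
    rows = segmentation.tolist() if hasattr(segmentation, "tolist") else segmentation
    pixel_counts = Counter(int(value) for row in rows for value in row)
    summary = []
    for info in segments_info:
        # Fall back to the numeric label if the id is missing from the mapping.
        name = id2label.get(info["label_id"], str(info["label_id"]))
        summary.append((info["id"], name, pixel_counts[info["id"]]))
    return summary


# Toy 2x3 segment map and metadata standing in for a real prediction.
toy_map = [[1, 1, 2], [1, 2, 2]]
toy_info = [{"id": 1, "label_id": 0}, {"id": 2, "label_id": 3}]
print(summarize_segments(toy_map, toy_info, {0: "wall", 3: "floor"}))
# prints [(1, 'wall', 3), (2, 'floor', 3)]
```

With a real prediction, you would pass the `"segmentation"` and `"segments_info"` entries of the post-processed result together with `model.config.id2label`.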

Citation

@article{jain2022oneformer,
  title={{OneFormer: One Transformer to Rule Universal Image Segmentation}},
  author={Jitesh Jain and Jiachen Li and MangTik Chiu and Ali Hassani and Nikita Orlov and Humphrey Shi},
  journal={arXiv},
  year={2022}
}

📄 License

This project is licensed under the MIT license.

Property	Details
Model Type	OneFormer model trained on the ADE20k dataset (large-sized version, Swin backbone)
Training Data	ADE20k dataset
Tags	vision, image-segmentation, universal-image-segmentation
Datasets	scene_parse_150
Widget Examples	House: https://praeclarumjj3.github.io/files/ade20k.jpeg Airplane: https://praeclarumjj3.github.io/files/demo_2.jpg Person: https://praeclarumjj3.github.io/files/coco.jpeg
