OneFormer_ADE20K_DiNat_Large Open-source Image Segmentation Model - One Model to Handle Semantic, Instance, and Panoptic Segmentation

Developed by shi-labs

The first multi-task universal image segmentation framework supporting semantic/instance/panoptic segmentation with a single model

Image Segmentation

Transformers

Open Source License: MIT

Tags: Multi-task Image Segmentation, Universal Transformer Architecture, Task Dynamic Adaptation

Downloads 2,275

Release Time: 11/15/2022

Model Overview

OneFormer is a Transformer-based universal image segmentation model that performs semantic, instance, and panoptic segmentation with a single architecture and a single training process. This checkpoint was trained on the ADE20k dataset.

Model Features

Unified Multi-task Architecture

A single model simultaneously supports semantic segmentation, instance segmentation, and panoptic segmentation

Dynamic Task Adaptation

Uses a task token to guide the model toward a task during training and to switch tasks dynamically at inference

Outperforms Specialized Models

Reported by its authors to outperform specialized single-task models on semantic, instance, and panoptic segmentation

Model Capabilities

Semantic Segmentation

Instance Segmentation

Panoptic Segmentation

Scene Parsing

Object Recognition

Use Cases

Computer Vision

Scene Understanding

Pixel-level semantic parsing of indoor/outdoor scenes

Recognizes the 150 scene categories of the ADE20k dataset
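
As a rough illustration of pixel-level scene parsing, the sketch below computes what fraction of an image each class covers. It assumes the semantic map is available as a 2D grid of integer class ids (for ADE20k, values from 0 to 149); the helper name `class_coverage` is hypothetical and not part of any library.

```python
from collections import Counter
from typing import Dict, List


def class_coverage(semantic_map: List[List[int]]) -> Dict[int, float]:
    """Return the fraction of pixels assigned to each class id."""
    # Count every pixel's class id across all rows of the map.
    counts = Counter(class_id for row in semantic_map for class_id in row)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {class_id: n / total for class_id, n in counts.items()}


if __name__ == "__main__":
    # Toy 2x2 map with made-up class ids, not real model output.
    toy_map = [[0, 0], [0, 2]]
    print(class_coverage(toy_map))
```

A map held in a PyTorch tensor could be converted first with `.tolist()` before being passed to this helper.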

Object Instance Segmentation

Identify and segment individual object instances in images

Capable of handling overlapping objects in complex scenes

Autonomous Driving

Road Scene Parsing

Real-time segmentation of road elements, vehicles, pedestrians, etc.

Suitable for environmental perception modules in autonomous driving systems

🚀 OneFormer

This OneFormer checkpoint is the large-sized version with a DiNAT backbone, trained on the ADE20k dataset. It offers a unified solution for semantic, instance, and panoptic segmentation.

🚀 Quick Start

OneFormer was introduced in the paper "OneFormer: One Transformer to Rule Universal Image Segmentation" by Jain et al. and first released in this repository.

✨ Features

OneFormer is the first multi-task universal image segmentation framework. It is trained only once, with a single universal architecture, a single model, and a single dataset, and outperforms existing specialized models on semantic, instance, and panoptic segmentation. OneFormer uses a task token to condition the model on the task in focus, which makes the architecture task-guided during training and task-dynamic during inference, all with a single model.
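
The task conditioning can be pictured with a small sketch. The OneFormer paper describes the task input as a short text of the form "the task is {task}", which is then tokenized into the task token. The helper `build_task_prompt` and its validation below are hypothetical and only illustrate the idea; they are not the library's internal code.

```python
# Hypothetical helper: illustrates task conditioning, not the library's internals.
SUPPORTED_TASKS = ("semantic", "instance", "panoptic")


def build_task_prompt(task: str) -> str:
    """Return the text prompt that conditions the model on one task."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(
            f"Unsupported task {task!r}; expected one of {SUPPORTED_TASKS}"
        )
    return f"the task is {task}"


if __name__ == "__main__":
    # The same model weights are reused; only the prompt changes per task.
    for task in SUPPORTED_TASKS:
        print(build_task_prompt(task))
```

Because only this prompt changes between tasks, one set of weights can serve all three segmentation modes.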

📚 Documentation

Intended uses & limitations

You can use this particular checkpoint for semantic, instance, and panoptic segmentation. See the model hub for versions fine-tuned on other datasets.

How to use

Here is how to use this model:

from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation
from PIL import Image
import requests

# Use the "resolve" URL so the request returns the raw image file rather than an HTML page
url = "https://huggingface.co/datasets/shi-labs/oneformer_demo/resolve/main/ade20k.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

# Loading a single model for all three tasks
processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_ade20k_dinat_large")
model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_ade20k_dinat_large")

# Semantic Segmentation
semantic_inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
semantic_outputs = model(**semantic_inputs)
# Post-process with the processor; image.size is (width, height), so reverse it to (height, width)
predicted_semantic_map = processor.post_process_semantic_segmentation(semantic_outputs, target_sizes=[image.size[::-1]])[0]

# Instance Segmentation
instance_inputs = processor(images=image, task_inputs=["instance"], return_tensors="pt")
instance_outputs = model(**instance_inputs)
# Post-process with the processor and keep the per-pixel segment map
predicted_instance_map = processor.post_process_instance_segmentation(instance_outputs, target_sizes=[image.size[::-1]])[0]["segmentation"]

# Panoptic Segmentation
panoptic_inputs = processor(images=image, task_inputs=["panoptic"], return_tensors="pt")
panoptic_outputs = model(**panoptic_inputs)
# Post-process with the processor and keep the per-pixel segment map
predicted_panoptic_map = processor.post_process_panoptic_segmentation(panoptic_outputs, target_sizes=[image.size[::-1]])[0]["segmentation"]

For more examples, please refer to the documentation.
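
As a small follow-up to the example above, the sketch below summarizes a panoptic result by counting segments per class. It assumes the post-processed result carries a `segments_info` list whose entries include a `label_id` key, and that a mapping from class id to class name is available (for instance `model.config.id2label`). The helper name `count_segments_per_label` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def count_segments_per_label(
    segments_info: List[dict], id2label: Dict[int, str]
) -> Dict[str, int]:
    """Count how many predicted segments fall under each class name."""
    counts: Counter = Counter()
    for segment in segments_info:
        label_id = segment["label_id"]
        # Fall back to the raw id when the mapping has no name for it.
        counts[id2label.get(label_id, str(label_id))] += 1
    return dict(counts)


if __name__ == "__main__":
    # Made-up example data, not real model output.
    fake_segments = [
        {"id": 1, "label_id": 0, "score": 0.98},
        {"id": 2, "label_id": 12, "score": 0.91},
        {"id": 3, "label_id": 12, "score": 0.87},
    ]
    fake_id2label = {0: "wall", 12: "person"}
    print(count_segments_per_label(fake_segments, fake_id2label))
```

With a real result, the first argument would be the `segments_info` entry of the dictionary returned by `processor.post_process_panoptic_segmentation(...)[0]`.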

Citation

@article{jain2022oneformer,
  title={{OneFormer: One Transformer to Rule Universal Image Segmentation}},
  author={Jitesh Jain and Jiachen Li and MangTik Chiu and Ali Hassani and Nikita Orlov and Humphrey Shi},
  journal={arXiv},
  year={2022}
}

📄 License

This project is licensed under the MIT license.

Property	Details
Model Type	OneFormer model trained on the ADE20k dataset (large-sized version, DiNAT backbone)
Training Data	ADE20k dataset
Tags	vision, image-segmentation
Widget Examples	House, Airplane, Person
