OneFormer
OneFormer is a model trained on the Cityscapes dataset (large-sized version, DiNAT backbone). It offers a unified solution for multiple image segmentation tasks.
🚀 Quick Start
OneFormer is a powerful model for image segmentation. You can use it for semantic, instance, and panoptic segmentation tasks. For versions fine-tuned on other datasets, check the model hub.
✨ Features
- Multi-task Universal Segmentation: OneFormer is the first multi-task universal image segmentation framework. Trained once with a single architecture, model, and dataset, it outperforms existing specialized models on semantic, instance, and panoptic segmentation tasks.
- Task-guided and Task-dynamic: It uses a task token to condition the model on the task, making the architecture task-guided during training and task-dynamic during inference, all with a single model.
📦 Installation
OneFormer is available through the Hugging Face Transformers library. The usage example below also relies on PyTorch, Pillow, and Requests: `pip install transformers torch pillow requests`
💻 Usage Examples
Basic Usage
```python
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation
from PIL import Image
import requests

# Load an example image from the Hugging Face Hub (use /resolve/ to fetch the raw file)
url = "https://huggingface.co/datasets/shi-labs/oneformer_demo/resolve/main/cityscapes.png"
image = Image.open(requests.get(url, stream=True).raw)

# Load the processor and the OneFormer model trained on Cityscapes with the DiNAT-L backbone
processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_cityscapes_dinat_large")
model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_cityscapes_dinat_large")

# Semantic segmentation: the task token conditions the single model on the task
semantic_inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
semantic_outputs = model(**semantic_inputs)
predicted_semantic_map = processor.post_process_semantic_segmentation(semantic_outputs, target_sizes=[image.size[::-1]])[0]

# Instance segmentation
instance_inputs = processor(images=image, task_inputs=["instance"], return_tensors="pt")
instance_outputs = model(**instance_inputs)
predicted_instance_map = processor.post_process_instance_segmentation(instance_outputs, target_sizes=[image.size[::-1]])[0]["segmentation"]

# Panoptic segmentation
panoptic_inputs = processor(images=image, task_inputs=["panoptic"], return_tensors="pt")
panoptic_outputs = model(**panoptic_inputs)
predicted_panoptic_map = processor.post_process_panoptic_segmentation(panoptic_outputs, target_sizes=[image.size[::-1]])[0]["segmentation"]
```
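The post-processed predictions are per-pixel id maps of shape (height, width). As an optional sanity check (not part of the original example, and assuming matplotlib is installed), you can visualize all three results:

```python
import matplotlib.pyplot as plt

# Each predicted map assigns one class/segment id per pixel
maps = {
    "semantic": predicted_semantic_map,
    "instance": predicted_instance_map,
    "panoptic": predicted_panoptic_map,
}
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, (task, seg_map) in zip(axes, maps.items()):
    ax.imshow(seg_map)
    ax.set_title(task)
    ax.axis("off")
plt.show()
```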
For more examples, please refer to the documentation.
📚 Documentation
Model description
OneFormer is the first multi-task universal image segmentation framework. It needs to be trained only once, with a single universal architecture, a single model, and a single dataset, to outperform existing specialized models across semantic, instance, and panoptic segmentation tasks. OneFormer uses a task token to condition the model on the task in focus, making the architecture task-guided for training and task-dynamic for inference, all with a single model.
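As a minimal sketch of that conditioning (reusing the checkpoint from the usage example above), you can inspect what the processor actually hands to the model: the requested task is turned into a tokenized text prompt that is passed alongside the pixel values. The dummy image below is purely illustrative.

```python
from transformers import OneFormerProcessor
from PIL import Image
import numpy as np

processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_cityscapes_dinat_large")

# A blank dummy image is enough to see which inputs the processor builds
image = Image.fromarray(np.zeros((256, 256, 3), dtype=np.uint8))

inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
# The keys include 'pixel_values' and 'task_inputs', the tokenized task prompt
# that conditions the single model on the requested task
print(inputs.keys())
```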

Intended uses & limitations
You can use this particular checkpoint for semantic, instance, and panoptic segmentation. See the model hub to find versions fine-tuned on other datasets.
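Swapping in one of those checkpoints is a one-line change, since the same processor and model classes cover all OneFormer variants. The sketch below uses the ADE20K Swin-tiny checkpoint from the same shi-labs collection (a checkpoint id not covered by this card):

```python
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

# Same classes, different checkpoint: an ADE20K-trained OneFormer variant
# (checkpoint id assumed from the shi-labs hub collection)
processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_ade20k_swin_tiny")
model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_ade20k_swin_tiny")
```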
Citation
```bibtex
@article{jain2022oneformer,
  title={{OneFormer: One Transformer to Rule Universal Image Segmentation}},
  author={Jitesh Jain and Jiachen Li and MangTik Chiu and Ali Hassani and Nikita Orlov and Humphrey Shi},
  journal={arXiv},
  year={2022}
}
```
📄 License
This project is licensed under the MIT license.
| Property | Details |
|----------|---------|
| Model Type | OneFormer trained on the Cityscapes dataset (large-sized version, DiNAT backbone) |
| Training Data | huggan/cityscapes |