# OneFormer

OneFormer is a model trained on the Cityscapes dataset (large-sized version, Swin backbone). It offers a unified solution for semantic, instance, and panoptic image segmentation.
## Quick Start

OneFormer handles semantic, instance, and panoptic segmentation with a single model. The usage examples below run all three tasks against the same checkpoint.
## Features

- Multi-task Universal Segmentation: OneFormer is the first multi-task universal image segmentation framework. Trained once with a single universal architecture and model on a single dataset, it outperforms existing specialized models across semantic, instance, and panoptic segmentation tasks.
- Task-Guided and Task-Dynamic: It uses a task token to condition the model on the task in focus, enabling task-guided training and task-dynamic inference with a single model (see the sketch after this list).
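As a rough illustration of the task-token mechanism: the processor encodes the requested task as a tokenized text prompt alongside the pixel inputs, so the same weights can be steered per task at inference time. This is a minimal sketch, assuming the `shi-labs/oneformer_cityscapes_swin_large` checkpoint used in the usage examples below and the `task_inputs` key exposed by the `transformers` processor:

```python
import numpy as np
from PIL import Image
from transformers import OneFormerProcessor

processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_cityscapes_swin_large")
dummy = Image.fromarray(np.zeros((512, 512, 3), dtype=np.uint8))

# Each task yields a different tokenized task prompt; the image preprocessing is shared.
for task in ["semantic", "instance", "panoptic"]:
    inputs = processor(images=dummy, task_inputs=[task], return_tensors="pt")
    print(task, inputs["task_inputs"].shape)
```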
## Installation

The examples below rely on the Hugging Face `transformers` library, plus `Pillow` and `requests` for image loading: `pip install transformers pillow requests`.
## Usage Examples

### Basic Usage
```python
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation
from PIL import Image
import requests

# Load a sample Cityscapes image (note the "resolve" URL, which serves the raw file).
url = "https://huggingface.co/datasets/shi-labs/oneformer_demo/resolve/main/cityscapes.png"
image = Image.open(requests.get(url, stream=True).raw)

# A single processor and model serve all three segmentation tasks.
processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_cityscapes_swin_large")
model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_cityscapes_swin_large")

# Semantic segmentation: condition the model with the "semantic" task token.
semantic_inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
semantic_outputs = model(**semantic_inputs)
# A (height, width) map of per-pixel class ids; image.size is (width, height), hence the reversal.
predicted_semantic_map = processor.post_process_semantic_segmentation(semantic_outputs, target_sizes=[image.size[::-1]])[0]

# Instance segmentation: same weights, different task token.
instance_inputs = processor(images=image, task_inputs=["instance"], return_tensors="pt")
instance_outputs = model(**instance_inputs)
predicted_instance_map = processor.post_process_instance_segmentation(instance_outputs, target_sizes=[image.size[::-1]])[0]["segmentation"]

# Panoptic segmentation: same weights, different task token.
panoptic_inputs = processor(images=image, task_inputs=["panoptic"], return_tensors="pt")
panoptic_outputs = model(**panoptic_inputs)
predicted_panoptic_map = processor.post_process_panoptic_segmentation(panoptic_outputs, target_sizes=[image.size[::-1]])[0]["segmentation"]
```
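As a quick sanity check, the returned semantic map can be rendered as a grayscale image. This is a minimal sketch, assuming `predicted_semantic_map` is a torch tensor of integer class ids as returned by the post-processing call above:

```python
import numpy as np
from PIL import Image

# Rescale class ids (0-18 for Cityscapes) to the full 0-255 range for viewing.
seg = predicted_semantic_map.cpu().numpy()
vis = (seg.astype("float32") / max(int(seg.max()), 1) * 255).astype("uint8")
Image.fromarray(vis).save("semantic_map.png")
```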
For more examples, please refer to the documentation.
## Documentation

### Model description
OneFormer is the first multi-task universal image segmentation framework. It needs to be trained only once with a single universal architecture, a single model, and on a single dataset to outperform existing specialized models across semantic, instance, and panoptic segmentation tasks. OneFormer uses a task token to condition the model on the task in focus, making the architecture task-guided for training and task-dynamic for inference, all with a single model.
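For instance, the label space this checkpoint predicts over can be read from its configuration. This is a small hedged check, assuming the config carries the usual `id2label` mapping that Hugging Face segmentation checkpoints ship with:

```python
from transformers import OneFormerConfig

config = OneFormerConfig.from_pretrained("shi-labs/oneformer_cityscapes_swin_large")
# Cityscapes semantic segmentation defines 19 classes (road, sidewalk, car, ...).
print(len(config.id2label))
print(list(config.id2label.values())[:5])
```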

Intended uses & limitations
You can use this particular checkpoint for semantic, instance and panoptic segmentation. See the model hub to look for other fine - tuned versions on a different dataset.
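One way to discover other OneFormer checkpoints (for example, variants trained on ADE20k or COCO) is to query the Hub programmatically; a minimal sketch using the `huggingface_hub` client:

```python
from huggingface_hub import HfApi

# List OneFormer checkpoints published by shi-labs.
for model_info in HfApi().list_models(author="shi-labs", search="oneformer"):
    print(model_info.id)
```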
## License
This project is licensed under the MIT license.
## Citation

```bibtex
@article{jain2022oneformer,
  title={{OneFormer: One Transformer to Rule Universal Image Segmentation}},
  author={Jitesh Jain and Jiachen Li and MangTik Chiu and Ali Hassani and Nikita Orlov and Humphrey Shi},
  journal={arXiv},
  year={2022}
}
```
| Property | Details |
|----------|---------|
| Model Type | OneFormer model trained on the Cityscapes dataset (large-sized version, Swin backbone) |
| Training Data | huggan/cityscapes |
