OneFormer Open-Source Image Segmentation Model - A Single Free Model for Semantic, Instance, and Panoptic Segmentation

Oneformer Coco Swin Large

Developed by shi-labs

OneFormer is the first multi-task universal image segmentation framework, handling semantic, instance, and panoptic segmentation with a single model.

Image Segmentation

Transformers

Open Source License: MIT  #Unified Image Segmentation  #Multi-task Transformer  #Swin Backbone

Downloads 165.70k

Release Date: 11/15/2022

Model Overview

This checkpoint is the large-sized OneFormer model trained on the COCO dataset with a Swin backbone. As a unified image segmentation framework, it is reported to outperform specialized models on semantic, instance, and panoptic segmentation using a single model.

Model Features

Unified Multi-task Architecture

The first framework to achieve semantic/instance/panoptic segmentation with a single model, eliminating the need for specialized models

Dynamic Task Inference

A task token conditions the model on the current task, which makes training task-guided and inference task-dynamic

Surpasses Specialized Models

On the COCO dataset, the single model is reported to outperform specialized models on all three segmentation tasks

Model Capabilities

Semantic Segmentation

Instance Segmentation

Panoptic Segmentation

Image Scene Understanding

Object Recognition and Localization

Use Cases

Computer Vision

Autonomous Driving Scene Parsing

Used for semantic segmentation of road scenes, identifying elements such as roads, pedestrians, and vehicles

Generates precise scene segmentation maps

Medical Image Analysis

Performs instance segmentation of organs or lesion areas in medical images

Assists doctors in quantitative analysis

Remote Sensing Image Processing

Conducts panoptic segmentation of satellite/aerial images to identify various land cover types

Supports land cover classification and change detection

🚀 OneFormer

OneFormer is a model trained on the COCO dataset (large-sized version, Swin backbone). It offers a unified solution for universal image segmentation.

🚀 Quick Start

OneFormer was introduced in the paper "OneFormer: One Transformer to Rule Universal Image Segmentation" by Jain et al. and first released in the SHI-Labs/OneFormer repository on GitHub.

✨ Features

OneFormer is the first multi-task universal image segmentation framework. It needs to be trained only once, with a single universal architecture, a single model, and a single dataset, to outperform existing specialized models on semantic, instance, and panoptic segmentation. A task token conditions the model on the task in focus, which makes the architecture task-guided during training and task-dynamic during inference, all with a single model.
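As a rough illustration of the task token idea described above, the sketch below builds the kind of task text that conditions the model. It assumes the "the task is {task}" template described in the OneFormer paper; `SUPPORTED_TASKS` and `build_task_prompt` are hypothetical names used only for illustration and are not part of the transformers API.

```python
# Hypothetical helper, for illustration only (not part of transformers).
SUPPORTED_TASKS = ("semantic", "instance", "panoptic")


def build_task_prompt(task):
    """Return the task text for one task.

    Assumption: the task input follows the template "the task is {task}".
    """
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {SUPPORTED_TASKS}")
    return f"the task is {task}"


# One prompt per task; the same model weights serve all three.
prompts = [build_task_prompt(task) for task in SUPPORTED_TASKS]
for prompt in prompts:
    print(prompt)
```

In practice the processor is expected to handle this step when it receives task_inputs=["semantic"] (or "instance" / "panoptic"), so a helper like this is mainly useful for understanding what the model is conditioned on.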

📚 Documentation

Intended uses & limitations

You can use this particular checkpoint for semantic, instance, and panoptic segmentation. Check the model hub for other versions fine-tuned on different datasets.

How to use

💻 Usage Examples

Basic Usage

from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation
from PIL import Image
import requests
import torch

# Use the "resolve" URL so the request returns the raw image file, not an HTML page
url = "https://huggingface.co/datasets/shi-labs/oneformer_demo/resolve/main/coco.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

# Load a single model for all three tasks
processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_coco_swin_large")
model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_coco_swin_large")
model.eval()

# PIL reports size as (width, height); post-processing expects (height, width)
target_sizes = [image.size[::-1]]

# Semantic Segmentation
semantic_inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
with torch.no_grad():
    semantic_outputs = model(**semantic_inputs)
# Post-process with the processor to get a (height, width) map of class ids
predicted_semantic_map = processor.post_process_semantic_segmentation(semantic_outputs, target_sizes=target_sizes)[0]

# Instance Segmentation
instance_inputs = processor(images=image, task_inputs=["instance"], return_tensors="pt")
with torch.no_grad():
    instance_outputs = model(**instance_inputs)
# Post-process with the processor to get a (height, width) map of segment ids
predicted_instance_map = processor.post_process_instance_segmentation(instance_outputs, target_sizes=target_sizes)[0]["segmentation"]

# Panoptic Segmentation
panoptic_inputs = processor(images=image, task_inputs=["panoptic"], return_tensors="pt")
with torch.no_grad():
    panoptic_outputs = model(**panoptic_inputs)
# Post-process with the processor to get a (height, width) map of segment ids
predicted_panoptic_map = processor.post_process_panoptic_segmentation(panoptic_outputs, target_sizes=target_sizes)[0]["segmentation"]
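
The panoptic and instance post-processing calls above return, for each image, a dictionary holding a "segmentation" map of segment ids and a "segments_info" list describing each segment. The following is a minimal sketch of how such a result could be summarized per segment. It runs on a toy stand-in rather than a real prediction; `summarize_segments` and all `toy_*` values are hypothetical, and the label names are illustrative.

```python
from collections import Counter


def summarize_segments(segmentation, segments_info, id2label):
    """Return one (segment id, label name, pixel count) row per segment.

    segmentation: 2D nested list of segment ids (height x width)
    segments_info: list of dicts with at least "id" and "label_id"
    id2label: mapping from class id to class name
    """
    # Count how many pixels carry each segment id
    pixel_counts = Counter(seg_id for row in segmentation for seg_id in row)
    rows = []
    for info in segments_info:
        name = id2label.get(info["label_id"], str(info["label_id"]))
        rows.append((info["id"], name, pixel_counts.get(info["id"], 0)))
    return rows


# Toy stand-in for a real prediction: a 3x4 map with two segments.
toy_segmentation = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 1, 2],
]
toy_segments_info = [
    {"id": 1, "label_id": 0, "score": 0.98},
    {"id": 2, "label_id": 2, "score": 0.91},
]
toy_id2label = {0: "person", 2: "car"}  # illustrative names only

summary = summarize_segments(toy_segmentation, toy_segments_info, toy_id2label)
for seg_id, name, area in summary:
    print(seg_id, name, area)
```

With a real prediction, the full dictionary returned by processor.post_process_panoptic_segmentation(...)[0] can be passed in as result["segmentation"].tolist(), result["segments_info"], and model.config.id2label.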

For more examples, please refer to the documentation.

Citation

@article{jain2022oneformer,
  title={{OneFormer: One Transformer to Rule Universal Image Segmentation}},
  author={Jitesh Jain and Jiachen Li and MangTik Chiu and Ali Hassani and Nikita Orlov and Humphrey Shi},
  journal={arXiv},
  year={2022}
}

📄 License

This project is licensed under the MIT license.

Property	Details
Model Type	OneFormer model trained on the COCO dataset (large-sized version, Swin backbone)
Training Data	ydshieh/coco_dataset_script
