DPT-large-ade Open-source Model - Free Deployment to Facilitate Precise and Efficient Semantic Segmentation Tasks

Dpt Large Ade

Developed by Intel

This is a Dense Prediction Transformer (DPT) model fine-tuned on the ADE20k dataset for semantic segmentation tasks.

Image Segmentation

Transformers

Open Source License: Apache-2.0 #High-precision semantic segmentation #Vision Transformer architecture #ADE20k dataset

Downloads 3,497

Release Time: 3/2/2022

Model Overview

The model uses a Vision Transformer (ViT) as the backbone network, enhanced with neck and head structures for semantic segmentation, capable of high-quality semantic segmentation of input images.

Model Features

High-performance semantic segmentation

Achieved 49.02% mIoU on the ADE20K dataset, which the authors reported as a new state of the art at the time of publication (2021).

Vision Transformer-based

Utilizes a Vision Transformer (ViT) as the backbone network combined with the Dense Prediction Transformer (DPT) architecture to deliver high-quality segmentation results.

Fine-tunability

The architecture can be fine-tuned on smaller datasets such as NYUv2, KITTI, and Pascal Context, where the authors also reported state-of-the-art results at the time of publication.

Model Capabilities

Image semantic segmentation

High-resolution image processing

Multi-class object recognition

Use Cases

Computer Vision

Scene parsing

Used to parse various objects and backgrounds in complex scenes, suitable for applications like autonomous driving and robot navigation.

Image editing

Can be used in image editing tools to help users quickly separate different elements in an image.

🚀 DPT (large-sized model) fine-tuned on ADE20k

This model is designed for semantic segmentation of input images, offering high-precision results.

🚀 Quick Start

The model performs semantic segmentation: given an input image, it returns a segmentation map of the same scene.

[Table: Input Image | Output Segmented Image; the example images are not included.]

✨ Features

Powerful Semantic Segmentation: Trained on ADE20k, it can accurately segment various objects in images.
State-of-the-Art Performance: At the time of publication, set new state-of-the-art results on ADE20K and on other datasets such as NYUv2, KITTI, and Pascal Context.
Flexible Architecture: Based on Vision Transformer (ViT) with a neck + head for semantic segmentation.

📚 Documentation

Model description

The MiDaS 3.0-based Dense Prediction Transformer (DPT) model was trained on ADE20k for semantic segmentation. It was introduced in the paper Vision Transformers for Dense Prediction by Ranftl et al. and first released in this repository.

The MiDaS v3.0 DPT uses the Vision Transformer (ViT) as backbone and adds a neck + head on top for semantic segmentation.
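
As a rough illustration of what the backbone hands to the neck, the sketch below computes how many patch tokens a ViT produces for a given input size and the spatial grid they can be reshaped into. The patch size of 16 and the 384×384 input are assumptions chosen for illustration, and `token_grid` is a hypothetical helper, not part of any library.

```python
def token_grid(height: int, width: int, patch_size: int = 16) -> tuple[int, int]:
    """Return the (rows, cols) grid of non-overlapping patches a ViT would see.

    patch_size=16 is an assumed value for illustration only.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be multiples of patch_size")
    return height // patch_size, width // patch_size


# Example input size (an assumption, not a requirement of the model).
rows, cols = token_grid(384, 384)
num_tokens = rows * cols
print(rows, cols, num_tokens)  # 24 24 576
```

A dense-prediction neck can reshape those 576 token embeddings back into a 24×24 feature map before upsampling. The actual DPT reassemble and fusion stages are more involved than this arithmetic suggests.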

Disclaimer: The team releasing DPT did not write a model card for this model, so this model card has been written by the Hugging Face and Intel AI Community teams.

Results

According to the authors, at the time of publication, dense vision transformers applied to semantic segmentation set a new state of the art on ADE20K with 49.02% mIoU.

The authors further report that the architecture can be fine-tuned on smaller datasets such as NYUv2, KITTI, and Pascal Context, where it also set a new state of the art. The models are available at the Intel DPT GitHub repository.
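
For readers unfamiliar with the metric, mIoU (mean intersection over union) averages, over classes, the ratio of correctly labeled pixels to the union of predicted and ground-truth pixels for that class. The following is a minimal sketch of that definition on flat label lists. `mean_iou` is a hypothetical helper, and the exact evaluation protocol behind the 49.02% figure (for example, how ignored pixels or multi-scale inference are handled) is not described here, so this should not be read as the authors' evaluation code.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over the classes that appear in pred or target.

    pred and target are equal-length flat sequences of integer class labels.
    Pixels whose target equals ignore_index are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both pred and target
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes and six pixels.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(round(score, 4))  # 0.5
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, which average to 0.5. Benchmark figures are usually accumulated over a whole validation set rather than averaged per image.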

Intended uses & limitations

You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you.

💻 Usage Examples

Basic Usage

Here is how to use this model:

import requests
import torch
from PIL import Image
from transformers import DPTForSemanticSegmentation, DPTImageProcessor

url = "http://images.cocodataset.org/val2017/000000026204.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = DPTImageProcessor.from_pretrained("Intel/dpt-large-ade")
model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-large-ade")

inputs = feature_extractor(images=image, return_tensors="pt")

outputs = model(**inputs)
logits = outputs.logits  # shape: (batch_size, num_labels, height, width)
print(logits.shape)
prediction = torch.nn.functional.interpolate(
    logits,
    size=image.size[::-1],  # PIL reports (width, height); interpolate expects (height, width)
    mode="bicubic",
    align_corners=False
)

# Convert logits to class predictions
prediction = torch.argmax(prediction, dim=1) + 1

# Squeeze the prediction tensor to remove the batch dimension
prediction = prediction.squeeze()

# Move the prediction tensor to the CPU and convert it to a numpy array
prediction = prediction.cpu().numpy()

# Convert the prediction array to an image
predicted_seg = Image.fromarray(prediction.squeeze().astype('uint8'))

# Define the ADE20K palette
adepallete = [0,0,0,120,120,120,180,120,120,6,230,230,80,50,50,4,200,3,120,120,80,140,140,140,204,5,255,230,230,230,4,250,7,224,5,255,235,255,7,150,5,61,120,120,70,8,255,51,255,6,82,143,255,140,204,255,4,255,51,7,204,70,3,0,102,200,61,230,250,255,6,51,11,102,255,255,7,71,255,9,224,9,7,230,220,220,220,255,9,92,112,9,255,8,255,214,7,255,224,255,184,6,10,255,71,255,41,10,7,255,255,224,255,8,102,8,255,255,61,6,255,194,7,255,122,8,0,255,20,255,8,41,255,5,153,6,51,255,235,12,255,160,150,20,0,163,255,140,140,140,250,10,15,20,255,0,31,255,0,255,31,0,255,224,0,153,255,0,0,0,255,255,71,0,0,235,255,0,173,255,31,0,255,11,200,200,255,82,0,0,255,245,0,61,255,0,255,112,0,255,133,255,0,0,255,163,0,255,102,0,194,255,0,0,143,255,51,255,0,0,82,255,0,255,41,0,255,173,10,0,255,173,255,0,0,255,153,255,92,0,255,0,255,255,0,245,255,0,102,255,173,0,255,0,20,255,184,184,0,31,255,0,255,61,0,71,255,255,0,204,0,255,194,0,255,82,0,10,255,0,112,255,51,0,255,0,194,255,0,122,255,0,255,163,255,153,0,0,255,10,255,112,0,143,255,0,82,0,255,163,255,0,255,235,0,8,184,170,133,0,255,0,255,92,184,0,255,255,0,31,0,184,255,0,214,255,255,0,112,92,255,0,0,224,255,112,224,255,70,184,160,163,0,255,153,0,255,71,255,0,255,0,163,255,204,0,255,0,143,0,255,235,133,255,0,255,0,235,245,0,255,255,0,122,255,245,0,10,190,212,214,255,0,0,204,255,20,0,255,255,255,0,0,153,255,0,41,255,0,255,204,41,0,255,41,255,0,173,0,255,0,245,255,71,0,255,122,0,255,0,255,184,0,92,255,184,255,0,0,133,255,255,214,0,25,194,194,102,255,0,92,0,255]

# Apply the color map to the predicted segmentation image
predicted_seg.putpalette(adepallete)

# Blend the original image and the predicted segmentation image
out = Image.blend(image, predicted_seg.convert("RGB"), alpha=0.5)

out
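
To make the palette step above easier to follow: `putpalette` takes a flat list in which every three consecutive values form one (R, G, B) entry, so pixel value `k` is drawn with entries `3k` to `3k + 2`. The sketch below looks up one entry by hand. `palette_color` is a hypothetical helper, and reading the `+ 1` offset as a shift past the leading black entry is an interpretation of the example, not something the model card states.

```python
def palette_color(flat_palette, index):
    """Return the (R, G, B) triple stored for `index` in a flat PIL-style palette."""
    start = 3 * index
    if index < 0 or start + 3 > len(flat_palette):
        raise IndexError("index outside palette")
    return tuple(flat_palette[start:start + 3])


# First four entries of the ADE20K palette listed above.
palette_head = [0, 0, 0, 120, 120, 120, 180, 120, 120, 6, 230, 230]

model_class = 0                  # hypothetical class id straight from argmax
palette_index = model_class + 1  # same offset as in the example above
print(palette_color(palette_head, palette_index))  # (120, 120, 120)
```

Without the offset, the first class would be painted with the black entry at index 0.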

For more code examples, we refer to the documentation.

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2103-13413,
  author    = {Ren{\'{e}} Ranftl and
               Alexey Bochkovskiy and
               Vladlen Koltun},
  title     = {Vision Transformers for Dense Prediction},
  journal   = {CoRR},
  volume    = {abs/2103.13413},
  year      = {2021},
  url       = {https://arxiv.org/abs/2103.13413},
  eprinttype = {arXiv},
  eprint    = {2103.13413},
  timestamp = {Wed, 07 Apr 2021 15:31:46 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2103-13413.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

📄 License

This model is licensed under the Apache-2.0 license.
