MobileViTv2-1.0-VOC-DeepLabV3: Open-Source Semantic Segmentation Model for 512x512 Images


Developed by shehan97

A semantic segmentation model based on the MobileViTv2 architecture, trained on the PASCAL VOC dataset at 512x512 input resolution.

Image Segmentation

Transformers

Open Source License: Other

Tags: Mobile Image Segmentation, Lightweight Transformer, Semantic Segmentation


Release date: 5/2/2023

Model Overview

This model combines the efficient MobileViTv2 backbone with a DeepLabv3 segmentation head. It is built for semantic segmentation and is compact enough to target deployment on mobile and edge devices.

Model Features

Efficient Mobile Architecture

Utilizes the MobileViTv2 architecture, optimized for computational efficiency on mobile devices

High-Resolution Support

Trained on input images at 512x512 resolution

Lightweight Segmentation Head

Integrates a DeepLabv3 segmentation head, which uses atrous spatial pyramid pooling to capture multi-scale context, on top of the lightweight backbone
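The sketch below shows the kind of preprocessing an image goes through before the model sees it. It is written in NumPy and makes several assumptions:

- The image is resized so its shorter side is 512, then center-cropped to 512x512. Confirm these sizes in the checkpoint's preprocessor_config.json.
- Pixel values are rescaled to [0, 1].
- Channels are flipped from RGB to BGR. We believe this is the MobileViT image processor's default.
- Resizing uses nearest-neighbour sampling for simplicity. The real processor resamples with PIL.

The function names are hypothetical helpers, not part of any library.

```python
import numpy as np


def resize_shortest_edge_nearest(image: np.ndarray, size: int) -> np.ndarray:
    """Resize an HxWxC image so its shorter side equals `size` (nearest neighbour)."""
    h, w = image.shape[:2]
    scale = size / min(h, w)
    new_h, new_w = max(size, round(h * scale)), max(size, round(w * scale))
    # Map every output row/column back to the nearest source row/column.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[rows[:, None], cols[None, :]]


def center_crop(image: np.ndarray, size: int) -> np.ndarray:
    """Cut a size x size window from the middle of an HxWxC image."""
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]


def preprocess(image_uint8: np.ndarray, size: int = 512) -> np.ndarray:
    """Hypothetical preprocessing: uint8 HxWx3 RGB -> float32 (1, 3, size, size)."""
    x = resize_shortest_edge_nearest(image_uint8, size)
    x = center_crop(x, size)
    x = x.astype(np.float32) / 255.0  # rescale to [0, 1]
    x = x[..., ::-1]                  # RGB -> BGR (assumed processor default)
    x = np.transpose(x, (2, 0, 1))    # HWC -> CHW
    return np.ascontiguousarray(x[None])  # add a batch dimension


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
batch = preprocess(img)
print(batch.shape)  # (1, 3, 512, 512)
```

In this example, the 480x640 image is scaled to 512x683 and then cropped to a 512x512 window. The result is a single-image batch in the channels-first layout that PyTorch models expect.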

Model Capabilities

Image Semantic Segmentation

Pixel-Level Classification

Mobile Vision Processing

Use Cases

Computer Vision

Autonomous Driving Scene Understanding

Used for identifying and segmenting objects and regions in road scenes

Mobile Image Editing

Supports real-time image segmentation and background replacement on mobile devices
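Once a segmentation mask is available, background replacement is just a per-pixel choice between two images. The helper below is hypothetical, not part of the model or of Transformers. It assumes two things:

- The predicted mask has already been resized to the image size.
- Class ID 15 is "person" in the 21-class PASCAL VOC label map. Confirm this with model.config.id2label.

```python
import numpy as np

PERSON_CLASS_ID = 15  # assumption: "person" in the 21-class PASCAL VOC label map


def replace_background(image, mask, background, keep_ids=(PERSON_CLASS_ID,)):
    """Keep pixels whose predicted class is in keep_ids; take the rest from background.

    image, background: HxWx3 uint8 arrays of the same shape.
    mask: HxW integer label map, already resized to the image size.
    """
    if image.shape != background.shape:
        raise ValueError("image and background must have the same shape")
    if mask.shape != image.shape[:2]:
        raise ValueError("mask must match the image height and width")
    keep = np.isin(mask, keep_ids)  # HxW boolean foreground mask
    # Broadcast the mask over the colour channels and pick per pixel.
    return np.where(keep[..., None], image, background)


image = np.full((4, 4, 3), 200, dtype=np.uint8)
background = np.zeros((4, 4, 3), dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.int64)
mask[1:3, 1:3] = PERSON_CLASS_ID
out = replace_background(image, mask, background)
print(out[:, :, 0])
```

Here the central 2x2 "person" region keeps its original pixel values, and every other pixel comes from the black background. For smoother edges in real photos, a soft alpha blend over a feathered mask is a common alternative to this hard selection.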

🚀 MobileViTv2 + DeepLabv3 (shehan97/mobilevitv2-1.0-voc-deeplabv3)

A MobileViTv2 model fine-tuned on PASCAL VOC at 512x512 resolution for semantic segmentation.

🚀 Quick Start

This is a MobileViTv2 model fine-tuned on PASCAL VOC at a resolution of 512x512. MobileViTv2 was introduced in the paper Separable Self-attention for Mobile Vision Transformers by Sachin Mehta and Mohammad Rastegari. It was first released in the authors' repository under the Apple sample code license.

Disclaimer: The team releasing MobileViT did not write a model card for this model, so this model card has been written by the Hugging Face team.

✨ Features

Model Structure: MobileViTv2 is constructed by replacing the multi-headed self-attention in MobileViT with separable self-attention.
Semantic Segmentation: The model in this repo adds a DeepLabV3 head to the MobileViTv2 backbone for semantic segmentation.
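The separable self-attention idea can be sketched in a few lines of NumPy, following the description in the MobileViTv2 paper:

1. Each token receives a single context score, normalised over all tokens.
2. The scores weight the key projections, which are summed into one context vector.
3. That context vector is broadcast onto ReLU-activated value projections.

This is a simplified single-head illustration with hypothetical weight names. It omits biases, batching, normalisation and the patch unfolding used in the real model, and it is not the Transformers implementation.

```python
import numpy as np


def separable_self_attention(x, w_i, w_k, w_v, w_o):
    """Simplified single-head separable self-attention over k tokens.

    x: (k, d) tokens; w_i: (d, 1); w_k, w_v, w_o: (d, d).
    The attention step costs O(k * d), linear in the number of tokens k,
    instead of the O(k^2 * d) of standard multi-headed self-attention.
    """
    # 1) One context score per token, softmax-normalised over the k tokens.
    scores = x @ w_i                                   # (k, 1)
    scores = np.exp(scores - scores.max(axis=0, keepdims=True))
    scores = scores / scores.sum(axis=0, keepdims=True)
    # 2) Context vector: score-weighted sum of the key projections.
    context = (scores * (x @ w_k)).sum(axis=0, keepdims=True)  # (1, d)
    # 3) Broadcast the context onto ReLU-activated value projections.
    values = np.maximum(x @ w_v, 0.0)                  # (k, d)
    return (values * context) @ w_o                    # (k, d)


rng = np.random.default_rng(0)
k, d = 64, 32
x = rng.standard_normal((k, d))
w_i = rng.standard_normal((d, 1))
w_k, w_v, w_o = (rng.standard_normal((d, d)) for _ in range(3))
y = separable_self_attention(x, w_i, w_k, w_v, w_o)
print(y.shape)  # (64, 32)
```

All tokens share one context vector, so no k x k attention matrix is ever formed. This is what keeps the cost linear in the number of tokens.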

💻 Usage Examples

Basic Usage

from transformers import AutoImageProcessor, MobileViTV2ForSemanticSegmentation
from PIL import Image
import requests
import torch

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the preprocessing configuration and the segmentation model from the Hub
image_processor = AutoImageProcessor.from_pretrained("shehan97/mobilevitv2-1.0-voc-deeplabv3")
model = MobileViTV2ForSemanticSegmentation.from_pretrained("shehan97/mobilevitv2-1.0-voc-deeplabv3")

inputs = image_processor(images=image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, height, width)

# Per-pixel class IDs at the logits' resolution, which is lower than the input's
predicted_mask = logits.argmax(1).squeeze(0)

Both the image processor and the model support PyTorch.
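The logits returned by the model have a lower spatial resolution than the input image. To get a full-size mask, upsample the logits first and then take the argmax. The NumPy sketch below does this with bilinear interpolation, using the half-pixel (align_corners=False) convention, and then reports the fraction of pixels assigned to each class.

This is an illustration, not the library code. In practice you would use torch.nn.functional.interpolate, or the image processor's post_process_semantic_segmentation method with target_sizes (check the documentation for your installed version). The id2label mapping in the example is made up; use model.config.id2label for the real one.

```python
import numpy as np


def bilinear_resize(logits: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinearly resize (C, H, W) logits to (C, out_h, out_w), half-pixel convention."""
    c, h, w = logits.shape
    # Source coordinate of each output pixel centre, clamped to the valid range.
    ys = np.clip((np.arange(out_h) + 0.5) * h / out_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * w / out_w - 0.5, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # (out_h, 1) vertical interpolation weights
    wx = (xs - x0)[None, :]  # (1, out_w) horizontal interpolation weights
    rows0, rows1 = logits[:, y0], logits[:, y1]  # (C, out_h, W)
    top = rows0[:, :, x0] * (1 - wx) + rows0[:, :, x1] * wx
    bottom = rows1[:, :, x0] * (1 - wx) + rows1[:, :, x1] * wx
    return top * (1 - wy) + bottom * wy


def logits_to_mask(logits: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Upsample class logits first, then take the per-pixel argmax."""
    return bilinear_resize(logits, out_h, out_w).argmax(axis=0)


def class_fractions(mask: np.ndarray, id2label: dict) -> dict:
    """Fraction of pixels assigned to each predicted class."""
    ids, counts = np.unique(mask, return_counts=True)
    return {id2label[int(i)]: float(n) / mask.size for i, n in zip(ids, counts)}


# Toy logits: 3 classes on a 4x4 grid; class 0 wins on the left, class 2 on the right.
logits = np.zeros((3, 4, 4))
logits[0, :, :2] = 5.0
logits[2, :, 2:] = 5.0
mask = logits_to_mask(logits, 8, 8)
print(class_fractions(mask, {0: "background", 1: "cat", 2: "dog"}))
# {'background': 0.5, 'dog': 0.5}
```

Interpolating the logits before the argmax gives smoother class boundaries than upsampling the integer mask directly. This is why the two steps are ordered this way.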

📚 Documentation

Intended uses & limitations

You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you.

📦 Installation

The usage example requires PyTorch, the Hugging Face Transformers library, Pillow and requests, for example: pip install torch transformers pillow requests

🔧 Technical Details

The MobileViTv2 backbone was pretrained on ImageNet-1k, a dataset of roughly 1.28 million training images across 1,000 classes. The MobileViTv2 + DeepLabV3 model was then fine-tuned on the PASCAL VOC2012 dataset.

📄 License

The model is released under the Apple sample code license.

BibTeX entry and citation info

@inproceedings{vision-transformer,
title = {Separable Self-attention for Mobile Vision Transformers},
author = {Sachin Mehta and Mohammad Rastegari},
year = {2022},
URL = {https://arxiv.org/abs/2206.02680}
}
