SegFormer-B3-1024x1024-City-160k Open-Source Semantic Segmentation Model - Optimally Adapted to the Cityscapes Dataset


Segformer B3 1024x1024 City 160k

Developed by smp-hub

A semantic segmentation model based on the Segformer architecture, optimized for the Cityscapes dataset

Image Segmentation

Safetensors

Open Source License: Other

Tags: #Urban Scene Segmentation #High-Resolution Image Processing #Semantic Segmentation

Downloads 14

Release Time: 11/29/2024

Model Overview

This is a Transformer-based semantic segmentation model for pixel-level classification of urban street scene images. It adopts SegFormer's hybrid design, which combines the advantages of convolution and Transformer.
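To make the hierarchical design more concrete, the sketch below computes the feature-map sizes the encoder would produce for a 1024x1024 input. It assumes the standard SegFormer (Mix Transformer) layout of four stages with overall strides 4, 8, 16, and 32, and the MiT-B3 stage widths 64, 128, 320, and 512. The function name `feature_map_shapes` is hypothetical and used only for illustration.

```python
# Assumption: standard SegFormer / Mix Transformer layout, four encoder
# stages with overall strides 4, 8, 16, 32 and MiT-B3 widths 64/128/320/512.
STAGE_STRIDES = (4, 8, 16, 32)
MIT_B3_CHANNELS = (64, 128, 320, 512)


def feature_map_shapes(height, width):
    """Return (channels, height, width) for each encoder stage.

    Hypothetical helper for illustration. It only accepts sizes divisible
    by 32 so that every stage divides evenly (a simplification of this sketch).
    """
    if height % 32 or width % 32:
        raise ValueError("this sketch expects height and width divisible by 32")
    return [
        (channels, height // stride, width // stride)
        for channels, stride in zip(MIT_B3_CHANNELS, STAGE_STRIDES)
    ]


shapes = feature_map_shapes(1024, 1024)
for stage, (channels, h, w) in enumerate(shapes, start=1):
    # Each spatial position is one token for the attention layers of that stage.
    print(f"stage {stage}: {channels} channels, {h}x{w} map, {h * w} tokens")
```

The finest stage keeps a 256x256 map, while the coarsest stage attends over only 32x32 positions, which is one way to see how global context and efficiency are balanced.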

Model Features

Hybrid Architecture Design

Combines the advantages of CNNs and Transformers, capturing global context while maintaining computational efficiency

High-Resolution Processing

Supports 1024x1024 resolution input, suitable for high-precision segmentation tasks

Pre-trained Weights

Provides weights pre-trained on the Cityscapes dataset, ready for direct inference

Model Capabilities

Street scene image segmentation

Pixel-level classification

High-resolution image processing

Use Cases

Autonomous Driving

Road Scene Understanding

Identifies key elements such as roads, pedestrians, and vehicles

Performs well on the Cityscapes dataset

Urban Management

Infrastructure Analysis

Identifies and classifies urban infrastructure such as roads and buildings

🚀 Segformer Model

Segformer is a powerful model for image segmentation tasks, offering high performance and flexibility.

🚀 Quick Start

Load trained model

You can load the trained model by following these steps:

Install the necessary requirements.

pip install -U segmentation_models_pytorch albumentations

Run the inference code.

import torch
import requests
import numpy as np
import albumentations as A
import segmentation_models_pytorch as smp

from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load pretrained model and preprocessing function
checkpoint = "smp-hub/segformer-b3-1024x1024-city-160k"
model = smp.from_pretrained(checkpoint).eval().to(device)
preprocessing = A.Compose.from_pretrained(checkpoint)

# Load image
url = "https://huggingface.co/datasets/hf-internal-testing/fixtures_ade20k/resolve/main/ADE_val_00000001.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess image
np_image = np.array(image)
normalized_image = preprocessing(image=np_image)["image"]
input_tensor = torch.as_tensor(normalized_image)
input_tensor = input_tensor.permute(2, 0, 1).unsqueeze(0)  # HWC -> BCHW
input_tensor = input_tensor.to(device)

# Perform inference
with torch.no_grad():
    output_mask = model(input_tensor)

# Postprocess mask
mask = torch.nn.functional.interpolate(
    output_mask, size=(image.height, image.width), mode="bilinear", align_corners=False
)
mask = mask.argmax(1).cpu().numpy()  # argmax over predicted classes (channels dim)
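The resulting mask holds one integer class id per pixel. The sketch below shows one possible way to turn it into a color image and a per-class pixel summary. It assumes the 19 output channels follow the standard Cityscapes train-ID order and uses the usual Cityscapes color palette; the model card does not state the class order, so verify this before relying on it. The helpers `colorize` and `class_shares` are hypothetical names.

```python
import numpy as np

# Assumption: channel i corresponds to Cityscapes train ID i.
# Colors are the standard Cityscapes palette (RGB).
CITYSCAPES_CLASSES = [
    ("road", (128, 64, 128)),
    ("sidewalk", (244, 35, 232)),
    ("building", (70, 70, 70)),
    ("wall", (102, 102, 156)),
    ("fence", (190, 153, 153)),
    ("pole", (153, 153, 153)),
    ("traffic light", (250, 170, 30)),
    ("traffic sign", (220, 220, 0)),
    ("vegetation", (107, 142, 35)),
    ("terrain", (152, 251, 152)),
    ("sky", (70, 130, 180)),
    ("person", (220, 20, 60)),
    ("rider", (255, 0, 0)),
    ("car", (0, 0, 142)),
    ("truck", (0, 0, 70)),
    ("bus", (0, 60, 100)),
    ("train", (0, 80, 100)),
    ("motorcycle", (0, 0, 230)),
    ("bicycle", (119, 11, 32)),
]
PALETTE = np.array([color for _, color in CITYSCAPES_CLASSES], dtype=np.uint8)


def colorize(class_mask):
    """Map an (H, W) array of class ids to an (H, W, 3) uint8 RGB image."""
    return PALETTE[class_mask]


def class_shares(class_mask):
    """Return {class name: fraction of pixels} for classes present in the mask."""
    counts = np.bincount(class_mask.ravel(), minlength=len(CITYSCAPES_CLASSES))
    return {
        name: counts[i] / class_mask.size
        for i, (name, _) in enumerate(CITYSCAPES_CLASSES)
        if counts[i] > 0
    }


# Toy 2x3 mask standing in for `mask[0]` from the inference code above.
toy_mask = np.array([[0, 0, 13], [10, 11, 0]])
rgb = colorize(toy_mask)
shares = class_shares(toy_mask)
```

With the inference code above, `mask` keeps a leading batch dimension, so the calls would be `colorize(mask[0])` and `class_shares(mask[0])`.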

✨ Features

Model init parameters

The following are the initialization parameters for the model:

model_init_params = {
    "encoder_name": "mit_b3",
    "encoder_depth": 5,
    "encoder_weights": None,
    "decoder_segmentation_channels": 768,
    "in_channels": 3,
    "classes": 19,
    "activation": None,
    "aux_params": None
}

Dataset

The model is trained on the Cityscapes dataset.

📚 Documentation

More Information

Library: https://github.com/qubvel/segmentation_models.pytorch
Docs: https://smp.readthedocs.io/en/latest/
License: https://github.com/NVlabs/SegFormer/blob/master/LICENSE

This model has been pushed to the Hub using the PytorchModelHubMixin.

📄 License

The license information can be found at: https://github.com/NVlabs/SegFormer/blob/master/LICENSE
