SegFormer-b2 Open-Source Semantic Segmentation Model - Free Deployment for Coral Reef Ecosystem Image Segmentation

Segformer B2 Finetuned Coralscapes 1024 1024

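Developed by EPFL-ECEO

A semantic segmentation model based on the SegFormer architecture, fine-tuned on the Coralscapes dataset for image segmentation of coral reef ecosystems.

Image Segmentation

Transformers

License: Apache-2.0 #Coral reef semantic segmentation #High-resolution image processing #Ecological monitoring

Downloads: 139

Released: 3/7/2025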

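Model Overview

The model performs semantic segmentation of coral reef scenes, identifying and segmenting the different classes that appear in reef images. It uses the MiT-B2 backbone and was fine-tuned on the Coralscapes dataset at a resolution of 1024x1024.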

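Model Features

High-resolution processing

Accepts 1024x1024 input images, which suits fine-grained segmentation of coral reef imagery.

Coral reef specialization

Fine-tuned specifically on the Coralscapes dataset for coral reef segmentation (test mean IoU of 54.682).

Sliding-window support

Provides a sliding-window segmentation strategy for input images of arbitrary size.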

Model Capabilities

Coral reef image segmentation

Underwater scene understanding

Ecological monitoring

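Use Cases

Ecological monitoring

Coral reef health assessment

Segments the different regions of a reef image to support assessment of reef health.

Can distinguish 40 classes of corals and marine life.

Marine ecology research

Supports studies of change and biodiversity in coral reef ecosystems.

Yields coral cover statistics computed from the segmentation masks.

Environmental protection

Coral reef conservation monitoring

Tracks coral reef degradation and provides data to support conservation measures.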
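
The coral cover statistics mentioned in the use cases can be derived from a predicted label map by counting pixels per class. The sketch below is a minimal illustration, assuming the prediction is a 2D integer array of class ids like the `label_pred` array produced in the Quick Start code. The function name `class_coverage` and the toy label map are hypothetical and not part of the model's API.

```python
import numpy as np

def class_coverage(label_map, num_classes=40):
    """Return the fraction of pixels assigned to each class id."""
    labels = np.asarray(label_map).ravel()
    counts = np.bincount(labels, minlength=num_classes)
    return counts / labels.size

# Toy 2x4 label map standing in for a real prediction (hypothetical values)
toy = np.array([[0, 0, 3, 3],
                [3, 3, 3, 5]])
coverage = class_coverage(toy)
print(coverage[3])  # fraction of pixels labelled with class id 3
```

Which class ids count as "coral" depends on the label definitions of the dataset, so summing the relevant entries of `coverage` is left to the reader.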

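🚀 Coral Reef Semantic Segmentation Model

This project fine-tunes a SegFormer model with a MiT-B2 backbone on the Coralscapes dataset. It performs semantic segmentation of coral reef images in support of coral reef ecology research.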

🚀 Quick Start

The simplest way to segment an image from the Coralscapes dataset with this model is as follows:

from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
from datasets import load_dataset

# Load an image from the Coralscapes dataset, or load your own image
dataset = load_dataset("EPFL-ECEO/coralscapes") 
image = dataset["test"][42]["image"]

preprocessor = SegformerImageProcessor.from_pretrained("EPFL-ECEO/segformer-b2-finetuned-coralscapes-1024-1024")
model = SegformerForSemanticSegmentation.from_pretrained("EPFL-ECEO/segformer-b2-finetuned-coralscapes-1024-1024")

inputs = preprocessor(image, return_tensors = "pt")
outputs = model(**inputs)
outputs = preprocessor.post_process_semantic_segmentation(outputs, target_sizes=[(image.size[1], image.size[0])])
label_pred = outputs[0].numpy()

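Although the approach above still works for images of different sizes and aspect ratios, for images that differ substantially from the model's training size (1024x1024) we recommend the following sliding-window approach for better results: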

import torch 
import torch.nn.functional as F
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import numpy as np
from datasets import load_dataset
device = 'cuda' if torch.cuda.is_available() else 'cpu'

def resize_image(image, target_size=1024):
    """
    Resize the image so that its shorter side equals target_size (1024 by default).
    """
    # PIL's Image.size is (width, height)
    w_img, h_img = image.size
    if w_img < h_img:
        new_w, new_h = target_size, int(h_img * (target_size / w_img))
    else:
        new_w, new_h = int(w_img * (target_size / h_img)), target_size
    # Image.resize also expects (width, height)
    resized_img = image.resize((new_w, new_h))
    return resized_img

def segment_image(image, preprocessor, model, crop_size = (1024, 1024), num_classes = 40, transform=None):
    """
    Find a suitable stride from the image size and aspect ratio, create overlapping
    1024x1024 sliding windows, and feed each window to the model.
    """ 
    h_crop, w_crop = crop_size
    
    img = torch.Tensor(np.array(resize_image(image, target_size=1024)).transpose(2, 0, 1)).unsqueeze(0)
    batch_size, _, h_img, w_img = img.size()
    
    if transform:
        img = torch.Tensor(transform(image = img.numpy())["image"]).to(device)    
        
    h_grids = int(np.round(3/2*h_img/h_crop)) if h_img > h_crop else 1
    w_grids = int(np.round(3/2*w_img/w_crop)) if w_img > w_crop else 1
    
    h_stride = int((h_img - h_crop + h_grids -1)/(h_grids -1)) if h_grids > 1 else h_crop
    w_stride = int((w_img - w_crop + w_grids -1)/(w_grids -1)) if w_grids > 1 else w_crop
    
    preds = img.new_zeros((batch_size, num_classes, h_img, w_img))
    count_mat = img.new_zeros((batch_size, 1, h_img, w_img))
    
    for h_idx in range(h_grids):
        for w_idx in range(w_grids):
            y1 = h_idx * h_stride
            x1 = w_idx * w_stride
            y2 = min(y1 + h_crop, h_img)
            x2 = min(x1 + w_crop, w_img)
            y1 = max(y2 - h_crop, 0)
            x1 = max(x2 - w_crop, 0)
            crop_img = img[:, :, y1:y2, x1:x2]
            with torch.no_grad():
                if(preprocessor):
                    inputs = preprocessor(crop_img, return_tensors = "pt")
                    inputs["pixel_values"] = inputs["pixel_values"].to(device)
                else:
                    # Without a preprocessor, wrap the crop in a dict so that model(**inputs) works
                    inputs = {"pixel_values": crop_img.to(device)}
                outputs = model(**inputs)

            resized_logits = F.interpolate(
                outputs.logits[0].unsqueeze(dim=0), size=crop_img.shape[-2:], mode="bilinear", align_corners=False
            )
            preds += F.pad(resized_logits,
                            (int(x1), int(preds.shape[3] - x2), int(y1),
                            int(preds.shape[2] - y2))).cpu()
            count_mat[:, :, y1:y2, x1:x2] += 1
        
    assert (count_mat == 0).sum() == 0
    preds = preds / count_mat
    preds = preds.argmax(dim=1)
    preds = F.interpolate(preds.unsqueeze(0).type(torch.uint8), size=image.size[::-1], mode='nearest')
    label_pred = preds.squeeze().cpu().numpy()
    return label_pred

# Load an image from the Coralscapes dataset, or load your own image
dataset = load_dataset("EPFL-ECEO/coralscapes") 
image = dataset["test"][42]["image"]

preprocessor = SegformerImageProcessor.from_pretrained("EPFL-ECEO/segformer-b2-finetuned-coralscapes-1024-1024")
model = SegformerForSemanticSegmentation.from_pretrained("EPFL-ECEO/segformer-b2-finetuned-coralscapes-1024-1024")
# segment_image sends its inputs to `device`, so the model must live on the same device
model = model.to(device)

label_pred = segment_image(image, preprocessor, model)
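
To make the window placement in `segment_image` easier to follow, the sketch below reproduces only its grid and stride arithmetic for a single image axis, in pure Python. The helper name `window_spans` is hypothetical. It uses Python's built-in `round` in place of `np.round` (both round halves to the nearest even integer) and integer floor division in place of `int(a / b)`, which gives the same result for the positive values involved.

```python
def window_spans(length, crop=1024):
    """Return (start, end) pixel spans of the sliding windows along one axis."""
    # About 1.5 windows per crop length, so neighbouring windows overlap
    n = round(1.5 * length / crop) if length > crop else 1
    stride = (length - crop + n - 1) // (n - 1) if n > 1 else crop
    spans = []
    for i in range(n):
        end = min(i * stride + crop, length)
        start = max(end - crop, 0)  # shift the last window back inside the image
        spans.append((start, end))
    return spans

print(window_spans(2048))  # [(0, 1024), (513, 1537), (1024, 2048)]
print(window_spans(1024))  # [(0, 1024)]
```

For a 2048-pixel axis this yields three windows, each overlapping its neighbour by roughly half a window. The logits in the overlapping regions are averaged by dividing by `count_mat` in the code above.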

✨ Key Features

Model type: SegFormer, applied to semantic segmentation of coral reef images.
Fine-tuning base: fine-tuned from the pre-trained SegFormer (b2-sized) encoder (nvidia/mit-b2).

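📚 Detailed Documentation

Model Details

Model Description

| Property | Details |
| --- | --- |
| Model type | SegFormer |
| Fine-tuned from | [SegFormer (b2-sized) encoder, pre-trained only (`nvidia/mit-b2`)](https://huggingface.co/nvidia/mit-b2) |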

Model Sources

Repository: [coralscapesScripts](https://github.com/eceo-epfl/coralscapesScripts/)
Demo: [Hugging Face Spaces](https://huggingface.co/spaces/EPFL-ECEO/coralscapes_demo)

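Training and Evaluation Details

Data

The model is trained and evaluated on the [Coralscapes dataset](https://huggingface.co/datasets/EPFL-ECEO/coralscapes), a general-purpose dense semantic segmentation dataset for coral reefs.

Procedure

Training follows the original SegFormer [implementation](https://proceedings.neurips.cc/paper_files/paper/2021/file/64f1f27bf1b4ec22924fd0acb550c235-Paper.pdf), with a batch size of 8 for 265 epochs. It uses the AdamW optimizer with an initial learning rate of 6e-5 and a weight decay of 1e-2, together with a polynomial learning rate scheduler with a power of 1. During training, images are randomly scaled by a factor between 1 and 2, flipped horizontally with probability 0.5, and randomly cropped to 1024×1024 pixels. Input images are normalized with the ImageNet mean and standard deviation. For evaluation, a non-overlapping sliding-window strategy with a window size of 1024x1024 is used.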
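
The polynomial learning rate schedule described above can be written out explicitly. The sketch below assumes the common form lr(t) = lr_0 * (1 - t/T)^p with no warm-up and a final learning rate of zero. The source states only the initial rate (6e-5) and the power (1), so the function name `poly_lr`, the step count, and the zero end value are illustrative assumptions.

```python
def poly_lr(step, total_steps, base_lr=6e-5, power=1.0):
    """Polynomial decay from base_lr to 0 over total_steps (assumed form)."""
    progress = min(step, total_steps) / total_steps
    return base_lr * (1.0 - progress) ** power

total = 1000  # hypothetical number of optimisation steps
for step in (0, 500, 1000):
    print(step, poly_lr(step, total))
```

With a power of 1 the schedule reduces to a linear decay, so the rate at the halfway point is half the initial rate.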

Results

Test accuracy: 80.904
Test mean IoU: 54.682
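
For reference, pixel accuracy and mean IoU of the kind reported above are commonly computed from a confusion matrix accumulated over the test set. The sketch below shows one common definition. The exact evaluation protocol behind the reported numbers (for example, how unlabeled pixels or absent classes are handled) is not stated here, so the function `segmentation_metrics` and its conventions are assumptions.

```python
import numpy as np

def segmentation_metrics(y_true, y_pred, num_classes):
    """Return (pixel accuracy, mean IoU) for integer label arrays of equal shape."""
    y_true = np.asarray(y_true, dtype=np.int64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.int64).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions
    conf = np.bincount(y_true * num_classes + y_pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0  # assumption: skip classes absent from both arrays
    accuracy = tp.sum() / conf.sum()
    mean_iou = (tp[present] / union[present]).mean()
    return accuracy, mean_iou

# Toy ground truth and prediction (hypothetical values)
truth = np.array([[0, 0, 1, 1],
                  [2, 2, 1, 1]])
pred = np.array([[0, 1, 1, 1],
                 [2, 2, 2, 1]])
acc, miou = segmentation_metrics(truth, pred, num_classes=3)
print(acc, miou)
```

In the toy example six of eight pixels match, and the per-class IoUs are 1/2, 3/5 and 2/3, whose mean is the reported mean IoU.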

📄 License

This project is released under the Apache 2.0 license.

Citation

If you find this project useful, please consider citing:

@misc{sauder2025coralscapesdatasetsemanticscene,
        title={The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs}, 
        author={Jonathan Sauder and Viktor Domazetoski and Guilhem Banc-Prandi and Gabriela Perna and Anders Meibom and Devis Tuia},
        year={2025},
        eprint={2503.20000},
        archivePrefix={arXiv},
        primaryClass={cs.CV},
        url={https://arxiv.org/abs/2503.20000}, 
  }