SegFormer-B2: an open-source semantic segmentation model for coral reef ecosystem imagery, free to deploy

Segformer B2 Finetuned Coralscapes 1024 1024

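Developed by EPFL-ECEO

A semantic segmentation model based on the SegFormer architecture, fine-tuned on the Coralscapes dataset and optimized for segmenting images of coral reef ecosystems.

Image Segmentation

Transformers

Open-source license: Apache-2.0 #Coral reef semantic segmentation #High-resolution image processing #Ecological monitoring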

Downloads: 139

Released: 3/7/2025

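Model Overview

This model performs semantic segmentation of coral reef ecosystems, identifying and delineating the different classes present in reef images. It uses a MiT-B2 backbone and is fine-tuned on the Coralscapes dataset at 1024x1024 resolution.

Model Features

High-resolution processing

Accepts 1024x1024 high-resolution image input, which suits fine-grained segmentation of coral reef images.

Coral reef specialization

Fine-tuned specifically on the Coralscapes dataset, where it reaches a test mean IoU of 54.682.

Sliding-window support

A sliding-window segmentation strategy is provided for input images of arbitrary size.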

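Model Capabilities

Coral reef image segmentation

Underwater scene understanding

Ecological monitoring

Use Cases

Ecological monitoring

Coral reef health assessment

Segments the different regions of a reef image as a basis for assessing reef health.

Can identify 40 classes of corals and other marine life.

Marine ecology research

Supports studies of change and biodiversity in coral reef ecosystems.

Yields coral cover statistics computed from the predicted segmentation masks.

Environmental protection

Coral reef conservation monitoring

Tracks reef degradation and supplies data to support conservation measures.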
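
As a rough illustration of the cover statistics mentioned in the use cases, the sketch below counts the fraction of pixels assigned to each class in a label map. The helper name `class_cover` and the tiny synthetic array are hypothetical; in practice the input would be a predicted label map such as the `label_pred` array produced in the quick start, and which class ids count as coral depends on the dataset's label definitions, which are not listed on this page.

```python
import numpy as np

def class_cover(label_map, num_classes=40):
    """Return the fraction of pixels assigned to each class id."""
    counts = np.bincount(label_map.ravel(), minlength=num_classes)
    return counts / label_map.size

# Synthetic 4x4 label map standing in for a model prediction (assumption).
demo = np.array([[0, 0, 1, 1],
                 [0, 2, 2, 1],
                 [2, 2, 2, 1],
                 [2, 2, 2, 1]])
cover = class_cover(demo, num_classes=3)
print(cover)  # one fraction per class id; the fractions sum to 1
```

Pixel fractions are only a proxy for real-world area, since they ignore camera angle and distance to the reef.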

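🚀 Model card for segformer-b2-finetuned-coralscapes-1024-1024

A SegFormer model with a MiT-B2 backbone, fine-tuned on Coralscapes at a resolution of 1024x1024, as introduced in The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs.

🚀 Quick Start

The simplest way to segment an image from the Coralscapes dataset with this model is as follows.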

from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
from datasets import load_dataset

# Load an image from the coralscapes dataset or load your own image 
dataset = load_dataset("EPFL-ECEO/coralscapes") 
image = dataset["test"][42]["image"]

preprocessor = SegformerImageProcessor.from_pretrained("EPFL-ECEO/segformer-b2-finetuned-coralscapes-1024-1024")
model = SegformerForSemanticSegmentation.from_pretrained("EPFL-ECEO/segformer-b2-finetuned-coralscapes-1024-1024")

inputs = preprocessor(image, return_tensors = "pt")
outputs = model(**inputs)
outputs = preprocessor.post_process_semantic_segmentation(outputs, target_sizes=[(image.size[1], image.size[0])])
label_pred = outputs[0].numpy()

The approach above works for images of different sizes and scales, but for images that are not close to the model's training size (1024x1024), we recommend the sliding-window approach below.

import torch 
import torch.nn.functional as F
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import numpy as np
from datasets import load_dataset
device = 'cuda' if torch.cuda.is_available() else 'cpu'

def resize_image(image, target_size=1024):
    """
    Used to resize the image such that the smaller side equals 1024
    """
    # PIL's Image.size is (width, height), and Image.resize expects the same order
    w_img, h_img = image.size
    if w_img < h_img:
        new_w, new_h = target_size, int(h_img * (target_size / w_img))
    else:
        new_w, new_h = int(w_img * (target_size / h_img)), target_size
    resized_img = image.resize((new_w, new_h))
    return resized_img

def segment_image(image, preprocessor, model, crop_size = (1024, 1024), num_classes = 40, transform=None):
    """
    Finds an optimal stride based on the image size and aspect ratio to create
    overlapping sliding windows of size 1024x1024 which are then fed into the model.  
    """ 
    h_crop, w_crop = crop_size
    
    img = torch.Tensor(np.array(resize_image(image, target_size=1024)).transpose(2, 0, 1)).unsqueeze(0)
    batch_size, _, h_img, w_img = img.size()
    
    if transform:
        img = torch.Tensor(transform(image = img.numpy())["image"]).to(device)    
        
    h_grids = int(np.round(3/2*h_img/h_crop)) if h_img > h_crop else 1
    w_grids = int(np.round(3/2*w_img/w_crop)) if w_img > w_crop else 1
    
    h_stride = int((h_img - h_crop + h_grids -1)/(h_grids -1)) if h_grids > 1 else h_crop
    w_stride = int((w_img - w_crop + w_grids -1)/(w_grids -1)) if w_grids > 1 else w_crop
    
    preds = img.new_zeros((batch_size, num_classes, h_img, w_img))
    count_mat = img.new_zeros((batch_size, 1, h_img, w_img))
    
    for h_idx in range(h_grids):
        for w_idx in range(w_grids):
            y1 = h_idx * h_stride
            x1 = w_idx * w_stride
            y2 = min(y1 + h_crop, h_img)
            x2 = min(x1 + w_crop, w_img)
            y1 = max(y2 - h_crop, 0)
            x1 = max(x2 - w_crop, 0)
            crop_img = img[:, :, y1:y2, x1:x2]
            with torch.no_grad():
                if(preprocessor):
                    inputs = preprocessor(crop_img, return_tensors = "pt")
                    inputs["pixel_values"] = inputs["pixel_values"].to(device)
                else:
                    # model(**inputs) needs a mapping, so wrap the raw tensor
                    inputs = {"pixel_values": crop_img.to(device)}
                outputs = model(**inputs)

            resized_logits = F.interpolate(
                outputs.logits[0].unsqueeze(dim=0), size=crop_img.shape[-2:], mode="bilinear", align_corners=False
            )
            preds += F.pad(resized_logits,
                            (int(x1), int(preds.shape[3] - x2), int(y1),
                            int(preds.shape[2] - y2))).cpu()
            count_mat[:, :, y1:y2, x1:x2] += 1
        
    assert (count_mat == 0).sum() == 0
    preds = preds / count_mat
    preds = preds.argmax(dim=1)
    preds = F.interpolate(preds.unsqueeze(0).type(torch.uint8), size=image.size[::-1], mode='nearest')
    label_pred = preds.squeeze().cpu().numpy()
    return label_pred

# Load an image from the coralscapes dataset or load your own image 
dataset = load_dataset("EPFL-ECEO/coralscapes") 
image = dataset["test"][42]["image"]

preprocessor = SegformerImageProcessor.from_pretrained("EPFL-ECEO/segformer-b2-finetuned-coralscapes-1024-1024")
# Move the model to the same device that segment_image sends its inputs to
model = SegformerForSemanticSegmentation.from_pretrained("EPFL-ECEO/segformer-b2-finetuned-coralscapes-1024-1024").to(device)

label_pred = segment_image(image, preprocessor, model)
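
To make the window placement in `segment_image` easier to follow, here is a minimal sketch that reproduces only its grid and stride arithmetic in plain Python, without loading the model. The helper name `window_grid` is hypothetical; the formulas are copied from the function above, with integer division standing in for `int(...)` on positive values.

```python
def window_grid(h_img, w_img, h_crop=1024, w_crop=1024):
    """List (y1, y2, x1, x2) windows using the same arithmetic as segment_image."""
    # Roughly 1.5 windows per crop length, which produces overlapping windows
    h_grids = int(round(1.5 * h_img / h_crop)) if h_img > h_crop else 1
    w_grids = int(round(1.5 * w_img / w_crop)) if w_img > w_crop else 1
    # Ceiling division so the last window reaches the image border
    h_stride = (h_img - h_crop + h_grids - 1) // (h_grids - 1) if h_grids > 1 else h_crop
    w_stride = (w_img - w_crop + w_grids - 1) // (w_grids - 1) if w_grids > 1 else w_crop

    windows = []
    for h_idx in range(h_grids):
        for w_idx in range(w_grids):
            y2 = min(h_idx * h_stride + h_crop, h_img)
            x2 = min(w_idx * w_stride + w_crop, w_img)
            # Shift the window back inside the image if it overshoots
            y1 = max(y2 - h_crop, 0)
            x1 = max(x2 - w_crop, 0)
            windows.append((y1, y2, x1, x2))
    return windows

# A 1024x2048 image (height x width) after resizing
windows = window_grid(1024, 2048)
for w in windows:
    print(w)
```

For this input the grid is one row of three windows, and the middle window overlaps both neighbours; the overlapping logits are averaged through `count_mat` in `segment_image`.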

✨ Key Features

A SegFormer model specialized for coral reef image segmentation.
Fine-tuned on the Coralscapes dataset, where it reaches a test accuracy of 80.904.

📚 Documentation

Model Details

Model Description

Attribute	Details
Model type	SegFormer
Fine-tuned from	Pretrained SegFormer (b2-sized) encoder (`nvidia/mit-b2`)

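Model Sources

Repository: coralscapesScripts
Demo: Hugging Face Spaces

Training and Evaluation Details

Data

The model is trained and evaluated on the Coralscapes dataset, a general-purpose dense semantic segmentation dataset for coral reefs.

Procedure

Training follows the original SegFormer implementation. The model is trained with a batch size of 8 for 265 epochs using the AdamW optimizer, with an initial learning rate of 6e-5, a weight decay of 1e-2, and a polynomial learning rate scheduler with a power of 1. During training, images are randomly scaled by a factor between 1 and 2, flipped horizontally with probability 0.5, and randomly cropped to 1024×1024 pixels. Input images are normalized using the ImageNet mean and standard deviation. Evaluation uses a non-overlapping sliding-window strategy with 1024x1024 windows.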
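
For readers unfamiliar with a polynomial learning rate schedule, the following sketch shows what a power of 1 means in practice: the rate falls linearly from its initial value. The function `poly_lr` is a hypothetical helper, and decaying all the way to zero with no warmup is an assumption; the page does not state the final learning rate or any warmup phase.

```python
def poly_lr(step, total_steps, base_lr=6e-5, power=1.0):
    """Polynomial decay from base_lr towards 0; power=1 gives a straight line."""
    progress = min(step, total_steps) / total_steps
    return base_lr * (1.0 - progress) ** power

# Learning rate at the start, midpoint and end of a hypothetical 1000-step run
for step in (0, 500, 1000):
    print(step, poly_lr(step, total_steps=1000))
```

With a power above 1 the same formula would drop faster early on and flatten near the end of training.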

Results

Test accuracy: 80.904
Test mean IoU: 54.682
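
The sketch below shows one common way to compute pixel accuracy and mean IoU from a confusion matrix, to clarify what the two figures above measure. It is not taken from the authors' evaluation code: the helper names are hypothetical, and details such as ignored labels or per-image versus dataset-level aggregation are not stated on this page. The values above appear to be percentages, whereas the functions here return fractions.

```python
import numpy as np

def confusion_matrix(target, pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    idx = target.ravel().astype(np.int64) * num_classes + pred.ravel().astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class matches the ground truth."""
    return np.diag(cm).sum() / cm.sum()

def mean_iou(cm):
    """Average intersection-over-union across classes that occur at all."""
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0
    return (tp[valid] / union[valid]).mean()

# Tiny synthetic example with 3 classes (assumption, not real data)
target = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
cm = confusion_matrix(target, pred, num_classes=3)
acc = pixel_accuracy(cm)
miou = mean_iou(cm)
print(acc, miou)
```

Mean IoU weights every class equally, so it is typically lower than pixel accuracy when rare classes are segmented poorly.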

📄 License

This model is released under the Apache 2.0 license.

Citation

If you find this project useful, please cite it as follows.

@misc{sauder2025coralscapesdatasetsemanticscene,
        title={The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs}, 
        author={Jonathan Sauder and Viktor Domazetoski and Guilhem Banc-Prandi and Gabriela Perna and Anders Meibom and Devis Tuia},
        year={2025},
        eprint={2503.20000},
        archivePrefix={arXiv},
        primaryClass={cs.CV},
        url={https://arxiv.org/abs/2503.20000}, 
  }