Open-source image instance segmentation model: Mask2Former identifies and segments the individual object instances in an image

Finetune Instance Segmentation Ade20k Mini Mask2former No Trainer

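Developed by qubvel-hf

This is a Mask2Former instance segmentation model fine-tuned on the ADE20K-mini dataset. It identifies and segments the individual object instances in an image.

Image Segmentation

Transformers

#Instance segmentation #Small-image processing #ADE20K dataset

Downloads: 24

Released: 5/26/2024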

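Model Overview

This model is built on Facebook's Mask2Former architecture and fine-tuned from the facebook/mask2former-swin-tiny-coco-instance checkpoint. It is dedicated to instance segmentation: it identifies and segments the individual object instances in an image.

Model Features

Efficient instance segmentation

Identifies and segments multiple object instances in a single image

Transformer-based architecture

Combines a Swin Transformer backbone with the Mask2Former architecture for strong feature extraction

Small input size

Fine-tuned at a 256x256-pixel input size, which suits resource-constrained environments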

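Model Capabilities

Image segmentation

Object instance identification

Pixel-level annotation

Use Cases

Computer Vision

Scene understanding

Analyzes each object in a complex scene and the spatial relationships between objects

Outputs a boundary mask and class label for each object

Autonomous driving

Identifies key objects in road scenes, such as vehicles and pedestrians

Provides environment perception for autonomous driving systems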

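🚀 Instance Segmentation Example

This project shows how to fine-tune a Mask2Former model for instance segmentation and how to run inference with the result.

🚀 Quick Start

✨ Key Features

This model was trained with the run_instance_segmentation_no_trainer.py script.
With 🤗 Accelerate, the training loop can run in a range of environments, including CPU, multi-CPU, GPU, multi-GPU, and TPU, and it also supports mixed precision.

📦 Installation

PyTorch version with Accelerate

First, configure the environment: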

accelerate config

Answer the questions about your training environment. Then run the following command to check that everything is ready for training:

accelerate test

Finally, start training:

accelerate launch run_instance_segmentation_no_trainer.py \
    --model_name_or_path facebook/mask2former-swin-tiny-coco-instance \
    --output_dir finetune-instance-segmentation-ade20k-mini-mask2former-no-trainer \
    --dataset_name qubvel-hf/ade20k-mini \
    --do_reduce_labels \
    --image_height 256 \
    --image_width 256 \
    --num_train_epochs 40 \
    --learning_rate 1e-5 \
    --lr_scheduler_type constant \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 2 \
    --dataloader_num_workers 8 \
    --push_to_hub
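
As a rough guide to the flags above, the number of images that contribute to each optimizer step is the per-device batch size multiplied by the gradient accumulation steps and by the number of processes. The sketch below works through that arithmetic. The `effective_batch_size` helper is hypothetical, and `num_processes` is an assumption: it depends on the answers given to `accelerate config` and is not stated in this card.

```python
def effective_batch_size(per_device_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_processes: int = 1) -> int:
    """Number of images that contribute to one optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_processes


# Values taken from the command above; a single-device run is assumed here.
single = effective_batch_size(8, 2, num_processes=1)

# Hypothetical 4-GPU run with the same flags.
multi = effective_batch_size(8, 2, num_processes=4)

print(single, multi)  # prints "16 64"
```

If the run is moved to a different number of devices, adjusting `--gradient_accumulation_steps` is one way to keep this product roughly constant.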

💻 Usage Example

Reloading the model and running inference

The trained model can be reloaded and used for inference in a few lines.

import torch
import requests
import matplotlib.pyplot as plt

from PIL import Image
from transformers import Mask2FormerForUniversalSegmentation, Mask2FormerImageProcessor

# Load the image
image = Image.open(requests.get("http://farm4.staticflickr.com/3017/3071497290_31f0393363_z.jpg", stream=True).raw)

# Load the model and image processor; fall back to CPU when no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
checkpoint = "qubvel-hf/finetune-instance-segmentation-ade20k-mini-mask2former-no-trainer"

model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint, device_map=device)
image_processor = Mask2FormerImageProcessor.from_pretrained(checkpoint)

# Run inference on the image
inputs = image_processor(images=[image], return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

# Post-process the outputs; target_sizes expects (height, width), so reverse PIL's (width, height)
outputs = image_processor.post_process_instance_segmentation(outputs, target_sizes=[image.size[::-1]])

print("Mask shape: ", outputs[0]["segmentation"].shape)
print("Mask values: ", outputs[0]["segmentation"].unique())
for segment in outputs[0]["segments_info"]:
    print("Segment: ", segment)

Mask shape:  torch.Size([427, 640])
Mask values:  tensor([-1.,  0.,  1.,  2.,  3.,  4.,  5.,  6.])
Segment:  {'id': 0, 'label_id': 0, 'was_fused': False, 'score': 0.946127}
Segment:  {'id': 1, 'label_id': 1, 'was_fused': False, 'score': 0.961582}
Segment:  {'id': 2, 'label_id': 1, 'was_fused': False, 'score': 0.968367}
Segment:  {'id': 3, 'label_id': 1, 'was_fused': False, 'score': 0.819527}
Segment:  {'id': 4, 'label_id': 1, 'was_fused': False, 'score': 0.655761}
Segment:  {'id': 5, 'label_id': 1, 'was_fused': False, 'score': 0.531299}
Segment:  {'id': 6, 'label_id': 1, 'was_fused': False, 'score': 0.929477}
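
In the output above, each value in the mask is the `id` of an entry in `segments_info`, and -1 marks pixels that were not assigned to any instance. A common next step is to discard low-confidence segments. The sketch below does that on the records printed above; the helper names and the 0.8 threshold are illustrative assumptions, not part of the original example. Class names for each `label_id` should be available from `model.config.id2label`, which is not used here so that the snippet runs without downloading the model.

```python
# Segment records copied from the printed output above.
segments_info = [
    {"id": 0, "label_id": 0, "was_fused": False, "score": 0.946127},
    {"id": 1, "label_id": 1, "was_fused": False, "score": 0.961582},
    {"id": 2, "label_id": 1, "was_fused": False, "score": 0.968367},
    {"id": 3, "label_id": 1, "was_fused": False, "score": 0.819527},
    {"id": 4, "label_id": 1, "was_fused": False, "score": 0.655761},
    {"id": 5, "label_id": 1, "was_fused": False, "score": 0.531299},
    {"id": 6, "label_id": 1, "was_fused": False, "score": 0.929477},
]


def filter_segments(segments, min_score):
    """Keep only the segments whose confidence score is at least min_score."""
    return [s for s in segments if s["score"] >= min_score]


def count_by_label(segments):
    """Count how many instances were found for each label_id."""
    counts = {}
    for s in segments:
        counts[s["label_id"]] = counts.get(s["label_id"], 0) + 1
    return counts


# 0.8 is an arbitrary threshold chosen for illustration.
confident = filter_segments(segments_info, min_score=0.8)

print([s["id"] for s in confident])  # prints [0, 1, 2, 3, 6]
print(count_by_label(confident))     # prints {0: 1, 1: 4}
```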

The following code visualizes the result.

import numpy as np
import matplotlib.pyplot as plt

# Move the tensor to the CPU before converting, in case it lives on the GPU
segmentation = outputs[0]["segmentation"].cpu().numpy()

plt.figure(figsize=(10, 10))
plt.subplot(1, 2, 1)
plt.imshow(np.array(image))
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(segmentation)
plt.axis("off")
plt.show()
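
Beyond plotting, the segmentation map can be turned into per-instance measurements. The sketch below derives a bounding box and pixel area for each instance id with NumPy. It runs on a small hand-made array that stands in for the `segmentation` array above, and `instance_boxes` is a hypothetical helper, not a function from Transformers.

```python
import numpy as np


def instance_boxes(segmentation: np.ndarray) -> dict:
    """Map each instance id to its (x_min, y_min, x_max, y_max) box and pixel area."""
    results = {}
    for instance_id in np.unique(segmentation):
        if instance_id < 0:  # -1 marks pixels not assigned to any instance
            continue
        ys, xs = np.nonzero(segmentation == instance_id)
        results[int(instance_id)] = {
            "box": (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())),
            "area": int(ys.size),
        }
    return results


# Toy 4x6 map standing in for the real segmentation array.
toy = np.full((4, 6), -1.0)
toy[0:2, 0:3] = 0  # instance 0 fills rows 0-1, columns 0-2
toy[2:4, 4:6] = 1  # instance 1 fills rows 2-3, columns 4-5

print(instance_boxes(toy))
```

Box coordinates are inclusive pixel indices, so a box spanning columns 0 to 2 is three pixels wide.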

[Figure: Result. Left: original image. Right: predicted instance segmentation map.]

📄 License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.