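🚀 Multilabel-GeoSceneNet
Multilabel-GeoSceneNet is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-label image classification. It uses the SiglipForImageClassification architecture to identify and label multiple geographic or environmental elements in a single image.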

🚀 Quick start
Install dependencies
!pip install -q transformers torch pillow gradio
Inference code
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load the fine-tuned model and its matching image processor from the Hub.
model_name = "prithivMLmods/Multilabel-GeoSceneNet"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Class index to human-readable label.
labels = {
    "0": "Buildings and Structures",
    "1": "Desert",
    "2": "Forest Area",
    "3": "Hill or Mountain",
    "4": "Ice Glacier",
    "5": "Sea or Ocean",
    "6": "Street View"
}

# Minimum sigmoid probability for a label to be reported.
threshold = 0.5


def classify_geoscene_image(image):
    """Predicts geographic scene labels for an input image."""
    # Gradio passes None when the form is submitted without an image.
    if image is None:
        return {"None Detected": 0.0}

    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        # Sigmoid scores each class independently, so several labels can pass.
        probs = torch.sigmoid(logits).squeeze().tolist()

    predictions = {
        labels[str(i)]: round(probs[i], 3)
        for i in range(len(probs)) if probs[i] >= threshold
    }
    return predictions or {"None Detected": 0.0}


iface = gr.Interface(
    fn=classify_geoscene_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Predicted Scene Categories"),
    title="Multilabel-GeoSceneNet",
    description="Upload an image to detect multiple geographic scene elements (e.g., forest, ocean, buildings)."
)

if __name__ == "__main__":
    iface.launch()
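The decoding step in the inference code (sigmoid per class, then a 0.5 threshold) can be looked at in isolation. The following is a minimal sketch in plain Python, with no model or torch dependency. The helper names `sigmoid` and `decode_multilabel` and the example logits are assumptions made for illustration; the logits are not real model output.

```python
import math

# Label order copied from the inference code in this README.
LABELS = [
    "Buildings and Structures",
    "Desert",
    "Forest Area",
    "Hill or Mountain",
    "Ice Glacier",
    "Sea or Ocean",
    "Street View",
]


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1), avoiding overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_multilabel(logits, labels=LABELS, threshold=0.5):
    """Return {label: probability} for every class at or above the threshold."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = [sigmoid(x) for x in logits]
    return {
        label: round(p, 3)
        for label, p in zip(labels, probs)
        if p >= threshold
    }


# Hypothetical logits, invented for illustration only.
example_logits = [2.0, -3.0, 0.0, 1.0, -1.0, -2.0, 0.5]
result = decode_multilabel(example_logits)
print(result)
```

Because each class is scored independently, the probabilities do not sum to 1, and any number of labels (including none) can pass the threshold. Raising the threshold trades recall for precision; 0.5 is simply the value used in the inference code.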
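✨ Key features
- Multi-label recognition: identifies multiple geographic or environmental elements in a single image.
- High accuracy: 0.9245 overall accuracy on the 16,033 evaluation samples in the classification report, with per-class F1 scores ranging from 0.8732 (Ice Glacier) to 0.9831 (Forest Area).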
📚 Detailed documentation
Classification report

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Buildings and Structures | 0.8881 | 0.9498 | 0.9179 | 2190 |
| Desert | 0.9649 | 0.9480 | 0.9564 | 2000 |
| Forest Area | 0.9807 | 0.9855 | 0.9831 | 2271 |
| Hill or Mountain | 0.8616 | 0.8993 | 0.8800 | 2512 |
| Ice Glacier | 0.9114 | 0.8382 | 0.8732 | 2404 |
| Sea or Ocean | 0.9328 | 0.9525 | 0.9426 | 2274 |
| Street View | 0.9476 | 0.9106 | 0.9287 | 2382 |
| accuracy | | | 0.9245 | 16033 |
| macro avg | 0.9267 | 0.9263 | 0.9260 | 16033 |
| weighted avg | 0.9253 | 0.9245 | 0.9244 | 16033 |
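
To make the summary rows of the report easier to read, here is a small sketch that recomputes the per-class F1 score and the macro and weighted averages from the precision, recall, and support values listed in the report. It assumes the standard definitions (F1 as the harmonic mean of precision and recall, macro average as the unweighted mean, weighted average as the support-weighted mean). Because the inputs are already rounded to four decimals, the results may differ from the report in the last digit.

```python
# (precision, recall, support) per class, copied from the classification report.
REPORT = {
    "Buildings and Structures": (0.8881, 0.9498, 2190),
    "Desert": (0.9649, 0.9480, 2000),
    "Forest Area": (0.9807, 0.9855, 2271),
    "Hill or Mountain": (0.8616, 0.8993, 2512),
    "Ice Glacier": (0.9114, 0.8382, 2404),
    "Sea or Ocean": (0.9328, 0.9525, 2274),
    "Street View": (0.9476, 0.9106, 2382),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


per_class_f1 = {name: f1(p, r) for name, (p, r, _) in REPORT.items()}
total_support = sum(s for _, _, s in REPORT.values())

# Macro average: every class counts equally.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Weighted average: each class counts in proportion to its support.
weighted_f1 = sum(
    per_class_f1[name] * s for name, (_, _, s) in REPORT.items()
) / total_support

for name, score in per_class_f1.items():
    print(f"{name:<26} {score:.4f}")
print(f"{'macro avg':<26} {macro_f1:.4f}")
print(f"{'weighted avg':<26} {weighted_f1:.4f}")
```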
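Intended use
Multilabel-GeoSceneNet is suited to identifying multiple geographic and structural elements in a single image. Applications include:
- Remote sensing: labeling elements in satellite or drone imagery.
- Geotagging: automatically tagging images for search or sorting.
- Environmental monitoring: identifying features such as glaciers or forests.
- Scene understanding: helping autonomous systems interpret complex scenes.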
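
As an illustration of the geotagging use case, the sketch below turns per-image predictions into a label-to-images index that can be used for search or sorting. The function name `build_tag_index`, the file names, and the scores are all hypothetical; in practice the scores would come from the model's thresholded output.

```python
from collections import defaultdict


def build_tag_index(predictions):
    """Map each label to the images tagged with it, highest score first."""
    index = defaultdict(list)
    for image_name, scores in predictions.items():
        for label, score in scores.items():
            index[label].append((image_name, score))
    return {
        label: sorted(hits, key=lambda hit: hit[1], reverse=True)
        for label, hits in index.items()
    }


# Hypothetical thresholded predictions, invented for illustration only.
predictions = {
    "alps.jpg": {"Hill or Mountain": 0.97, "Ice Glacier": 0.81},
    "harbor.jpg": {"Sea or Ocean": 0.93, "Buildings and Structures": 0.66},
    "fjord.jpg": {"Sea or Ocean": 0.88, "Hill or Mountain": 0.74},
}

tag_index = build_tag_index(predictions)

# All images tagged "Sea or Ocean", most confident first.
ocean_images = [name for name, _ in tag_index["Sea or Ocean"]]
print(ocean_images)
```

Since one image can carry several labels, it may appear under more than one key in the index, which is the main difference from indexing single-label predictions.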
📄 License
This project is licensed under Apache-2.0.
Model information

| Property | Details |
|---|---|
| Model type | Image classification |
| Base model | google/siglip2-base-patch16-224 |
| Dataset | prithivMLmods/Multilabel-GeoSceneNet-16K |
| Library name | transformers |
| Tags | Structures, Desert, Glacier, Street, Ocean, Image-Classifier, art, Mountain |
| Language | en |
| Pipeline tag | image-classification |