MegaDescriptor-L-384: an open-source image feature model for animal re-identification and ecology applications

MegaDescriptor-L-384

Developed by BVRA

An image feature model based on the Swin-L architecture, designed for animal re-identification, with applications in ecology.

Image classification

PyTorch

#AnimalReIdentification #SwinL #EcologicalMonitoring

Downloads: 5,957

Released: September 27, 2023

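Model overview

A visual feature extraction model based on the Swin Transformer architecture, intended mainly for animal re-identification. It was pretrained on multiple wildlife datasets and produces image embeddings that can be compared to match individual animals.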

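Model features

Feature extraction

Built on the Swin-L architecture to extract image feature representations.

Optimized for animal re-identification

Pretrained specifically for the animal re-identification task.

Large input size

Accepts 384x384-pixel input images.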
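To make the input format concrete, the sketch below works through the arithmetic behind the preprocessing used in the quick-start code on this page: `T.ToTensor` scales 8-bit pixel values to [0, 1], and `T.Normalize` with mean 0.5 and standard deviation 0.5 then maps them to [-1, 1]. The helper name `normalize` is hypothetical and only mirrors the formula; it is not part of torchvision.

```python
import math


def normalize(value, mean=0.5, std=0.5):
    """Per-channel normalization formula: (value - mean) / std."""
    return (value - mean) / std


# ToTensor first scales 8-bit pixel values from [0, 255] to [0.0, 1.0].
black, mid_grey, white = (v / 255 for v in (0, 128, 255))
normalized = [normalize(v) for v in (black, mid_grey, white)]

# Shape of the batch the model receives for a single RGB image:
# (batch, channels, height, width).
batch_shape = (1, 3, 384, 384)
num_values = math.prod(batch_shape)

print(normalized)   # black maps to -1.0 and white to 1.0
print(num_values)   # number of floats in one input batch
```

Note that resizing directly to 384x384 ignores the original aspect ratio, so non-square photos are stretched before they reach the model.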

Model capabilities

Image feature extraction

Individual animal identification

Wildlife monitoring

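Use cases

Ecological conservation

Wildlife population monitoring

Identify and track individual wild animals to monitor population size and range.

Intended to make wildlife conservation work more efficient and accurate.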
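As an illustration of how embeddings could feed a population estimate, the sketch below greedily assigns each sighting to an already-seen individual when their cosine similarity clears a threshold, and otherwise opens a new individual. This is a simplified stand-in, not the procedure used by the model authors: the function `assign_individuals`, the threshold of 0.8, and the toy 3-dimensional vectors are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_individuals(embeddings, threshold=0.8):
    """Greedily assign each embedding to a known individual or open a new one.

    `threshold` is an illustrative value, not a recommended setting.
    Returns one integer individual ID per embedding.
    """
    references = []   # first embedding seen for each individual
    assignments = []
    for emb in embeddings:
        scores = [cosine_similarity(emb, ref) for ref in references]
        if scores and max(scores) >= threshold:
            assignments.append(scores.index(max(scores)))
        else:
            references.append(emb)
            assignments.append(len(references) - 1)
    return assignments


# Toy vectors standing in for real model embeddings of four sightings.
sightings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.95, 0.05, 0.0],
    [0.0, 0.1, 1.0],
]
ids = assign_individuals(sightings)
population_estimate = len(set(ids))
print(ids, population_estimate)
```

In a real survey the threshold would need to be tuned on labelled data for the species in question, and a proper clustering method may be more robust than this greedy pass.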

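Scientific research

Animal behavior research

Helps researchers identify and track individual animals in order to study their behavior patterns.

Provides technical support for ethology research.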

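🚀 MegaDescriptor-L-384 model card

MegaDescriptor-L-384 is an image feature model based on the Swin-L architecture. It was pretrained with supervision on animal re-identification datasets and is intended for feature extraction from animal images and for re-identification.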

🚀 Quick start

The example below generates an image embedding with the model:

import timm
import torch
import torchvision.transforms as T

from PIL import Image
from urllib.request import urlopen

# Load the pretrained model from the Hugging Face Hub and switch to inference mode.
model = timm.create_model("hf-hub:BVRA/MegaDescriptor-L-384", pretrained=True)
model = model.eval()

# Resize to the model's 384x384 input and scale pixel values to [-1, 1].
transforms = T.Compose([T.Resize(size=(384, 384)),
                        T.ToTensor(),
                        T.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])

# Convert to RGB so the tensor has the 3 channels that Normalize expects.
img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
)).convert("RGB")

# Gradients are not needed to compute embeddings.
with torch.no_grad():
    output = model(transforms(img).unsqueeze(0))
# output is a (1, num_features) shaped tensor
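Once embeddings are available, re-identification usually comes down to comparing a query embedding against reference embeddings of known individuals. The sketch below shows one common way to do that, nearest-neighbour lookup by cosine similarity, in plain Python so it runs without the model. The `identify` helper, the identity labels, and the toy 3-dimensional vectors are hypothetical; real embeddings would come from the model's `output` tensor.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(query, gallery):
    """Return (label, similarity) of the gallery entry closest to the query.

    `gallery` maps an identity label to one reference embedding.
    """
    scores = {label: cosine_similarity(query, emb) for label, emb in gallery.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy vectors standing in for real model embeddings.
gallery = {
    "turtle_A": [1.0, 0.0, 0.0],
    "turtle_B": [0.0, 1.0, 0.0],
}
query = [0.9, 0.1, 0.0]
label, score = identify(query, gallery)
print(label, round(score, 4))
```

With real data, a single embedding can be turned into a list via `output[0].tolist()`; for larger galleries, `torch.nn.functional.cosine_similarity` on batched tensors would be the more practical choice.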

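✨ Key features

Broad applicability: relevant to image classification, ecology, animal identification, and re-identification.
Pretraining advantage: pretrained on animal re-identification datasets, so it is better suited to animal imagery.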

📚 Documentation

Model details

Property	Details
Model type	Animal re-identification / feature backbone
Parameters	228.8M
Image size	384 x 384
Architecture	swin_large_patch4_window12_384
Paper	WildlifeDatasets: An Open-Source Toolkit for Animal Re-Identification
Related papers	Swin Transformer: Hierarchical Vision Transformer using Shifted Windows; DINOv2: Learning Robust Visual Features without Supervision
Pretraining dataset	All available re-identification datasets: https://github.com/WildlifeDatasets/wildlife-datasets
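The architecture name `swin_large_patch4_window12_384` encodes the patch size, attention window size, and input resolution. The sketch below traces how those numbers determine the token grid and channel width at each of the four Swin stages. The base channel width of 192 for Swin-L and the halve-resolution, double-channels patch merging come from the Swin Transformer paper rather than from this page, so treat the result as an expectation to verify, for example against `model.num_features`.

```python
image_size = 384
patch_size = 4       # "patch4" in the architecture name
window_size = 12     # "window12" in the architecture name
embed_dim = 192      # Swin-L base channel width (Swin Transformer paper)
num_stages = 4

resolution = image_size // patch_size   # tokens per side after patch embedding
dim = embed_dim
stages = []
for stage in range(num_stages):
    if stage > 0:
        # Patch merging halves the spatial resolution and doubles the channels.
        resolution //= 2
        dim *= 2
    windows_per_side = resolution // window_size
    stages.append((resolution, dim, windows_per_side))

for i, (res, channels, windows) in enumerate(stages, start=1):
    print(f"stage {i}: {res}x{res} tokens, {channels} channels, "
          f"{windows}x{windows} windows")

final_resolution, final_dim, _ = stages[-1]
```

In the last stage the 12x12 token grid fits in a single attention window, and the final channel width is the expected length of the pooled embedding.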

Citation

@inproceedings{vcermak2024wildlifedatasets,
  title={WildlifeDatasets: An open-source toolkit for animal re-identification},
  author={{\v{C}}erm{\'a}k, Vojt{\v{e}}ch and Picek, Lukas and Adam, Luk{\'a}{\v{s}} and Papafitsoros, Kostas},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={5953--5963},
  year={2024}
}