LightGlue_Superpoint Open-Source Model: Efficient Feature Matching and Pose Estimation for Computer Vision

Lightglue Superpoint

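Developed by ETH-CVG

LightGlue is an efficient keypoint matching model for feature matching and pose estimation in computer vision. This checkpoint pairs it with SuperPoint as the keypoint detector.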

Pose Estimation

Transformers

License: Other #adaptive-feature-matching #real-time-image-alignment #multi-view-geometry

Downloads: 316

Published: 2/20/2025

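Model Overview

LightGlue is a neural network that matches two sets of feature points extracted from a pair of images. It addresses feature matching and pose estimation in computer vision and is widely applicable to multi-view geometry problems.

Model Features

Efficient and accurate

More efficient in both memory and computation, more accurate, and easier to train than SuperGlue.

Adaptive computation

Adapts the amount of computation to the matching difficulty of each image pair, so inference is faster on pairs that are easy to match.

Real-time operation

Efficient by design; runs in real time on a modern GPU and suits real-time applications.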

Model Capabilities

Image keypoint detection

Image keypoint matching

Pose estimation

Multi-view geometry analysis

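Use Cases

Computer Vision

Multi-view geometry matching

Matches feature points across images taken from different viewpoints, for 3D reconstruction and scene understanding.

Matches feature points efficiently and accurately, supporting real-time applications.

Real-time SLAM systems

Performs real-time feature matching inside simultaneous localization and mapping systems.

Responds quickly and suits real-time processing.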

🚀 LightGlue

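LightGlue is a keypoint matching model: it matches two sets of feature points extracted from a pair of images, with the keypoints themselves detected by SuperPoint in this checkpoint. It addresses feature matching and pose estimation in computer vision and is widely applicable to multi-view geometry problems.

🚀 Quick Start

Below is a quick example of using the model. Because this is an image matching model, it requires pairs of images as input. The raw outputs contain the list of keypoints found by the keypoint detector, together with the list of matches and their corresponding matching scores.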

from transformers import AutoImageProcessor, AutoModel
import torch
from PIL import Image
import requests

# Download the two sample images that form the pair to match
url_image1 = "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_98169888_3347710852.jpg"
image1 = Image.open(requests.get(url_image1, stream=True).raw)
url_image2 = "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_26757027_6717084061.jpg"
image2 = Image.open(requests.get(url_image2, stream=True).raw)

# The two images of a pair are passed together
images = [image1, image2]

# Load the image processor and the model weights
processor = AutoImageProcessor.from_pretrained("stevenbucaille/lightglue_superpoint")
model = AutoModel.from_pretrained("stevenbucaille/lightglue_superpoint")

# Preprocess the pair and run inference without tracking gradients
inputs = processor(images, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

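You can use the post_process_keypoint_matching method of LightGlueImageProcessor to get the keypoints and matches in a readable format:

# One list of (height, width) tuples per image pair
image_sizes = [[(image.height, image.width) for image in images]]
# The threshold filters out low-scoring matches
outputs = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)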
for i, output in enumerate(outputs):
    print("For the image pair", i)
    for keypoint0, keypoint1, matching_score in zip(
            output["keypoints0"], output["keypoints1"], output["matching_scores"]
    ):
        print(
            f"Keypoint at coordinate {keypoint0.numpy()} in the first image matches with keypoint at coordinate {keypoint1.numpy()} in the second image with a score of {matching_score}."
        )

To visualize the matches between the images, pass the original images and the post-processed outputs to plot_keypoint_matching:

processor.plot_keypoint_matching(images, outputs)
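
Downstream code often wants only the strongest matches, sorted by score. The following is a minimal sketch of such a filter, not part of the library: `filter_matches` and its `min_score` parameter are hypothetical names introduced here, and the function works on plain Python lists so it can be tried without loading the model.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Match = Tuple[Point, Point, float]


def filter_matches(
    keypoints0: Sequence[Sequence[float]],
    keypoints1: Sequence[Sequence[float]],
    scores: Sequence[float],
    min_score: float = 0.5,
) -> List[Match]:
    """Keep matches scoring at least min_score, best match first.

    Hypothetical helper: the three inputs are parallel sequences, where
    entry i of each describes the i-th match.
    """
    if not (len(keypoints0) == len(keypoints1) == len(scores)):
        raise ValueError("keypoints0, keypoints1 and scores must have equal length")
    kept = [
        ((float(p0[0]), float(p0[1])), (float(p1[0]), float(p1[1])), float(s))
        for p0, p1, s in zip(keypoints0, keypoints1, scores)
        if s >= min_score
    ]
    # Sort by score, highest first
    kept.sort(key=lambda match: match[2], reverse=True)
    return kept


# Made-up data standing in for one post-processed image pair
demo_matches = filter_matches(
    [(10.0, 20.0), (30.0, 40.0), (50.0, 60.0)],
    [(12.0, 21.0), (33.0, 41.0), (49.0, 62.0)],
    [0.9, 0.3, 0.7],
    min_score=0.5,
)
```

With the real post-processed output, and assuming its entries are tensors as the loop above suggests, the inputs could be obtained with `output["keypoints0"].tolist()`, `output["keypoints1"].tolist()` and `output["matching_scores"].tolist()`.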

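✨ Key Features

Efficient and accurate: LightGlue is more efficient in both memory and computation, more accurate, and easier to train than the long-unrivaled SuperGlue.
Adaptive computation: The model adapts the amount of computation to the matching difficulty of each image pair. Inference is faster on pairs that are easy to match, for example because of large visual overlap or limited appearance change.
Real-time operation: Efficient by design; runs in real time on a modern GPU and suits real-time applications.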

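📚 Detailed Documentation

Model Description

LightGlue is a neural network that matches two sets of local features by jointly finding correspondences and rejecting non-matchable points. Building on the success of SuperGlue, the model can introspect the confidence of its own predictions. It adapts the amount of computation to the difficulty of each image pair to be matched; both its depth and its width are adaptive:

Inference can stop at an early layer if all predictions are ready.
Points deemed non-matchable are discarded early from further steps.

The resulting model is faster, more accurate, and easier to train than the long-unrivaled SuperGlue.

| Property | Details |
|------|------|
| Developed by | ETH Zurich - Computer Vision and Geometry Lab |
| Model type | Image matching |
| License | Academic or non-profit organization noncommercial research use only (implied by the use of SuperPoint as its keypoint detector) |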
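
The adaptive depth and width described above can be illustrated with a toy control loop. This is only a sketch under stated assumptions, not LightGlue's implementation: the per-layer confidence and matchability values are supplied by hand instead of being predicted by a network, and `run_adaptive`, `exit_ratio`, `conf_thresh` and `prune_thresh` are hypothetical names and values chosen for illustration.

```python
from typing import Dict, List, Set, Tuple

# For each layer: point id -> (confidence, matchability), both in [0, 1]
LayerScores = Dict[str, Tuple[float, float]]


def run_adaptive(
    layers: List[LayerScores],
    exit_ratio: float = 0.95,
    conf_thresh: float = 0.9,
    prune_thresh: float = 0.1,
) -> Tuple[int, Set[str]]:
    """Return (number of layers executed, points still active at exit)."""
    if not layers:
        return 0, set()
    active = set(layers[0])
    layers_used = 0
    for scores in layers:
        if not active:
            break
        layers_used += 1
        confident = {p for p in active if scores[p][0] >= conf_thresh}
        # Adaptive depth: stop once enough active points are confident
        if len(confident) >= exit_ratio * len(active):
            break
        # Adaptive width: drop points that are confidently unmatchable
        active -= {p for p in confident if scores[p][1] < prune_thresh}
    return layers_used, active


# Hand-written scores for four points over three layers
toy_layers = [
    {"a": (0.95, 0.05), "b": (0.50, 0.80), "c": (0.40, 0.70), "d": (0.30, 0.60)},
    {"a": (0.99, 0.02), "b": (0.95, 0.90), "c": (0.92, 0.80), "d": (0.91, 0.70)},
    {"a": (0.99, 0.01), "b": (0.99, 0.90), "c": (0.99, 0.90), "d": (0.99, 0.90)},
]
toy_layers_used, toy_active = run_adaptive(toy_layers)
```

In this toy run, point `a` is pruned after the first layer and the loop exits after the second, so the third layer is never evaluated. Easy pairs reach the exit condition sooner, which is where the speed-up comes from.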

Model Sources

Repository: https://github.com/cvg/LightGlue
Paper: http://arxiv.org/abs/2306.13643
Demo: https://colab.research.google.com/github/cvg/LightGlue/blob/main/demo.ipynb

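Uses

LightGlue is designed for feature matching and pose estimation tasks in computer vision. It can be applied to a variety of multi-view geometry problems and handles challenging real-world indoor and outdoor environments. It may not perform well on tasks that require a different kind of visual understanding, such as object detection or image classification.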

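🔧 Technical Details

Training Details

LightGlue is trained on large annotated datasets for pose estimation, which lets it learn pose estimation priors and reason about the 3D scene. The training data consists of image pairs with ground-truth correspondences, plus unmatched keypoints derived from ground-truth poses and depth maps.

LightGlue follows the supervised training setup of SuperGlue. It is first pre-trained with synthetic homographies sampled from 1M images. This kind of augmentation provides full and noise-free supervision but requires careful tuning. LightGlue is then fine-tuned on the MegaDepth dataset, which contains 1M crowd-sourced images depicting 196 tourism landmarks, with camera calibration and poses recovered by SfM and dense depth recovered by multi-view stereo.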
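
To see why synthetic homographies give full and noise-free supervision, note that when the warp between two views is known exactly, the correct location of every keypoint in the second view is known too. The sketch below illustrates this idea only. The sampling scheme in `random_homography` and its ranges are hypothetical and not taken from the authors' training code.

```python
import random
from typing import List, Sequence, Tuple

Matrix3 = List[List[float]]
Point = Tuple[float, float]


def random_homography(rng: random.Random, jitter: float = 0.1,
                      shift: float = 20.0, persp: float = 1e-4) -> Matrix3:
    """Identity plus a small random perturbation (illustrative ranges)."""
    return [
        [1 + rng.uniform(-jitter, jitter), rng.uniform(-jitter, jitter), rng.uniform(-shift, shift)],
        [rng.uniform(-jitter, jitter), 1 + rng.uniform(-jitter, jitter), rng.uniform(-shift, shift)],
        [rng.uniform(-persp, persp), rng.uniform(-persp, persp), 1.0],
    ]


def warp_point(H: Matrix3, x: float, y: float) -> Point:
    """Apply a 3x3 homography to a 2D point in homogeneous coordinates."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return u, v


def ground_truth_pairs(H: Matrix3, keypoints: Sequence[Point],
                       width: int, height: int):
    """Split keypoints into exact correspondences and unmatched points.

    A keypoint whose warped position leaves the image has no valid match.
    """
    pairs, unmatched = [], []
    for index, (x, y) in enumerate(keypoints):
        u, v = warp_point(H, x, y)
        if 0 <= u < width and 0 <= v < height:
            pairs.append((index, (u, v)))
        else:
            unmatched.append(index)
    return pairs, unmatched


# A pure translation makes the result easy to check by hand
H_shift = [[1.0, 0.0, 10.0], [0.0, 1.0, -5.0], [0.0, 0.0, 1.0]]
demo_pairs, demo_unmatched = ground_truth_pairs(
    H_shift, [(3.0, 7.0), (635.0, 100.0), (5.0, 2.0)], width=640, height=480
)

# A randomly sampled warp, seeded for reproducibility
H_random = random_homography(random.Random(0))
```

Real image pairs, as in MegaDepth, need poses and depth maps to derive correspondences, which is noisier than this closed-form case.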

Training Hyperparameters

Training regime: fp32

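Speed, Size, Time

LightGlue is efficient by design and runs in real time on a modern GPU. A forward pass takes approximately 44 milliseconds (about 22 FPS) for an image pair. The model has 13.7 million parameters, which makes it relatively compact compared with some other deep learning models. Its inference speed suits real-time applications, and it can be readily integrated into modern simultaneous localization and mapping (SLAM) or structure-from-motion (SfM) systems.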
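
The relation between the quoted latency and frame rate is simply FPS = 1000 / latency in milliseconds. The sketch below shows that conversion together with a small timing harness for checking latency on your own hardware. `fps_from_latency_ms` and `mean_latency_ms` are hypothetical helpers written for this illustration.

```python
import time
from typing import Callable


def fps_from_latency_ms(latency_ms: float) -> float:
    """Convert a per-pair latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


def mean_latency_ms(fn: Callable[[], object], warmup: int = 2, runs: int = 10) -> float:
    """Average wall-clock time of fn() in milliseconds, after warm-up calls."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000.0 / runs


# 44 ms per pair corresponds to roughly 22.7 FPS
reported_fps = fps_from_latency_ms(44.0)
```

To time the model itself, `fn` could wrap the forward pass from the quick start. On a GPU, call `torch.cuda.synchronize()` inside `fn` after the forward pass, otherwise the measurement may only capture kernel launch time.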

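📄 License

The model is licensed for academic or non-profit organization noncommercial research use only (implied by the use of SuperPoint as its keypoint detector).

Citation

BibTeX: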

@inproceedings{lindenberger2023lightglue,
  author    = {Philipp Lindenberger and
               Paul-Edouard Sarlin and
               Marc Pollefeys},
  title     = {{LightGlue: Local Feature Matching at Light Speed}},
  booktitle = {ICCV},
  year      = {2023}
}