LightGlue_Superpoint Open-Source Model - Efficient Feature Matching and Pose Estimation for Computer Vision

Lightglue Superpoint

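Developed by ETH-CVG

LightGlue is an efficient keypoint matching model for feature matching and pose estimation in computer vision. This checkpoint uses SuperPoint as its keypoint detector.

Pose Estimation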

Transformers

License: Other #AdaptiveFeatureMatching #RealTimeImageAlignment #MultiViewGeometry

Downloads: 316

Released: 2/20/2025

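Model Overview

LightGlue is a neural network that matches two sets of feature points across images. It targets feature matching and pose estimation in computer vision and applies to a wide range of multi-view geometry problems.

Model Features

Efficient and accurate

More efficient in both memory and computation than SuperGlue, while also being more accurate and easier to train.

Adaptive computation

Adapts the amount of computation to the matching difficulty of each image pair, so inference is faster on pairs that are easy to match.

Real-time operation

Designed for efficiency; runs in real time on a modern GPU, which makes it suitable for real-time applications.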

Model Capabilities

Image keypoint detection

Image keypoint matching

Pose estimation

Multi-view geometry analysis

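Use Cases

Computer vision

Multi-view geometric matching

Matches feature points across images taken from different viewpoints, for 3D reconstruction and scene understanding.

Matches feature points efficiently and accurately, fast enough for real-time use.

Real-time SLAM systems

Performs real-time feature matching inside simultaneous localization and mapping (SLAM) systems.

Low latency, suitable for real-time processing.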

🚀 LightGlue

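LightGlue is a keypoint matching model: it matches two sets of feature points detected in a pair of images. It targets feature matching and pose estimation in computer vision and applies to a wide range of multi-view geometry problems.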

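🚀 Quick Start

The following is a quick example of using the model. Because this is an image matching model, it requires images in pairs. The raw outputs contain the list of keypoints found by the keypoint detector, together with the list of matches and their corresponding matching scores.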

from transformers import AutoImageProcessor, AutoModel
import torch
from PIL import Image
import requests

# Download the two sample images that form the pair to match.
url_image1 = "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_98169888_3347710852.jpg"
image1 = Image.open(requests.get(url_image1, stream=True).raw)
url_image2 = "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_26757027_6717084061.jpg"
image2 = Image.open(requests.get(url_image2, stream=True).raw)

# The processor takes the two images together as one pair.
images = [image1, image2]

# Load the image processor and the model weights from the Hub.
processor = AutoImageProcessor.from_pretrained("stevenbucaille/lightglue_superpoint")
model = AutoModel.from_pretrained("stevenbucaille/lightglue_superpoint")

# Preprocess the pair and run inference without tracking gradients.
inputs = processor(images, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

You can use the post_process_keypoint_matching method of LightGlueImageProcessor to get the keypoints and matches in a readable format:

image_sizes = [[(image.height, image.width) for image in images]]
outputs = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
for i, output in enumerate(outputs):
    print("For the image pair", i)
    for keypoint0, keypoint1, matching_score in zip(
            output["keypoints0"], output["keypoints1"], output["matching_scores"]
    ):
        print(
            f"Keypoint at coordinate {keypoint0.numpy()} in the first image matches with keypoint at coordinate {keypoint1.numpy()} in the second image with a score of {matching_score}."
        )
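
When only the most reliable correspondences are needed, the post-processed output can be filtered further. The sketch below assumes each entry is a mapping with the "keypoints0", "keypoints1" and "matching_scores" keys used above. The helper name top_matches and its parameters are hypothetical and not part of transformers.

```python
def top_matches(output, min_score=0.5, limit=None):
    """Return (point0, point1, score) triples sorted by descending score.

    `output` is assumed to be one post-processed entry: a mapping with
    "keypoints0", "keypoints1" and "matching_scores" of equal length.
    Values may be tensors or plain lists, since they are read via float().
    """
    triples = []
    for kp0, kp1, score in zip(
        output["keypoints0"], output["keypoints1"], output["matching_scores"]
    ):
        score = float(score)
        if score >= min_score:
            point0 = (float(kp0[0]), float(kp0[1]))
            point1 = (float(kp1[0]), float(kp1[1]))
            triples.append((point0, point1, score))
    # Highest-scoring matches first.
    triples.sort(key=lambda triple: triple[2], reverse=True)
    return triples if limit is None else triples[:limit]


# Toy stand-in for one post-processed entry (plain lists instead of tensors).
example_output = {
    "keypoints0": [[10, 20], [30, 40], [50, 60]],
    "keypoints1": [[12, 22], [33, 41], [48, 66]],
    "matching_scores": [0.9, 0.3, 0.7],
}
best = top_matches(example_output, min_score=0.5)
```

With the real model, the same call would be top_matches(outputs[0], min_score=0.5), where outputs is the post-processed list from the snippet above.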

You can visualize the matches between the images by passing the original images and the post-processed outputs to this method:

processor.plot_keypoint_matching(images, outputs)

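✨ Key Features

Efficient and accurate: LightGlue is more efficient in both memory and computation, more accurate, and easier to train than the long-unrivaled SuperGlue.
Adaptive computation: the model adapts its computation to the matching difficulty of each image pair, so inference is faster on pairs that are easy to match, for example because of larger visual overlap or limited appearance change.
Real-time operation: designed for efficiency, it runs in real time on a modern GPU and suits real-time applications.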

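📚 Detailed Documentation

Model Description

LightGlue is a neural network that matches two sets of local features by jointly finding correspondences and rejecting non-matchable points. Building on the success of SuperGlue, the model can introspect the confidence of its own predictions. It adapts the amount of computation to the difficulty of each image pair to be matched. Both its depth and its width are adaptive:

Inference can stop at an early layer if all predictions are ready.
Points deemed not matchable are discarded early from further steps.

The resulting model, LightGlue, is faster, more accurate, and easier to train than the long-unrivaled SuperGlue.

| Property | Details |
|------|------|
| Developed by | ETH Zurich - Computer Vision and Geometry Lab |
| Model type | Image matching |
| License | Academic or non-profit organization non-commercial research use only (implied by the use of SuperPoint as its keypoint detector) |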
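
The adaptive depth and width described above can be illustrated with a toy control-flow sketch. It is not the model's implementation: in LightGlue the confidence and matchability values come from learned prediction heads, whereas here they are supplied as precomputed numbers, and the function name and all three thresholds are hypothetical.

```python
def run_adaptive_layers(layer_predictions, exit_ratio=0.95,
                        confidence_threshold=0.9, matchability_threshold=0.1):
    """Simulate early exit (depth) and point pruning (width).

    `layer_predictions` holds one dict per layer, mapping a point id to a
    (confidence, matchability) pair. Returns the number of layers that were
    run and the set of points still active when inference stopped.
    """
    active = set(layer_predictions[0])
    layers_run = 0
    for predictions in layer_predictions:
        layers_run += 1
        confident = {p for p in active
                     if predictions[p][0] >= confidence_threshold}
        # Adaptive depth: stop once enough active points are confident.
        if len(confident) >= exit_ratio * len(active):
            break
        # Adaptive width: drop points confidently predicted as unmatchable.
        unmatchable = {p for p in confident
                       if predictions[p][1] < matchability_threshold}
        active -= unmatchable
    return layers_run, active


toy_layers = [
    {"a": (0.95, 0.90), "b": (0.50, 0.50), "c": (0.95, 0.05), "d": (0.40, 0.60)},
    {"a": (0.97, 0.90), "b": (0.95, 0.80), "c": (0.99, 0.01), "d": (0.92, 0.70)},
    {"a": (0.99, 0.90), "b": (0.99, 0.80), "c": (0.99, 0.01), "d": (0.99, 0.70)},
]
layers_run, remaining = run_adaptive_layers(toy_layers)
```

In this toy run, point c is pruned after the first layer and the remaining points are all confident after the second, so the third layer is never evaluated. An easy image pair saves computation in the same way.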

Model Sources

Repository: https://github.com/cvg/LightGlue
Paper: http://arxiv.org/abs/2306.13643
Demo: https://colab.research.google.com/github/cvg/LightGlue/blob/main/demo.ipynb

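Use Cases

LightGlue is designed for feature matching and pose estimation tasks in computer vision. It can be applied to a variety of multi-view geometry problems and handles challenging real-world indoor and outdoor environments. It may not perform well on tasks that need a different kind of visual understanding, such as object detection or image classification.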
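
In a pose estimation pipeline, matches are usually checked against an epipolar geometry. The minimal sketch below assumes a fundamental matrix F is already known and satisfies x1^T F x0 = 0 for corresponding points. It measures how far each matched point in the second image lies from its epipolar line. The function names, the pixel threshold and the toy matrix (a rectified stereo pair, where corresponding points share the same row) are illustrative assumptions.

```python
import math


def epipolar_distance(f, point0, point1):
    """Distance in pixels from point1 to the epipolar line F @ point0."""
    x, y = point0
    # Epipolar line (a, b, c) in image 1, i.e. a*u + b*v + c = 0.
    a = f[0][0] * x + f[0][1] * y + f[0][2]
    b = f[1][0] * x + f[1][1] * y + f[1][2]
    c = f[2][0] * x + f[2][1] * y + f[2][2]
    u, v = point1
    # Undefined if a == b == 0 (degenerate line); not handled in this sketch.
    return abs(a * u + b * v + c) / math.hypot(a, b)


def inlier_ratio(f, matches, max_distance=1.0):
    """Fraction of (point0, point1) matches within max_distance pixels."""
    inliers = sum(
        1 for p0, p1 in matches if epipolar_distance(f, p0, p1) <= max_distance
    )
    return inliers / len(matches)


# Fundamental matrix of an ideal rectified stereo pair (assumed for the demo).
F_RECTIFIED = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
toy_matches = [
    ((10.0, 20.0), (5.0, 20.0)),   # same row: distance 0
    ((30.0, 40.0), (22.0, 43.0)),  # 3 rows off: outlier
    ((50.0, 60.0), (41.0, 60.5)),  # half a row off: inlier
]
ratio = inlier_ratio(F_RECTIFIED, toy_matches)
```

In practice F is not known in advance and is estimated robustly from the matches themselves. This check only illustrates how matched keypoints feed into a geometric constraint.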

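🔧 Technical Details

Training Details

LightGlue is trained on large annotated datasets for pose estimation, which lets it learn priors for pose estimation and reason about the 3D scene. The training data consists of image pairs with ground-truth correspondences, plus unmatched keypoints derived from ground-truth poses and depth maps.

LightGlue follows the supervised training setup of SuperGlue. It is first pre-trained with synthetic homographies sampled from 1M images. This kind of augmentation provides full and noise-free supervision but requires careful tuning. LightGlue is then fine-tuned on the MegaDepth dataset, which contains 1M crowd-sourced images depicting 196 tourism landmarks, with camera calibration and poses recovered by SfM and dense depth recovered by multi-view stereo.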
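
To make concrete why a synthetic homography yields complete and noise-free supervision, the sketch below warps keypoints from one image with a known 3x3 homography and labels each one with its nearest keypoint in the other image, or as unmatched. This is a simplified illustration, not the authors' training code: the function names and the 3-pixel threshold are assumptions, and a real pipeline would typically also enforce mutual consistency.

```python
import math


def warp_point(h, point):
    """Apply a 3x3 homography (nested lists) to a 2D point."""
    x, y = point
    xw = h[0][0] * x + h[0][1] * y + h[0][2]
    yw = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (xw / w, yw / w)


def ground_truth_matches(keypoints0, keypoints1, h, max_error=3.0):
    """For each keypoint in image 0, return the index of the nearest
    keypoint in image 1 within max_error pixels of its warped position,
    or -1 if there is none (the point is labeled unmatched)."""
    labels = []
    for point in keypoints0:
        wx, wy = warp_point(h, point)
        best_index, best_distance = -1, max_error
        for j, (qx, qy) in enumerate(keypoints1):
            distance = math.hypot(qx - wx, qy - wy)
            if distance <= best_distance:
                best_index, best_distance = j, distance
        labels.append(best_index)
    return labels


# Known synthetic transform: scale by 2, then shift by (5, -2).
H = [[2.0, 0.0, 5.0], [0.0, 2.0, -2.0], [0.0, 0.0, 1.0]]
kpts0 = [(10.0, 10.0), (20.0, 30.0), (100.0, 100.0)]
kpts1 = [(45.5, 58.0), (25.0, 19.0), (0.0, 0.0)]
labels = ground_truth_matches(kpts0, kpts1, H)
```

Because the transform is known exactly, every keypoint receives either a correct match index or an explicit unmatched label, with no annotation noise.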

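Training Hyperparameters

Training regime: fp32

Speed, Size, Time

LightGlue is designed for efficiency and runs in real time on a modern GPU. A forward pass takes approximately 44 milliseconds (22 FPS) for one image pair. The model has 13.7 million parameters, which is relatively compact compared with some other deep learning models. Its inference speed suits real-time applications, and it can be readily integrated into modern simultaneous localization and mapping (SLAM) or structure-from-motion (SfM) systems.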
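
The relation between the quoted latency and frame rate is simple arithmetic: 44 ms per pair is 1 / 0.044 ≈ 22.7 pairs per second, which the figure above rounds down to 22 FPS. The sketch below shows that conversion together with a generic timing helper for measuring latency on other hardware. Both function names are hypothetical. On a GPU, the timed callable should synchronize before returning (for example with torch.cuda.synchronize()), otherwise asynchronous execution makes the measurement too optimistic.

```python
import time


def latency_to_fps(latency_seconds):
    """Convert per-pair latency in seconds to pairs per second."""
    return 1.0 / latency_seconds


def measure_latency(fn, warmup=2, runs=10):
    """Average wall-clock seconds per call of fn, after warm-up calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


# The figure quoted in the text: 44 ms per image pair.
reported_fps = latency_to_fps(0.044)

# Timing a trivial stand-in workload; replace it with a model forward pass.
stand_in_latency = measure_latency(lambda: sum(range(1000)))
```

With the quick-start objects in scope, the callable could be lambda: model(**inputs), wrapped in torch.no_grad().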

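📄 License

The model is licensed for academic or non-profit organization non-commercial research use only (implied by the use of SuperPoint as its keypoint detector).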

Citation

BibTeX:

@inproceedings{lindenberger2023lightglue,
  author    = {Philipp Lindenberger and
               Paul-Edouard Sarlin and
               Marc Pollefeys},
  title     = {{LightGlue: Local Feature Matching at Light Speed}},
  booktitle = {ICCV},
  year      = {2023}
}