VideoMAEv2-Huge open-source video feature extraction model - efficient, accurate extraction of key video features

Videomaev2 Huge

Developed by OpenGVLab

VideoMAEv2-Huge is a self-supervised video feature extraction model, pre-trained for 1200 epochs on the UnlabeledHybrid-1M dataset.

Video processing

Safetensors

#video self-supervised learning #large-scale pre-training #dual masking strategy

Downloads: 1,145

Released: 1/14/2025

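Model Overview

The model is intended mainly for video feature extraction. It is pre-trained with a dual masking strategy and captures spatiotemporal features in video.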

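Model Features

Dual masking pre-training strategy

Self-supervised learning with a dual masking strategy improves the model's grasp of spatiotemporal structure in video.

Large-scale pre-training

Pre-trained for 1200 epochs on the UnlabeledHybrid-1M dataset, learning rich video feature representations.

Efficient feature extraction

Extracts meaningful spatiotemporal features from video for use in downstream video understanding tasks.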
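
The dual masking idea listed above can be illustrated with a small sketch. It is a simplification under stated assumptions, not the authors' implementation: the encoder mask is drawn as a tube mask (the same spatial positions hidden in every frame), the decoder mask is drawn uniformly at random, and the function name, token counts, and ratios are all hypothetical values chosen for illustration.

```python
import numpy as np

def dual_masks(num_frames=8, tokens_per_frame=196,
               enc_ratio=0.9, dec_ratio=0.5, seed=0):
    """Return two boolean masks of shape (num_frames, tokens_per_frame).

    True means the token is hidden. All defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Encoder mask: tube masking, so one spatial pattern is repeated over time.
    num_hidden = int(round(enc_ratio * tokens_per_frame))
    spatial = np.zeros(tokens_per_frame, dtype=bool)
    spatial[rng.choice(tokens_per_frame, num_hidden, replace=False)] = True
    enc_mask = np.tile(spatial, (num_frames, 1))

    # Decoder mask: drop a fraction of all tokens so the decoder handles fewer.
    # Uniform random sampling here is a simplification.
    total = num_frames * tokens_per_frame
    flat = np.zeros(total, dtype=bool)
    flat[rng.choice(total, int(round(dec_ratio * total)), replace=False)] = True
    dec_mask = flat.reshape(num_frames, tokens_per_frame)
    return enc_mask, dec_mask

enc_mask, dec_mask = dual_masks()
# Tokens hidden from the encoder but kept by the decoder mask; in this sketch
# they stand in for the positions that would be reconstructed.
targets = enc_mask & ~dec_mask
print(enc_mask.mean(), dec_mask.mean(), targets.sum())
```

Masking on both sides means the encoder sees only a small share of the tokens and the decoder processes only part of the rest, which is where the compute savings in this kind of scheme come from.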

Model Capabilities

Video feature extraction

Video classification

Video understanding

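Use Cases

Video analysis

Video content classification

Classify video content, for example action recognition or scene recognition.

Video retrieval

Extract video features for similar-video retrieval.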
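
As a rough illustration of the retrieval use case, the sketch below ranks a gallery of feature vectors by cosine similarity to a query vector. The helper name `cosine_top_k`, the 1280-dimensional feature size, and the random placeholder features are assumptions for illustration; in practice the vectors would be features extracted by the model.

```python
import numpy as np

def cosine_top_k(query, gallery, k=3):
    """Return indices and scores of the k gallery vectors closest to query."""
    # L2-normalize so that a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q
    top = np.argsort(-scores)[:k]  # highest similarity first
    return top, scores[top]

# Placeholder features: random vectors standing in for extracted video features.
rng = np.random.default_rng(0)
gallery = rng.standard_normal((100, 1280))
# The query is a slightly perturbed copy of gallery item 42.
query = gallery[42] + 0.01 * rng.standard_normal(1280)

indices, scores = cosine_top_k(query, gallery)
print(indices, scores)
```

For large galleries, an approximate nearest-neighbor index would usually replace the brute-force matrix product used here.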

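🚀 VideoMAE-v2 (Huge model, pre-trained on UnlabeledHybrid-1M)

The VideoMAE-v2 Huge model was pre-trained in a self-supervised manner for 1200 epochs on the UnlabeledHybrid-1M dataset. It was introduced by Wang et al. in the paper [CVPR23] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking and first released on GitHub.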

✨ Key Features

Can be used for video feature extraction.

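🚀 Quick Start

Intended uses and limitations

You can use the raw model for video feature extraction.

How to use

The following example shows how to extract video features with this model: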

from transformers import VideoMAEImageProcessor, AutoModel, AutoConfig
import numpy as np
import torch

# The model ships custom code on the Hub, hence trust_remote_code=True.
config = AutoConfig.from_pretrained("OpenGVLab/VideoMAEv2-Huge", trust_remote_code=True)
processor = VideoMAEImageProcessor.from_pretrained("OpenGVLab/VideoMAEv2-Huge")
model = AutoModel.from_pretrained("OpenGVLab/VideoMAEv2-Huge", config=config, trust_remote_code=True)

# Dummy clip: a list of 16 random frames, each of shape C x H x W.
video = list(np.random.rand(16, 3, 224, 224))

inputs = processor(video, return_tensors="pt")
# B, T, C, H, W -> B, C, T, H, W
inputs["pixel_values"] = inputs["pixel_values"].permute(0, 2, 1, 3, 4)

# Inference only, so gradients are not needed.
with torch.no_grad():
    outputs = model(**inputs)
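
The example above feeds 16 random frames. With a real video, one common way to obtain 16 frames is to sample them at even intervals across the clip. The sketch below shows that step only; the helper `sample_frame_indices`, the 120-frame dummy video, and the assumption that frames are decoded as T x H x W x C arrays are illustrative and not part of the model card.

```python
import numpy as np

def sample_frame_indices(num_frames_in_video, num_samples=16):
    """Pick num_samples evenly spaced frame indices covering the whole clip."""
    if num_frames_in_video <= 0:
        raise ValueError("video has no frames")
    # linspace spreads indices from the first to the last frame;
    # videos shorter than num_samples simply repeat some frames.
    idx = np.linspace(0, num_frames_in_video - 1, num=num_samples)
    return np.round(idx).astype(int)

# Dummy decoded video: 120 frames stored as T x H x W x C uint8 arrays.
decoded = np.zeros((120, 224, 224, 3), dtype=np.uint8)
indices = sample_frame_indices(len(decoded))

# Convert to a list of C x H x W frames, the layout used in the example above.
video = [frame.transpose(2, 0, 1) for frame in decoded[indices]]
print(indices)
```

The resulting `video` list can be passed to the processor in place of the random frames.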

📚 Documentation

BibTeX entry and citation info

@InProceedings{wang2023videomaev2,
    author    = {Wang, Limin and Huang, Bingkun and Zhao, Zhiyu and Tong, Zhan and He, Yinan and Wang, Yi and Wang, Yali and Qiao, Yu},
    title     = {VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {14549-14560}
}

@misc{videomaev2,
      title={VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking},
      author={Limin Wang and Bingkun Huang and Zhiyu Zhao and Zhan Tong and Yinan He and Yi Wang and Yali Wang and Yu Qiao},
      year={2023},
      eprint={2303.16727},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}