🚀 V-JEPA 2
V-JEPA 2 is a cutting-edge video understanding model developed by Meta's FAIR team. It extends the pretraining objectives of VJEPA and, by scaling up data and model size, achieves state-of-the-art video understanding. The code has been released in this repository.
🚀 Quick Start
V-JEPA 2 is a powerful video understanding model that can be used for tasks such as video classification and retrieval, and it can also serve as a video encoder for vision-language models (VLMs).
✨ Key Features
- Extends the pretraining objectives of VJEPA, delivering state-of-the-art video understanding.
- Handles both video and image data.
- Supports tasks such as video classification and retrieval, and can serve as a video encoder for VLMs.
📦 Installation
To run the V-JEPA 2 model, make sure you have the latest version of the transformers library installed:
pip install -U git+https://github.com/huggingface/transformers
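To confirm that the installed build exposes the video classes used below, a quick sanity check (a minimal sketch, nothing V-JEPA 2 specific beyond the imports) is:

import transformers
from transformers import AutoModel, AutoVideoProcessor  # AutoVideoProcessor only exists in recent releases

# Print the installed version; an ImportError above means transformers is too old for V-JEPA 2.
print(transformers.__version__)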
💻 Usage Examples
Basic Usage
Load the model and processor
from transformers import AutoVideoProcessor, AutoModel
hf_repo = "facebook/vjepa2-vitl-fpc64-256"
model = AutoModel.from_pretrained(hf_repo)
processor = AutoVideoProcessor.from_pretrained(hf_repo)
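If a GPU is available, the model can also be loaded in half precision and moved to the device. The snippet below is a sketch that uses only standard from_pretrained arguments (torch_dtype); adjust it to your hardware.

import torch
from transformers import AutoModel, AutoVideoProcessor

hf_repo = "facebook/vjepa2-vitl-fpc64-256"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Half precision roughly halves memory use; assumes the GPU has enough memory for the ViT-L backbone.
model = AutoModel.from_pretrained(hf_repo, torch_dtype=torch.float16).to(device)
processor = AutoVideoProcessor.from_pretrained(hf_repo)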
Load a video
import torch
from torchcodec.decoders import VideoDecoder
import numpy as np
video_url = "https://huggingface.co/datasets/nateraw/kinetics-mini/resolve/main/val/archery/-Qz25rXdMjE_000014_000024.mp4"
vr = VideoDecoder(video_url)
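# Sample the first 64 frames; the fpc64 checkpoint processes 64 frames per clip.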
frame_idx = np.arange(0, 64)
video = vr.get_frames_at(indices=frame_idx).data
video = processor(video, return_tensors="pt").to(model.device)
with torch.no_grad():
    video_embeddings = model.get_vision_features(**video)
print(video_embeddings.shape)
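V-JEPA 2 also comes with fine-tuned classification heads. The sketch below assumes the VJEPA2ForVideoClassification class and a Something-Something-v2 checkpoint named facebook/vjepa2-vitl-fpc16-256-ssv2; check the Hugging Face Hub for the checkpoints that are actually published.

import torch
import numpy as np
from torchcodec.decoders import VideoDecoder
from transformers import AutoVideoProcessor, VJEPA2ForVideoClassification

# Assumed fine-tuned checkpoint id; verify the exact name on the Hugging Face Hub.
clf_repo = "facebook/vjepa2-vitl-fpc16-256-ssv2"
clf_model = VJEPA2ForVideoClassification.from_pretrained(clf_repo)
clf_processor = AutoVideoProcessor.from_pretrained(clf_repo)

# Sample 16 frames (this checkpoint's assumed frames-per-clip) from the same archery clip as above.
video_url = "https://huggingface.co/datasets/nateraw/kinetics-mini/resolve/main/val/archery/-Qz25rXdMjE_000014_000024.mp4"
frames = VideoDecoder(video_url).get_frames_at(indices=np.arange(0, 16)).data

inputs = clf_processor(frames, return_tensors="pt").to(clf_model.device)
with torch.no_grad():
    logits = clf_model(**inputs).logits
print(clf_model.config.id2label[logits.argmax(-1).item()])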
Load an image
from transformers.image_utils import load_image
image = load_image("https://huggingface.co/datasets/merve/coco/resolve/main/val2017/000000000285.jpg")
pixel_values = processor(image, return_tensors="pt").to(model.device)["pixel_values_videos"]
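# Repeat the single image along the time dimension to build a 16-frame clip, matching the video input format.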
pixel_values = pixel_values.repeat(1, 16, 1, 1, 1)
with torch.no_grad():
    image_embeddings = model.get_vision_features(pixel_values)
print(image_embeddings.shape)
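Both calls return patch-level features of shape (batch, tokens, hidden), so a simple way to use them for retrieval is to pool the tokens into a single vector and compare embeddings with cosine similarity. This is a minimal sketch on top of the tensors computed above; mean pooling is an assumption, not a prescribed V-JEPA 2 recipe.

import torch.nn.functional as F

# Mean-pool patch tokens into one vector per clip / image (assumed pooling strategy).
video_vec = video_embeddings.mean(dim=1)
image_vec = image_embeddings.mean(dim=1)

# Cosine similarity as a retrieval score between the video and the image.
print(F.cosine_similarity(video_vec, image_vec))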
For more code examples, see the V-JEPA 2 documentation.
📄 License
This project is released under the MIT License.
📚 Citation
@techreport{assran2025vjepa2,
title={V-JEPA~2: Self-Supervised Video Models Enable Understanding, Prediction and Planning},
author={Assran, Mahmoud and Bardes, Adrien and Fan, David and Garrido, Quentin and Howes, Russell and
Komeili, Mojtaba and Muckley, Matthew and Rizvi, Ammar and Roberts, Claire and Sinha, Koustuv and Zholus, Artem and
Arnaud, Sergio and Gejji, Abha and Martin, Ada and Robert Hogan, Francois and Dugas, Daniel and
Bojanowski, Piotr and Khalidov, Vasil and Labatut, Patrick and Massa, Francisco and Szafraniec, Marc and
Krishnakumar, Kapil and Li, Yong and Ma, Xiaodong and Chandar, Sarath and Meier, Franziska and LeCun, Yann and
Rabbat, Michael and Ballas, Nicolas},
institution={FAIR at Meta},
year={2025}
}