TEMPURA Qwen2.5 VL 3B S1
TEMPURA is a video temporal understanding framework that combines causal reasoning with fine-grained temporal segmentation, improving video event comprehension through two-stage training.
Release Time: 5/4/2025
Model Overview
This model performs temporal understanding and causal reasoning over video events using masked event prediction and video segmentation, and supports video-to-text generation tasks.
Model Features
Two-stage Training Paradigm
Stage one reconstructs missing events through masked event prediction; stage two learns video segmentation and dense captioning.
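As a rough illustration of stage one, the sketch below builds a masked-event-prediction example from a list of timestamped event captions. It is a text-only simplification under assumed names: Event, make_masked_event_sample, the [MASKED] token and the prompt wording are all hypothetical, and the actual TEMPURA pipeline operates on video, so its data format may differ.

```python
from dataclasses import dataclass


@dataclass
class Event:
    start: float  # seconds
    end: float    # seconds
    caption: str


def make_masked_event_sample(events: list[Event], mask_index: int) -> dict:
    """Build one masked-event-prediction example (hypothetical text-only format).

    The caption of the event at `mask_index` is hidden in the context; the
    target is that caption, to be inferred from the surrounding events.
    """
    if not 0 <= mask_index < len(events):
        raise IndexError("mask_index out of range")
    context_lines = []
    for i, ev in enumerate(events):
        # Keep the timestamps of the masked event so the model knows *when*
        # something happened, but hide *what* happened.
        caption = "[MASKED]" if i == mask_index else ev.caption
        context_lines.append(f"{ev.start:.1f}-{ev.end:.1f}s: {caption}")
    prompt = "Infer the missing event from the surrounding events:\n" + "\n".join(context_lines)
    return {"prompt": prompt, "target": events[mask_index].caption}
```

Keeping the masked span's timestamps in the prompt reflects the idea that the missing event must be reasoned about from what comes before and after it.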
Temporal Understanding Capability
Segments videos into non-overlapping events and generates detailed, timestamp-aligned descriptions of each event.
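A timeline of non-overlapping, timestamp-aligned events can be represented as a list of (start, end, caption) segments. The minimal sketch below, using a hypothetical Segment type and validate_segments helper, sorts such a list and rejects empty or overlapping spans. Segments that merely touch (one ends exactly where the next starts) are treated as valid.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    caption: str


def validate_segments(segments: list[Segment]) -> list[Segment]:
    """Sort segments by start time and check they form a non-overlapping timeline.

    Returns the sorted list, or raises ValueError on an empty/inverted
    segment or on any overlap between consecutive segments.
    """
    ordered = sorted(segments, key=lambda s: s.start)
    for seg in ordered:
        if seg.end <= seg.start:
            raise ValueError(f"empty or inverted segment: {seg}")
    # After sorting, checking consecutive pairs is enough to detect any overlap.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start < prev.end:
            raise ValueError(f"overlap between {prev} and {cur}")
    return ordered
```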
Large-scale Training Data
Trained on the VER dataset, which contains 1 million training instances drawn from 500K videos
Model Capabilities
Video temporal understanding
Event causal reasoning
Video-to-text generation
Timestamp-aligned description generation
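Using timestamp-aligned descriptions downstream requires parsing the generated text back into structured events. This card does not document the model's exact output format, so the sketch below assumes one event per line in the hypothetical form "<start> - <end> seconds, <description>" (a colon is also accepted as the separator) and skips lines that do not match. Adjust the pattern to whatever the model actually emits.

```python
import re

# Hypothetical output format, assumed for illustration only:
#   "<start> - <end> seconds, <description>"   (one event per line)
LINE_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*seconds?\s*[,:]\s*(.+?)\s*$"
)


def parse_timestamped_captions(text: str) -> list[tuple[float, float, str]]:
    """Extract (start, end, caption) triples; non-matching lines are ignored."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            events.append((float(m.group(1)), float(m.group(2)), m.group(3)))
    return events
```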
Use Cases
Video Analysis
Video Event Reasoning
Analyzing causal relationships and temporal sequences of events in videos
Outperforms strong existing baseline models
Temporal Localization
Accurately locating the timestamps of specific events in videos
Performs strongly on benchmark tests
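Temporal localization is commonly scored with temporal intersection-over-union (tIoU) between a predicted and a ground-truth interval, and with recall at a tIoU threshold. The sketch below implements both. The function names are illustrative, and this card does not state which benchmarks or metrics were used to evaluate the model.

```python
def temporal_iou(pred: tuple[float, float], gt: tuple[float, float]) -> float:
    """Intersection-over-union of two [start, end] intervals given in seconds."""
    (ps, pe), (gs, ge) = pred, gt
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(
    preds: list[tuple[float, float]],
    gts: list[tuple[float, float]],
    threshold: float = 0.5,
) -> float:
    """Fraction of queries whose single top prediction reaches tIoU >= threshold."""
    if len(preds) != len(gts) or not gts:
        raise ValueError("need one prediction per ground-truth interval")
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)
```

For example, temporal_iou((0.0, 10.0), (5.0, 15.0)) returns 1/3: the intervals share 5 seconds out of a 15-second union.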