TEMPURA Qwen2.5 VL 3B S1

Developed by andaba
TEMPURA is a video temporal understanding framework that combines causal reasoning with fine-grained temporal segmentation. It improves video event comprehension through two-stage training.
Downloads 16
Release Time: 5/4/2025

Model Overview

This model performs temporal understanding and causal reasoning over video events by combining masked event prediction with video segmentation, and it supports video-to-text generation tasks.

Model Features

Two-stage Training Paradigm
Stage one reconstructs missing events through masked event prediction; stage two learns video segmentation and dense captioning
Temporal Understanding Capability
Decomposes videos into non-overlapping events and generates detailed, timestamp-aligned descriptions
Large-scale Training Data
Trained on the VER dataset (1 million training instances, 500k videos)
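
To make the stage-one idea concrete, here is a minimal sketch of how a masked event prediction instance could be built from a list of timestamped events. This is only an illustration: the `Event` class, the `MASK` placeholder string, and the text layout are assumptions made for this example, not the actual VER data format or TEMPURA's training code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    start: float       # start time in seconds
    end: float         # end time in seconds
    description: str   # text describing what happens in the segment


# Hypothetical placeholder token; the real training data may use something else.
MASK = "[MASKED EVENT]"


def make_masked_instance(events: List[Event], idx: int) -> Tuple[str, str]:
    """Hide the description of event `idx` and return (context, target).

    The context lists every event with its time span, with the chosen
    event's description replaced by MASK. The target is the hidden text
    that a model would be asked to reconstruct.
    """
    lines = []
    for i, ev in enumerate(events):
        text = MASK if i == idx else ev.description
        lines.append(f"{ev.start:.1f}s - {ev.end:.1f}s: {text}")
    return "\n".join(lines), events[idx].description


events = [
    Event(0.0, 4.0, "A person fills a kettle with water."),
    Event(4.0, 9.5, "The kettle is placed on the stove and heated."),
    Event(9.5, 15.0, "Boiling water is poured into a cup."),
]
context, target = make_masked_instance(events, 1)
print(context)
print("target:", target)
```

Masking the middle event forces a model to infer what must have happened between the surrounding events, which is where the causal reasoning signal would come from under this reading.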

Model Capabilities

Video temporal understanding
Event causal reasoning
Video-to-text generation
Timestamp-aligned description generation
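
As a rough illustration of what timestamp-aligned output over non-overlapping events might look like, the sketch below validates a set of segments and renders them as text. The tuple layout, the helper names, and the `MM:SS` rendering are all assumptions for this example; this page does not specify the model's actual output format.

```python
from typing import List, Tuple

# (start_seconds, end_seconds, description): a hypothetical segment layout.
Segment = Tuple[float, float, str]


def is_non_overlapping(segments: List[Segment]) -> bool:
    """Return True if every segment has positive length and none overlap."""
    ordered = sorted(segments, key=lambda seg: seg[0])
    if any(start >= end for start, end, _ in ordered):
        return False
    # Each segment must begin at or after the end of the previous one.
    return all(nxt[0] >= cur[1] for cur, nxt in zip(ordered, ordered[1:]))


def to_timestamp(seconds: float) -> str:
    """Render whole seconds as MM:SS (fractions of a second are truncated)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_segments(segments: List[Segment]) -> str:
    """Produce one 'MM:SS - MM:SS: description' line per segment, in time order."""
    ordered = sorted(segments, key=lambda seg: seg[0])
    return "\n".join(
        f"{to_timestamp(s)} - {to_timestamp(e)}: {text}" for s, e, text in ordered
    )


segments = [
    (0.0, 12.0, "A cyclist approaches an intersection."),
    (12.0, 75.0, "The cyclist waits at a red light."),
    (75.0, 90.0, "The light turns green and the cyclist rides on."),
]
if is_non_overlapping(segments):
    print(format_segments(segments))
```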

Use Cases

Video Analysis
Video Event Reasoning
Analyzing causal relationships and temporal sequences of events in videos
Outperforms existing strong baseline models
Temporal Localization
Accurately locating specific event timestamps in videos
Demonstrates excellent performance in benchmark tests
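
Temporal localization quality is commonly scored with temporal intersection-over-union (IoU) between a predicted time span and a reference span. The sketch below shows that standard calculation; this page does not state which metric or benchmarks were used for TEMPURA, so treat it as general background rather than the model's evaluation protocol. The function name and span layout are chosen for this example.

```python
from typing import Tuple

Span = Tuple[float, float]  # (start_seconds, end_seconds)


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two time spans, in the range [0, 1]."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


# Predicted span 2s-6s against a reference span 4s-8s:
# they share 2 seconds out of a 6-second union.
score = temporal_iou((2.0, 6.0), (4.0, 8.0))
print(f"temporal IoU = {score:.3f}")
```

A prediction is often counted as correct when its IoU with the reference exceeds a chosen threshold, so small timestamp errors on short events are penalized more heavily than the same errors on long ones.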