Athit Timesformer 32PS
TimeSformer is a video understanding model based on a spatiotemporal attention mechanism. This checkpoint is fine-tuned on the Kinetics-400 dataset and is suited to video classification tasks.
Downloads 17
Release Time: 2/23/2024
Model Overview
The model classifies a video into one of the 400 Kinetics-400 labels, using attention alone to process the spatiotemporal information in the video.
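As a rough illustration of what "one of 400 labels" means in practice, the sketch below turns a vector of 400 raw class scores into a single predicted label. The label names and the scores are hypothetical placeholders, not output from this model; only the class count of 400 comes from the description above.

```python
import math

NUM_CLASSES = 400  # Kinetics-400 label count, as stated in the overview


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(id2label):
        raise ValueError("one logit is expected per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical stand-ins: placeholder label names and made-up scores.
id2label = [f"class_{i}" for i in range(NUM_CLASSES)]
logits = [0.0] * NUM_CLASSES
logits[42] = 5.0  # pretend the model strongly favours class 42

label, prob = classify(logits, id2label)
print(label, round(prob, 3))
```

A real checkpoint would supply both the scores and the actual Kinetics-400 label names; the selection step itself would look much the same.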
Model Features
Pure Attention Mechanism
Relies entirely on attention to process spatiotemporal information in videos, with no convolution operations
Efficient Video Understanding
Effectively captures spatiotemporal features in videos for accurate video classification
Pre-trained Model
Pre-trained and fine-tuned on the large-scale video dataset Kinetics-400
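To make the "pure attention" feature listed above more concrete, here is a minimal sketch of self-attention applied to patch tokens drawn from several frames. It is a toy under stated assumptions: the learned query/key/value projections are replaced by the identity, every token attends to every other token across space and time, and the tiny 2-frame, 2-patch "video" is made up. This page does not say how the actual checkpoint arranges its attention, so treat this as an illustration of the idea rather than the model's architecture.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    tokens: list of equal-length float vectors, one per (frame, patch) pair.
    A real model would apply learned projection matrices first; they are
    omitted here (an assumption made to keep the sketch short).
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    output = []
    for query in tokens:
        # Similarity of this token to every token in every frame.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        # Weighted average of all token vectors; no convolution involved.
        mixed = [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]
        output.append(mixed)
    return output


# Toy "video": 2 frames x 2 patches per frame, each patch a 3-dim embedding.
video = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # frame 0
    [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]],  # frame 1
]
tokens = [patch for frame in video for patch in frame]  # flatten space and time
attended = self_attention(tokens)
print(len(attended), len(attended[0]))
```

Each output vector is a weighted mix of patches from both frames, which is how attention can combine spatial and temporal context without any convolutional filter.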
Model Capabilities
Video Classification
Spatial-Temporal Feature Extraction
Video Content Understanding
Use Cases
Video Analysis
Action Recognition
Identify human actions and behaviors in videos
Distinguishes among the 400 Kinetics-400 action categories
Video Content Classification
Automatically classify and tag video content
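For the tagging use case above, one possible approach is to score several clips from the same video, average the per-class probabilities, and keep the classes that clear a threshold. The sketch below assumes that workflow; the function names, the threshold of 0.3, the cap of three tags, and the three-label example are all hypothetical choices for illustration, not part of the model or its documented usage.

```python
def average_probs(clip_probs):
    """Average per-class probabilities over several clips of one video."""
    n = len(clip_probs)
    return [sum(column) / n for column in zip(*clip_probs)]


def tag_video(clip_probs, labels, threshold=0.3, max_tags=3):
    """Return up to max_tags (label, probability) pairs at or above threshold.

    threshold and max_tags are illustrative defaults, not tuned values.
    """
    mean = average_probs(clip_probs)
    ranked = sorted(zip(labels, mean), key=lambda pair: pair[1], reverse=True)
    return [(name, p) for name, p in ranked[:max_tags] if p >= threshold]


# Made-up probabilities for two clips over three placeholder labels.
labels = ["label_a", "label_b", "label_c"]
clip_probs = [
    [0.6, 0.3, 0.1],  # clip 1
    [0.4, 0.5, 0.1],  # clip 2
]
tags = tag_video(clip_probs, labels)
print(tags)
```

With the real model, `labels` would hold the 400 Kinetics-400 class names and each row of `clip_probs` would hold 400 probabilities.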
© 2025 AIbase