# Spatiotemporal Modeling

Vivit B 16x2 Kinetics400

ViViT extends the Vision Transformer (ViT) to video input and is particularly well suited to video classification tasks.

Video Processing
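To make the "16x2" in the ViViT B 16x2 Kinetics400 name more concrete, here is a minimal sketch of how many tokens a ViViT-style model would see for one clip. It assumes that "16x2" means 16x16 pixel patches spanning 2 consecutive frames (a "tubelet"), and that a clip has 32 frames at 224x224 resolution. Neither assumption is stated in the listing, and the function name `tubelet_token_count` is hypothetical.

```python
def tubelet_token_count(num_frames, height, width, tubelet_t, patch_h, patch_w):
    """Count non-overlapping tubelets (spatiotemporal patches) in one clip.

    Each tubelet covers tubelet_t frames and a patch_h x patch_w pixel area,
    and becomes one input token for the transformer.
    """
    if num_frames % tubelet_t or height % patch_h or width % patch_w:
        raise ValueError("clip dimensions must be divisible by the tubelet size")
    return (num_frames // tubelet_t) * (height // patch_h) * (width // patch_w)


# Assumed clip shape: 32 frames of 224x224 pixels, tubelets of 2 x 16 x 16.
tokens = tubelet_token_count(32, 224, 224, tubelet_t=2, patch_h=16, patch_w=16)
print(tokens)
```

Under these assumptions the clip is cut into 16 temporal slots of 14 x 14 patches each, which shows why video transformers handle far more tokens per sample than an image ViT does.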

ViViT is an extension of the Vision Transformer (ViT) for video processing, primarily used for downstream tasks such as video classification.

Video Processing

VideoMAE is a self-supervised video pre-training model based on the Masked Autoencoder (MAE). It learns video representations by predicting the pixel values of masked video patches.

Video Processing
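The masked-patch objective described for VideoMAE can be sketched in a few lines of plain Python. This is an illustration only, not the model's actual code: the function names `tube_mask` and `masked_mse` are hypothetical, the 90% mask ratio is an assumed value, and hiding the same spatial positions in every time slot ("tube" masking) is an assumption of this sketch rather than something the listing states.

```python
import random


def tube_mask(num_time_slots, num_spatial_patches, mask_ratio, seed=0):
    """Return mask[t][p], True where a patch is hidden from the encoder.

    The same spatial positions are hidden in every time slot.
    """
    rng = random.Random(seed)
    num_masked = int(round(mask_ratio * num_spatial_patches))
    hidden = set(rng.sample(range(num_spatial_patches), num_masked))
    return [[p in hidden for p in range(num_spatial_patches)]
            for _ in range(num_time_slots)]


def masked_mse(pred, target, mask):
    """Mean squared pixel error, averaged over the hidden patches only."""
    total, count = 0.0, 0
    for t, row in enumerate(mask):
        for p, is_hidden in enumerate(row):
            if is_hidden:
                diffs = [(a - b) ** 2 for a, b in zip(pred[t][p], target[t][p])]
                total += sum(diffs) / len(diffs)
                count += 1
    return total / count if count else 0.0


# Toy clip: 4 time slots, 10 patches per slot, 3 pixel values per patch.
mask = tube_mask(num_time_slots=4, num_spatial_patches=10, mask_ratio=0.9)
target = [[[1.0, 1.0, 1.0] for _ in range(10)] for _ in range(4)]
pred = [[[0.0, 0.0, 0.0] for _ in range(10)] for _ in range(4)]
loss = masked_mse(pred, target, mask)
```

The loss only counts hidden patches, so the model gets no credit for copying the patches it was allowed to see.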

Video Classification CNN RNN

A video classification model built on a hybrid CNN-RNN architecture for action recognition tasks.

Video Processing
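The hybrid CNN-RNN idea behind the Video Classification CNN RNN entry can be sketched as a toy pipeline: a convolutional stage turns each frame into a small feature vector, and a recurrent stage folds those vectors into one state that is scored per class. Everything here is an assumption made for illustration: the function names are hypothetical, the weights are untrained, and the listed model's real layers are not described in the listing.

```python
import math


def conv2d_valid(image, kernel):
    """Valid 2D cross-correlation of a grayscale image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def frame_features(image, kernels):
    """CNN stage: convolution, ReLU, then global average pooling per kernel."""
    feats = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        values = [max(0.0, v) for row in fmap for v in row]
        feats.append(sum(values) / len(values))
    return feats


def rnn_step(hidden, feats, w_in, w_rec):
    """RNN stage: one tanh recurrence, h' = tanh(W_in x + W_rec h)."""
    return [math.tanh(sum(w * x for w, x in zip(w_in[k], feats)) +
                      sum(w * h for w, h in zip(w_rec[k], hidden)))
            for k in range(len(hidden))]


def classify_video(frames, kernels, w_in, w_rec, w_out):
    """Run the CNN on every frame, fold the features through the RNN, score classes."""
    hidden = [0.0] * len(w_rec)
    for image in frames:
        hidden = rnn_step(hidden, frame_features(image, kernels), w_in, w_rec)
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return scores.index(max(scores)), scores


# Toy input: two 3x3 frames, one 2x2 kernel, one hidden unit, two classes.
frames = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]
label, scores = classify_video(
    frames,
    kernels=[[[1.0, 0.0], [0.0, 1.0]]],
    w_in=[[1.0]],
    w_rec=[[0.0]],
    w_out=[[1.0], [-1.0]],
)
```

The split keeps spatial reasoning in the per-frame stage and temporal reasoning in the recurrence, which is the main design choice that separates this family from the transformer models listed above.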


© 2025 AIbase