Model Selection

Long Video Processing

# Long Video Processing

Slowfast Video Mllm Qwen2 7b Convnext 576 Frame64 S1t4

A video multimodal large language model using a slow-fast architecture, balancing temporal resolution and spatial details, supporting 64-frame video understanding

Llava NeXT Video 7B Hf

LLaVA-NeXT-Video-7B-hf is a video-based multimodal model capable of processing video and text inputs to generate text outputs.

Video-to-Text English

Featured Recommended AI Models

AIbase

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご

© 2025AIbase