# Long Video Processing
Slowfast Video Mllm Qwen2 7b Convnext 576 Frame64 S1t4
A video multimodal large language model (MLLM) that uses a slow-fast architecture to balance temporal resolution against spatial detail, supporting video understanding over 64 frames.
Video-to-Text
Transformers

shi-labs
184
0
Llava NeXT Video 7B Hf
LLaVA-NeXT-Video-7B-hf is a video-based multimodal model capable of processing video and text inputs to generate text outputs.
Video-to-Text English
FriendliAI
30
0