TinyLLaVA-Video: Open-Source Video Understanding Model

TinyLLaVA-Video-Qwen2.5-3B-Group-16-512

Developed by Zhang199

TinyLLaVA-Video is a video understanding model built on the Qwen2.5-3B language model and the siglip-so400m-patch14-384 vision encoder. It uses a grouped resampler to process video frames.

Video-to-Text

Safetensors

Open Source License: Apache-2.0

Tags: Video Understanding, Multi-frame Processing, Lightweight Model

Downloads 76

Release Date: March 19, 2025

Model Overview

This model combines a large language model with a vision module and is designed for video-text-to-text tasks: given a video and a text prompt, it processes frames sampled from the video and generates a text response based on their content.

Model Features

Efficient Video Processing

Samples 16 frames from each video and uses a grouped resampler to condense their visual features into 512 query tokens, which reduces the number of tokens the language model has to process.
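The frame and query budget of this variant (16 frames, 512 queries) can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the TinyLLaVA-Video code base: the function names are hypothetical, uniform sampling is an assumed strategy, and the count of 729 patch tokens per frame is derived from the encoder name (384-pixel input, 14-pixel patches) on the assumption that no pooling is applied before the resampler.

```python
def uniform_frame_indices(total_frames: int, num_samples: int = 16) -> list[int]:
    """Pick num_samples frame indices spread evenly across a video.

    Uniform sampling is an assumption made for illustration; the model's
    actual sampling strategy may differ.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if total_frames <= num_samples:
        # Short clip: keep every frame.
        return list(range(total_frames))
    # Split the video into num_samples equal segments and take each midpoint.
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def compression_ratio(num_frames: int = 16, image_size: int = 384,
                      patch_size: int = 14, num_queries: int = 512) -> float:
    """Ratio of raw patch tokens to resampler query tokens."""
    patches_per_side = image_size // patch_size   # 384 // 14 = 27
    tokens_per_frame = patches_per_side ** 2      # 27 * 27 = 729
    return num_frames * tokens_per_frame / num_queries


if __name__ == "__main__":
    print(uniform_frame_indices(320))             # 16 indices: 10, 30, ..., 310
    print(f"{compression_ratio():.1f}x fewer visual tokens")
```

Under these assumptions, 16 frames yield 11,664 patch tokens, so reducing them to 512 queries shrinks the visual input to the language model by a factor of roughly 23.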

Multimodal Understanding

Combines a vision encoder with a language model to interpret video content in response to text prompts.

Compact Architecture

Built on a 3B-parameter language model, which keeps compute and memory requirements low compared with larger video models while remaining competitive on the benchmarks reported on this page.
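As a rough guide to what a 3B-parameter model means for deployment, the following sketch estimates the memory needed for the weights alone at common precisions. The figure of 3e9 parameters is a nominal assumption taken from the "3B" label; the true count (including the vision encoder) is somewhat higher, and activations and the KV cache are not included.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory for model weights only, in GiB (excludes activations and KV cache)."""
    return num_params * bytes_per_param / 2**30


NOMINAL_PARAMS = 3e9  # assumption: nominal "3B" figure, not an exact count

for dtype_name, nbytes in [("float32", 4), ("float16/bfloat16", 2), ("int8", 1)]:
    print(f"{dtype_name}: {weight_memory_gib(NOMINAL_PARAMS, nbytes):.1f} GiB")
```

At half precision this comes to about 5.6 GiB of weights, which is why a model of this size can plausibly fit on a single consumer GPU.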

Model Capabilities

Video Content Understanding

Video-Text Conversion

Multimodal Reasoning

Temporal Information Processing

Use Cases

Video Analysis

Video Content Summarization

Automatically generates textual summaries of video content.

This variant scores 42.4 on LongVideoBench.

Video Question Answering

Answers questions about video content.

This variant scores 47.0 on Video-MME.

Intelligent Surveillance

Anomalous Behavior Detection

Identifies abnormal events in surveillance videos.

Result

Model (HF Path)	#Frames/#Queries	Video-MME	MVBench	LongVideoBench	MLVU
Zhang199/TinyLLaVA-Video-Qwen2.5-3B-Group-1fps-512	1fps/512	47.7	47.0	42.0	52.6
Zhang199/TinyLLaVA-Video-Qwen2.5-3B-Group-16-512	16/512	47.0	45.5	42.4	52.5
Zhang199/TinyLLaVA-Video-Qwen2.5-3B-Naive-16-512	16/512	44.7	42.5	37.6	48.1
Zhang199/TinyLLaVA-Video-Phi2-Naive-16-512	16/512	42.7	42.0	42.2	46.5
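To compare the variants at a glance, the scores from the table can be loaded into a small script that reports the per-model mean and the gain of the grouped resampler over the naive variant on the same Qwen2.5-3B backbone. This is only a convenience sketch: the numbers are copied from the table, while the helper names and the unweighted mean are illustrative choices and not an official aggregate metric.

```python
BENCHMARKS = ["Video-MME", "MVBench", "LongVideoBench", "MLVU"]

# Scores copied from the results table (model path -> scores in BENCHMARKS order).
RESULTS = {
    "Zhang199/TinyLLaVA-Video-Qwen2.5-3B-Group-1fps-512": [47.7, 47.0, 42.0, 52.6],
    "Zhang199/TinyLLaVA-Video-Qwen2.5-3B-Group-16-512": [47.0, 45.5, 42.4, 52.5],
    "Zhang199/TinyLLaVA-Video-Qwen2.5-3B-Naive-16-512": [44.7, 42.5, 37.6, 48.1],
    "Zhang199/TinyLLaVA-Video-Phi2-Naive-16-512": [42.7, 42.0, 42.2, 46.5],
}


def mean_score(model: str) -> float:
    """Unweighted mean over the four benchmarks (illustrative, not official)."""
    scores = RESULTS[model]
    return sum(scores) / len(scores)


def per_benchmark_gain(model_a: str, model_b: str) -> dict[str, float]:
    """Score of model_a minus score of model_b on each benchmark."""
    return {
        name: round(a - b, 1)
        for name, a, b in zip(BENCHMARKS, RESULTS[model_a], RESULTS[model_b])
    }


if __name__ == "__main__":
    for model in RESULTS:
        print(f"{model}: mean {mean_score(model):.2f}")
    gains = per_benchmark_gain(
        "Zhang199/TinyLLaVA-Video-Qwen2.5-3B-Group-16-512",
        "Zhang199/TinyLLaVA-Video-Qwen2.5-3B-Naive-16-512",
    )
    print("Group vs. Naive (16 frames):", gains)
```

With the same backbone and the same 16/512 setting, the grouped variant is ahead of the naive one on all four benchmarks, with the largest margin on LongVideoBench.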
