TinyLLaVA-Video-Qwen2.5-3B-Group-16-512

Developed by Zhang199
TinyLLaVA-Video is a video understanding model built on the Qwen2.5-3B language model and the siglip-so400m-patch14-384 vision encoder. It uses a grouped resampler to process video frames.
Downloads 76
Release Time: 3/19/2025

Model Overview

This model pairs a large language model with a vision module and is designed for video-text-to-text tasks: it extracts frames from a video, interprets their content, and generates text in response.

Model Features

Efficient Video Processing
Samples 16 frames from each video and uses a grouped resampler to condense them into a fixed budget of visual tokens (the "16" and "512" in the model name), improving processing efficiency.
Multimodal Understanding
Combines a vision encoder with a language model so that video content can be interpreted and described in text.
Compact Architecture
Lightweight design built on a 3B-parameter language model, reducing computational demands while maintaining performance.
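
The frame sampling described above can be sketched in a few lines of Python. This is a minimal illustration, not the model's actual preprocessing code: it assumes frames are picked uniformly across the video, and the function names `uniform_frame_indices` and `split_into_groups`, along with the group size of 4, are hypothetical choices made for the example.

```python
from typing import List


def uniform_frame_indices(total_frames: int, num_samples: int = 16) -> List[int]:
    """Pick evenly spaced frame indices (assumed strategy, not confirmed by the source)."""
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    # Short videos: keep every frame rather than repeating indices.
    if total_frames <= num_samples:
        return list(range(total_frames))
    # Divide the video into num_samples equal bins and take each bin's midpoint.
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def split_into_groups(items: List[int], group_size: int) -> List[List[int]]:
    """Chunk the sampled frames into consecutive groups (group size is hypothetical)."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return [items[i:i + group_size] for i in range(0, len(items), group_size)]


# Example: a 320-frame video sampled down to 16 frames, then grouped by 4.
indices = uniform_frame_indices(320)
groups = split_into_groups(indices, 4)
print(indices)  # 16 indices: 10, 30, ..., 310
print(groups)   # 4 groups of 4 consecutive sampled frames
```

Sampling from bin midpoints rather than bin edges avoids always selecting the very first frame, which is often a title card or a black frame. A resampler would then operate on the features of each group, but that step depends on the model's internals and is not shown here.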

Model Capabilities

Video Content Understanding
Video-Text Conversion
Multimodal Reasoning
Temporal Information Processing

Use Cases

Video Analysis
Video Content Summarization
Automatically generates textual summaries of video content.
Scored 42.4 on LongVideoBench.
Video Question Answering
Answers questions about video content.
Scored 47.0 on Video-MME.
Intelligent Surveillance
Abnormal Behavior Detection
Identifies abnormal events in surveillance videos.