
LLaVA-Video-7B-Qwen2

Developed by lmms-lab
LLaVA-Video-7B-Qwen2 is a 7B-parameter multimodal model built on the Qwen2 language model. It specializes in video understanding tasks and accepts up to 64 frames of video input.
Downloads: 34.28k
Release date: 9/2/2024

Model Overview

This model is trained on the LLaVA-Video-178K and LLaVA-OneVision datasets. It can interact with images, multiple images, and videos, and primarily targets video understanding tasks.

Model Features

Multimodal Video Understanding
Processes video input to generate text descriptions or answer questions about the content
Long Context Support
Provides a 32K-token context window, allowing longer video content to be handled
Multi-Frame Processing Capability
Can process up to 64 frames of video input (see the frame-sampling sketch below)
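
Because of the 64-frame budget, longer clips must be downsampled before they reach the model. Below is a minimal sketch of uniform frame sampling, assuming the decord library for video decoding; the helper name sample_frames and the input file are illustrative, not part of the model's official tooling.

```python
import numpy as np
from decord import VideoReader, cpu

def sample_frames(video_path: str, num_frames: int = 64) -> np.ndarray:
    """Uniformly sample up to `num_frames` frames from a video.

    Returns an array of shape (frames, height, width, 3) in RGB order,
    matching the 64-frame budget stated for LLaVA-Video-7B-Qwen2.
    """
    vr = VideoReader(video_path, ctx=cpu(0))
    total = len(vr)
    # If the clip has fewer frames than the budget, take every frame.
    indices = np.linspace(0, total - 1, min(num_frames, total), dtype=int)
    return vr.get_batch(indices).asnumpy()

frames = sample_frames("example.mp4")  # hypothetical input file
print(frames.shape)  # e.g. (64, 720, 1280, 3)
```

Uniform sampling is the simplest policy; it trades temporal detail for coverage of the whole clip, which suits description and Q&A over long videos.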

Model Capabilities

Video Content Understanding
Video Q&A
Video Description Generation
Multimodal Reasoning

Use Cases

Video Understanding
Video Content Description
Generates detailed content descriptions based on input videos
Video Q&A
Answers various questions about video content
Performs strongly on multiple video Q&A benchmarks (see the inference sketch below)
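
For video description and Q&A, the model is typically driven through the LLaVA-NeXT codebase (the llava package) rather than plain transformers. The sketch below follows the loading-and-generation pattern documented for that repository; exact helper signatures can vary between versions, and the video file and question are illustrative.

```python
import copy
import numpy as np
import torch
from decord import VideoReader, cpu
from llava.model.builder import load_pretrained_model
from llava.mm_utils import tokenizer_image_token
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates

device = "cuda"
# Load the checkpoint via the LLaVA-NeXT helper; "llava_qwen" selects
# the Qwen2-based model class.
tokenizer, model, image_processor, _ = load_pretrained_model(
    "lmms-lab/LLaVA-Video-7B-Qwen2", None, "llava_qwen",
    torch_dtype="bfloat16", device_map="auto")
model.eval()

# Uniformly sample 64 frames (the model's stated frame budget).
vr = VideoReader("example.mp4", ctx=cpu(0))  # hypothetical input file
idx = np.linspace(0, len(vr) - 1, 64, dtype=int)
frames = vr.get_batch(idx).asnumpy()
video = image_processor.preprocess(frames, return_tensors="pt")["pixel_values"]
video = video.to(device, dtype=torch.bfloat16)

# Build a Qwen-style chat prompt containing the image placeholder token.
conv = copy.deepcopy(conv_templates["qwen_1_5"])
conv.append_message(conv.roles[0],
                    DEFAULT_IMAGE_TOKEN + "\nDescribe this video in detail.")
conv.append_message(conv.roles[1], None)
input_ids = tokenizer_image_token(conv.get_prompt(), tokenizer,
                                  IMAGE_TOKEN_INDEX,
                                  return_tensors="pt").unsqueeze(0).to(device)

with torch.inference_mode():
    out = model.generate(input_ids, images=[video], modalities=["video"],
                         do_sample=False, max_new_tokens=512)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0].strip())
```

Swapping the question string turns the same pipeline into video Q&A; the frame sampling and prompt construction stay unchanged.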