LLaVA-NeXT-Video-7B-hf

Developed by llava-hf
LLaVA-NeXT-Video is an open-source multimodal chatbot with strong video-understanding capabilities, obtained through mixed training on video and image data; it achieves state-of-the-art results among open-source models on the VideoMME benchmark.
Downloads: 65.95k
Release date: 6/5/2024

Model Overview

A video understanding model based on LLaVA-NeXT, supporting multimodal inputs of images and videos, capable of performing tasks such as visual question answering and content description.

Model Features

Video Understanding Capability
Trained on 100K VideoChatGPT-Instruct samples, the model shows strong video-content understanding
Multimodal Input Support
Accepts both images and videos as input and can handle complex multimodal queries
Open-source SOTA
The best-performing open-source model on the VideoMME benchmark at release
Efficient Inference
Supports 4-bit quantization and FlashAttention-2, reducing memory and compute requirements for inference
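
The efficient-inference options above can be sketched as follows. This is a minimal, hedged example assuming the transformers `LlavaNextVideoForConditionalGeneration` class and the `load_in_4bit` / `attn_implementation` keyword arguments (4-bit loading additionally requires bitsandbytes, and FlashAttention-2 requires a compatible GPU and the flash-attn package); the helper function only builds keyword arguments and the actual model load is left uncalled:

```python
# Hedged sketch: loading LLaVA-NeXT-Video-7B-hf with 4-bit quantization and
# FlashAttention-2. The kwargs builder is pure data; load_model() is shown
# for illustration and needs a GPU plus the model weights to actually run.

def quantized_load_kwargs(four_bit: bool = True, flash_attn: bool = True) -> dict:
    """Keyword arguments for from_pretrained (no GPU needed to build these)."""
    kwargs = {"torch_dtype": "float16", "device_map": "auto"}
    if four_bit:
        kwargs["load_in_4bit"] = True  # 4-bit weights via bitsandbytes
    if flash_attn:
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


def load_model():
    # Not executed here; requires transformers, a GPU, and the checkpoint.
    from transformers import LlavaNextVideoForConditionalGeneration
    return LlavaNextVideoForConditionalGeneration.from_pretrained(
        "llava-hf/LLaVA-NeXT-Video-7B-hf", **quantized_load_kwargs()
    )
```

Both options can be toggled independently, so the model can also run in plain fp16 on hardware without FlashAttention-2 support.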

Model Capabilities

Video content understanding
Image content analysis
Multimodal question answering
Video content description
Cross-modal reasoning
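
A multimodal question is typically expressed as a chat-style message before being rendered by the Hugging Face processor's chat template. The sketch below follows the transformers multimodal chat-message convention (`role`/`content` with typed content parts); the exact template is defined by the model's processor, so treat the field layout as an assumption to check against the processor documentation:

```python
# Hedged sketch: a mixed text + video user turn in the chat-message format
# consumed by transformers' apply_chat_template (field names follow the
# common multimodal convention; verify against the processor docs).

def make_video_question(question: str) -> list:
    """Build a single-turn conversation asking a question about one video."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                # Placeholder marker; the actual frames are passed to the
                # processor separately, not embedded in the message.
                {"type": "video"},
            ],
        }
    ]


conversation = make_video_question("Why is this video funny?")
```

An image question would look the same with an `{"type": "image"}` part, which is how a single message format covers both modalities.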

Use Cases

Content Understanding
Video Content Analysis
Analyze scenes, actions, and events in videos
Accurately describe video content and highlights
Image Question Answering
Answer various questions about image content
Provide accurate explanations of image content
Education
Educational Video Understanding
Parse educational video content to aid learning
Help students understand complex concepts
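
For video use cases like the above, a clip is usually reduced to a small set of evenly spaced frames before being handed to the processor. A minimal uniform-sampling sketch (the default of 8 frames is an illustrative assumption, not a model requirement):

```python
# Hedged sketch: uniformly sample frame indices from a clip so that a long
# video is represented by a fixed number of frames.

def sample_frame_indices(total_frames: int, num_frames: int = 8) -> list:
    """Return evenly spaced frame indices covering the whole clip."""
    if total_frames <= num_frames:
        # Short clip: use every frame.
        return list(range(total_frames))
    step = total_frames / num_frames
    return [int(i * step) for i in range(num_frames)]
```

The selected frames are then decoded (e.g. with a video reader library) and passed to the processor alongside the chat-formatted question.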