L

Llava NeXT Video 7B DPO Hf

Developed by llava-hf
LLaVA-NeXT-Video is an open-source multimodal chatbot optimized through mixed training on video and image data, possessing excellent video understanding capabilities.
Downloads 12.61k
Release Time : 6/6/2024

Model Overview

A video understanding model built on LLaVa-NeXT, fine-tuned on mixed video and image data to achieve better video comprehension, supporting multi-visual input and multi-prompt generation.

Model Features

Multimodal Understanding
Supports both image and video input and can comprehend multimodal instructions
Video Understanding Capability
Leads in the VideoMME benchmark, demonstrating outstanding video analysis capabilities
Open-source and Customizable
Fully open-source, supporting 4-bit quantization and Flash-Attention optimization

Model Capabilities

Video content understanding
Image content analysis
Multimodal instruction following
Video question answering
Video content description

Use Cases

Video content analysis
Video content understanding
Analyze video content and answer related questions
Can accurately understand actions, scenes, and events in videos
Educational assistance
Instructional video analysis
Parse instructional video content and answer student questions
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
Š 2025AIbase