
LLaVA-NeXT-Video-7B-DPO

Developed by lmms-lab
LLaVA-NeXT-Video is an open-source multimodal dialogue model, built by fine-tuning a large language model on multimodal instruction-following data, that supports combined video-and-text interaction.
Downloads 8,049
Release Date: 4/16/2024

Model Overview

LLaVA-NeXT-Video is a multimodal dialogue model based on Vicuna-7B. It focuses on joint video-and-text interaction and is suited to research and development of multimodal dialogue systems.

Model Features

Multimodal Interaction
Accepts video and text as input and generates text responses grounded in the video content.
Instruction Following
Fine-tuned on multimodal instruction-following data, so it can understand and carry out complex multimodal instructions.
Open-source Model
Fully open-source, making it easy for researchers and developers to extend and customize.

Model Capabilities

Video content understanding
Multimodal dialogue generation
Instruction following
Video question answering
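To use these capabilities, a video must first be reduced to a fixed number of frames and paired with a text prompt. The sketch below is illustrative only: it assumes 8 uniformly spaced frames and a Vicuna-style prompt with a `<video>` placeholder (the helper names `sample_frame_indices` and `build_video_prompt` are not part of the model's API; real inference would go through the Hugging Face `transformers` processor for this checkpoint).

```python
import numpy as np


def sample_frame_indices(total_frames: int, num_frames: int = 8) -> list[int]:
    """Pick `num_frames` uniformly spaced frame indices from a clip.

    The sampled frames would then be decoded and passed to the model's
    processor together with the text prompt. 8 frames is an assumption,
    not a requirement of the model.
    """
    indices = np.linspace(0, total_frames - 1, num=num_frames)
    return [int(i) for i in indices]


def build_video_prompt(question: str) -> str:
    """Build a Vicuna-style prompt with a video placeholder.

    Assumed template: the processor replaces the <video> token with the
    sampled frame embeddings before generation.
    """
    return f"USER: <video>\n{question} ASSISTANT:"


# Example: sample from a 100-frame clip and build a question prompt.
frames = sample_frame_indices(100, num_frames=8)
prompt = build_video_prompt("What is happening in this video?")
```

The uniform-sampling strategy is a common default for video LLMs; denser or scene-aware sampling can be substituted without changing the prompt format.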

Use Cases

Research
Multimodal Dialogue System Research
Used for researching and developing multimodal dialogue systems, exploring interactions between video and text.
Education
Video Content Question Answering
Used in educational settings to generate Q&A and explanations based on video content.