InternVideo2.5 Chat 8B

Developed by OpenGVLab
InternVideo2.5 is a video multimodal large language model (MLLM) built upon InternVL2.5 and enhanced with Long and Rich Context (LRC) modeling. It improves on existing MLLMs in perceiving fine-grained details and capturing long-term temporal structures.
Downloads 8,265
Release Time: 1/22/2025

Model Overview

InternVideo2.5 is a video multimodal large language model that improves the perception of fine-grained details and the capture of long-term temporal structures in two ways: it incorporates dense vision task annotations into the model through Task Preference Optimization (TPO), and it builds compact spatio-temporal representations through Adaptive Hierarchical Token Compression (HiCo).

Model Features

Long and Rich Context (LRC) modeling
Improves on existing MLLMs by enhancing the ability to perceive fine-grained details and capture long-term temporal structures
Task Preference Optimization (TPO)
Incorporates dense vision task annotations into the model to improve performance
Adaptive Hierarchical Token Compression (HiCo)
Achieves compact spatio-temporal representation and improves processing efficiency
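
To give a rough intuition for what a compact spatio-temporal representation means, the sketch below averages runs of consecutive per-frame token vectors into a single vector each. This is only a toy illustration of token compression by temporal average pooling. It is not the HiCo algorithm, and the function name pool_tokens and its window parameter are hypothetical.

```python
def pool_tokens(frames: list[list[float]], window: int) -> list[list[float]]:
    """Average each run of `window` consecutive token vectors into one vector.

    Toy temporal average pooling, NOT the model's actual HiCo method.
    The last group may be shorter than `window` and is averaged over
    however many vectors it contains.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    pooled = []
    for start in range(0, len(frames), window):
        group = frames[start:start + window]
        dim = len(group[0])
        # Element-wise mean across the vectors in this group.
        pooled.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return pooled


# Three 2-dimensional frame tokens compressed to two with window=2.
print(pool_tokens([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], 2))  # [[2.0, 3.0], [5.0, 6.0]]
```

With a window of w, the sequence shrinks by roughly a factor of w, which is the kind of saving that makes longer videos fit into a fixed context budget.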

Model Capabilities

Video content understanding
Video text generation
Multimodal reasoning
Long video processing
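
Long video processing usually starts by choosing which frames to feed the model. A minimal sketch of uniform frame sampling follows, assuming the video is split into equal spans and the middle frame of each span is taken. The function name uniform_frame_indices and its parameters are hypothetical and are not part of the model's published interface.

```python
def uniform_frame_indices(total_frames: int, num_segments: int) -> list[int]:
    """Return one frame index from the middle of each of `num_segments` equal spans.

    Hypothetical helper for illustration; the model's own preprocessing may differ.
    """
    if total_frames <= 0 or num_segments <= 0:
        raise ValueError("total_frames and num_segments must be positive")
    # Never ask for more segments than there are frames.
    num_segments = min(num_segments, total_frames)
    span = total_frames / num_segments
    return [int(span * i + span / 2) for i in range(num_segments)]


print(uniform_frame_indices(100, 4))  # [12, 37, 62, 87]
```

Sampling from span midpoints keeps coverage even across the whole clip, so a summary or answer is not biased toward the opening scenes.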

Use Cases

Video content analysis
Video content description
Detailed description of video content
Generates accurate and detailed video description text
Video question answering
Answering questions about video content
Accurately answers questions about characters, actions, scenes, etc. in the video
Long video processing
Long video summarization
Summarizing long video content
Generates concise summaries of long video content