
InternVL_2_5_HiCo_R16

Developed by OpenGVLab
InternVideo2.5 is a video multimodal large language model (MLLM) built upon InternVL2.5 and enhanced with Long and Rich Context (LRC) modeling, which enables it to perceive fine-grained details and capture long-term temporal structure.
Downloads 1,914
Release Time: 1/23/2025

Model Overview

InternVideo2.5 is a video multimodal large language model that learns from dense vision-task annotations through Task Preference Optimization (TPO) and achieves a compact spatiotemporal representation through Adaptive Hierarchical Token Compression (HiCo). Together, these techniques let it significantly improve upon the capabilities of existing MLLMs.
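This page does not describe how HiCo works internally. The following is a minimal illustrative sketch of hierarchical token compression in general, assuming two stages: spatial pooling of each frame's patch tokens down to `r` tokens, then temporal merging of adjacent frames whose pooled features are nearly identical. Reading the `R16` in the model name as "16 tokens per frame" is an assumption, as are the function names, the contiguous-group pooling scheme, and the similarity threshold; none of this is OpenGVLab's implementation.

```python
import numpy as np


def compress_frame_tokens(frame_tokens: np.ndarray, r: int) -> np.ndarray:
    """Spatial stage (hypothetical): average contiguous groups of a frame's
    N patch tokens (shape (N, D)) down to r tokens (shape (r, D))."""
    n, d = frame_tokens.shape
    if n % r != 0:
        raise ValueError("token count must be divisible by r")
    return frame_tokens.reshape(r, n // r, d).mean(axis=1)


def merge_similar_frames(frames: list[np.ndarray], threshold: float) -> list[np.ndarray]:
    """Temporal stage (hypothetical): fold a frame into the previous kept frame
    (running average) when the cosine similarity of their mean-pooled
    features is at least `threshold`; otherwise keep it as a new frame."""
    if not frames:
        return []
    kept, counts = [frames[0]], [1]
    for f in frames[1:]:
        a, b = kept[-1].mean(axis=0), f.mean(axis=0)
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
        if cos >= threshold:
            c = counts[-1]
            kept[-1] = (kept[-1] * c + f) / (c + 1)
            counts[-1] += 1
        else:
            kept.append(f)
            counts.append(1)
    return kept


def hico_sketch(video_tokens: np.ndarray, r: int = 16, threshold: float = 0.95) -> np.ndarray:
    """video_tokens: (T, N, D) patch tokens per frame -> (K * r, D) tokens,
    where K <= T is the number of frames left after temporal merging.
    r=16 reflects the assumed meaning of "R16"."""
    frames = [compress_frame_tokens(t, r) for t in video_tokens]
    return np.concatenate(merge_similar_frames(frames, threshold), axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(256, 8))       # 256 patch tokens of dimension 8
    video = np.stack([base, base, -base])  # frames 0 and 1 are identical
    print(hico_sketch(video).shape)        # (32, 8): frames 0 and 1 merged
```

The two stages are kept separate so each compression ratio can be tuned independently; a real system would likely use learned or content-adaptive pooling rather than fixed contiguous groups.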

Model Features

Long and Rich Context (LRC) Modeling
Enhanced with LRC modeling, capable of perceiving fine-grained details and capturing long-term temporal structures.
Adaptive Hierarchical Token Compression (HiCo)
Compresses visual tokens into a compact spatiotemporal representation, improving model efficiency.
Task Preference Optimization (TPO)
Improves model performance by aligning the model with dense vision-task annotations.
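The page describes this training step only at a high level. As a point of reference, here is a minimal sketch of the standard Direct Preference Optimization (DPO) loss, the best-known preference-optimization objective. It is not taken from InternVideo2.5, TPO's actual objective may differ, and the function name and the `beta` default are illustrative assumptions.

```python
import numpy as np


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Generic DPO loss averaged over a batch of preference pairs.

    Each argument holds the summed log-probability of a full response
    (chosen = preferred, rejected = dispreferred) under the trained policy
    or the frozen reference model. `beta` scales the implicit reward.
    """
    chosen = np.asarray(policy_chosen_logp) - np.asarray(ref_chosen_logp)
    rejected = np.asarray(policy_rejected_logp) - np.asarray(ref_rejected_logp)
    logits = beta * (chosen - rejected)
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed stably via logaddexp
    return float(np.mean(np.logaddexp(0.0, -logits)))
```

When the policy and reference agree exactly, the loss is log 2; it falls as the policy raises the chosen response's likelihood relative to the rejected one.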

Model Capabilities

Video Understanding
Multimodal Reasoning
Long Video Analysis
Fine-grained Detail Perception
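The page does not say how frames are selected from long videos. A common preprocessing step for video MLLMs is uniform segment sampling: split the video into equal spans and take the middle frame of each. A minimal sketch, where `sample_frame_indices` is a hypothetical helper rather than part of the released code:

```python
def sample_frame_indices(num_frames: int, num_segments: int) -> list[int]:
    """Split frames [0, num_frames) into num_segments equal spans and
    return the index of the middle frame of each span (hypothetical helper)."""
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    seg_len = num_frames / num_segments
    # Clamp so rounding can never step past the last frame.
    return [min(num_frames - 1, int(seg_len * i + seg_len / 2))
            for i in range(num_segments)]


print(sample_frame_indices(100, 4))  # [12, 37, 62, 87]
```

Sampling a fixed number of segments keeps the visual token budget constant regardless of video length; for very short clips some indices repeat, which is usually acceptable.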

Use Cases

Video Analysis
Video Content Description
Provides detailed descriptions of video content, including scenes, characters, and actions.
High-accuracy video content understanding
Long Video Structure Analysis
Captures temporal structures and key events in long videos.
59.6 accuracy (LongVideoBench)
Multimodal Tasks
Multimodal Reasoning
Performs complex reasoning by combining video and text information.
74.0 accuracy (MVBench)
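The 59.6 and 74.0 figures are accuracies on multiple-choice video benchmarks, expressed as the percentage of questions answered correctly. A minimal scoring sketch, assuming model outputs have already been reduced to option letters (official evaluation harnesses may extract answers differently); the function name is illustrative:

```python
def multiple_choice_accuracy(predictions: list[str], answers: list[str]) -> float:
    """Percentage of questions whose predicted option letter matches the key
    (case- and whitespace-insensitive)."""
    if len(predictions) != len(answers) or not answers:
        raise ValueError("need equal-length, non-empty lists")
    correct = sum(p.strip().upper() == a.strip().upper()
                  for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


print(multiple_choice_accuracy(["A", "c", "B", "D"], ["A", "C", "D", "D"]))  # 75.0
```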