
InternVL_2_5_HiCo_R64

Developed by OpenGVLab
A video multimodal large language model (MLLM) enhanced with Long and Rich Context (LRC) modeling. It improves on existing MLLMs by strengthening the perception of fine-grained details and capturing long-term temporal structure.
Downloads: 252
Release Time: 1/23/2025

Model Overview

InternVideo2.5 is a video multimodal large language model that strengthens fine-grained perception and spatiotemporal representation through Task Preference Optimization (TPO) and Adaptive Hierarchical Token Compression (HiCo).
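HiCo's exact algorithm is not described on this page. As a rough illustration of what hierarchical token compression can mean, the Python sketch below first pools each frame's visual tokens spatially and then merges near-duplicate consecutive frames. The 16 × 16 input grid (256 tokens per frame), the 2 × 2 average pooling that yields 64 tokens, the cosine-similarity merge rule, and the `merge_threshold` value are all assumptions chosen for illustration; this is not OpenGVLab's implementation.

```python
import math
from typing import List

Vector = List[float]


def pool_2x2(grid: List[List[Vector]]) -> List[Vector]:
    """Average-pool an H x W grid of token vectors with a 2x2 window (H, W even).

    Assumption: a 16x16 grid (256 tokens) becomes 8x8 = 64 tokens.
    """
    h, w = len(grid), len(grid[0])
    pooled = []
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            block = [grid[i][j], grid[i][j + 1], grid[i + 1][j], grid[i + 1][j + 1]]
            # Average each feature dimension across the four tokens in the window
            pooled.append([sum(vals) / 4 for vals in zip(*block)])
    return pooled


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def frame_similarity(f1: List[Vector], f2: List[Vector]) -> float:
    """Mean token-wise cosine similarity between two frames with equal token counts."""
    return sum(cosine(a, b) for a, b in zip(f1, f2)) / len(f1)


def compress_video(frames: List[List[List[Vector]]],
                   merge_threshold: float = 0.95) -> List[List[Vector]]:
    """Spatially pool every frame, then fold near-duplicate consecutive frames together.

    Hypothetical temporal step: if a pooled frame is at least `merge_threshold`
    similar to the last kept frame, it is averaged into that frame instead of
    being kept as a separate set of tokens.
    """
    kept: List[List[Vector]] = []
    counts: List[int] = []  # how many raw frames each kept frame represents
    for grid in frames:
        tokens = pool_2x2(grid)
        if kept and frame_similarity(kept[-1], tokens) >= merge_threshold:
            n = counts[-1]
            # Running average of all frames merged into this slot
            kept[-1] = [[(n * a + b) / (n + 1) for a, b in zip(ta, tb)]
                        for ta, tb in zip(kept[-1], tokens)]
            counts[-1] += 1
        else:
            kept.append(tokens)
            counts.append(1)
    return kept
```

For example, two identical frames followed by a visually different one compress to two kept frames of 64 tokens each, so static stretches of a long video cost fewer tokens than busy ones.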

Model Features

Long and Rich Context Modeling
Enhances video content understanding through LRC modeling
Adaptive Hierarchical Token Compression
Uses HiCo to build a compact spatiotemporal representation with 64 tokens per frame
Task Preference Optimization
Improves model performance by using TPO to learn from dense visual task annotations
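Because the per-frame cost is fixed at 64 tokens, the number of visual tokens grows linearly with the number of sampled frames. The helpers below are a back-of-the-envelope sketch; the 8,192-token context budget, the 1,024 tokens reserved for text, and the 256-token uncompressed comparison are hypothetical numbers chosen only to show the arithmetic.

```python
TOKENS_PER_FRAME = 64  # from the model card: the R64 variant keeps 64 tokens per frame


def visual_tokens(num_frames: int, tokens_per_frame: int = TOKENS_PER_FRAME) -> int:
    """Total visual tokens for a clip of `num_frames` sampled frames."""
    return num_frames * tokens_per_frame


def max_frames(context_budget: int, reserved_for_text: int,
               tokens_per_frame: int = TOKENS_PER_FRAME) -> int:
    """How many frames fit once `reserved_for_text` tokens are set aside.

    `context_budget` and `reserved_for_text` are hypothetical inputs; use the
    real limits of your deployment.
    """
    available = context_budget - reserved_for_text
    if available <= 0:
        return 0
    return available // tokens_per_frame
```

With those assumed numbers, 128 frames cost 8,192 visual tokens, and the 7,168 tokens left after reserving text space fit 112 frames at 64 tokens each versus 28 frames at 256 tokens each.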

Model Capabilities

Video Content Understanding
Multimodal Reasoning
Long Video Analysis
Video Caption Generation

Use Cases

Video Understanding
Video Content Description
Provides detailed descriptions of video content
Can generate accurate textual descriptions of video content
Video Question Answering
Answers questions about video content
Can accurately answer video-related questions
Long Video Analysis
Long Video Summarization
Summarizes content from long videos
Can effectively capture key information from long videos
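For video question answering or summarization, the sampled frames are typically interleaved with image placeholders in the text prompt. The sketch below assumes an InternVL-style `Frame{i}: <image>` line per frame followed by the question; the exact placeholder format, the optional timestamp labels, and `build_video_prompt` itself are assumptions, so check the official model card before relying on them.

```python
from typing import List, Optional


def build_video_prompt(num_frames: int, question: str,
                       timestamps: Optional[List[float]] = None) -> str:
    """Build a prompt with one image placeholder line per frame (assumed format)."""
    if timestamps is not None and len(timestamps) != num_frames:
        raise ValueError("timestamps must have one entry per frame")
    lines = []
    for i in range(num_frames):
        label = f"Frame{i + 1}"
        if timestamps is not None:
            # Hypothetical: annotate each frame with its time offset in seconds
            label += f" ({timestamps[i]:.1f}s)"
        lines.append(f"{label}: <image>")
    lines.append(question)
    return "\n".join(lines)
```

Summarization uses the same structure with a request such as "Summarize the key events in this video." in place of a question.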