InternVL_2_5_HiCo_R16

Developed by FriendliAI
InternVideo2.5 is a video multimodal large language model (MLLM) enhanced by long and rich context (LRC) modeling, built upon InternVL2.5.
Downloads 129
Release Time: 3/18/2025

Model Overview

InternVideo2.5 improves on existing MLLMs by enhancing their ability to perceive fine-grained details and capture long-term temporal structures. It does so in two ways: task preference optimization (TPO), which incorporates dense vision task annotations into training, and adaptive hierarchical token compression (HiCo), which produces compact spatiotemporal representations.

Model Features

Long and Rich Context (LRC) Modeling
Enhances the model's ability to perceive fine-grained details and capture long-term temporal structures.
Adaptive Hierarchical Token Compression (HiCo)
Achieves compact spatiotemporal representation using only 16 tokens per frame.
Task Preference Optimization (TPO)
Enhances model performance by incorporating dense vision task annotations.
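
To give a rough sense of what 16 tokens per frame means for long videos, the arithmetic can be sketched as below. This is a minimal illustration, not part of the model's code: the function names and the example token budget are hypothetical, and the count covers visual tokens only (text prompt and special tokens are ignored).

```python
# Rough visual-token arithmetic for a model that spends a fixed number
# of tokens per video frame. Only the value 16 comes from the model card;
# everything else here is an illustrative assumption.

TOKENS_PER_FRAME = 16  # HiCo figure quoted in the model card


def visual_tokens(num_frames: int, tokens_per_frame: int = TOKENS_PER_FRAME) -> int:
    """Return the number of visual tokens needed for `num_frames` frames."""
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    return num_frames * tokens_per_frame


def max_frames(token_budget: int, tokens_per_frame: int = TOKENS_PER_FRAME) -> int:
    """Return how many whole frames fit into `token_budget` visual tokens."""
    if token_budget < 0:
        raise ValueError("token_budget must be non-negative")
    return token_budget // tokens_per_frame


if __name__ == "__main__":
    for frames in (16, 128, 512):
        print(f"{frames} frames -> {visual_tokens(frames)} visual tokens")
    # Hypothetical budget of 8192 visual tokens, chosen only for illustration.
    print(f"8192-token budget -> {max_frames(8192)} frames")
```

Under these assumptions, the visual token count grows linearly with the number of sampled frames, so a smaller per-frame cost directly raises the number of frames that fit in a fixed context.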

Model Capabilities

Video Understanding
Video Caption Generation
Multimodal Dialogue
Long Video Analysis

Use Cases

Video Content Analysis
Detailed Video Description
Provides detailed descriptions of video content, including scenes, actions, and objects.
Generates detailed video description text
Video Question Answering
Answers specific questions about video content.
Accurately answers video-related questions
Long Video Processing
Long Video Summarization
Summarizes long video content and extracts key frames.
Generates video summaries and key frame descriptions