InternViT-6B-448px-V2_5

Developed by OpenGVLab
InternViT-6B-448px-V2_5 is a major upgrade of InternViT-6B-448px-V1-5. It improves visual feature extraction through ViT incremental learning with next-token-prediction (NTP) loss, and performs particularly well on complex inputs such as multilingual OCR data and mathematical charts.
Downloads 711
Release Time: 11/22/2024

Model Overview

This model is a visual feature extractor that serves as the vision encoder in the ViT-MLP-LLM architecture. It supports dynamic high-resolution processing of single images, multiple images, and video data, and is suitable for building multimodal large language models (MLLMs).

Model Features

ViT Incremental Learning
Phase 1.5 incremental pre-training improves feature extraction in rare domains such as multilingual OCR and mathematical charts.
Dynamic High-Resolution Processing
Supports flexible processing of single images, multiple images, and video data. The maximum block (tile) count n_max is allocated dynamically according to the input type.
Multimodal Support
Retains the same architecture as InternVL 1.5 and 2.0: the incrementally pre-trained InternViT is integrated with multiple LLMs, making it suitable for building MLLMs.
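
The dynamic high-resolution processing described above can be illustrated with a minimal sketch. It assumes that an image is resized to a grid of 448x448 blocks and that the grid may contain at most n_max blocks. The helper names (choose_grid, tile_boxes), the tie-breaking rule, and the example n_max values are hypothetical and chosen for illustration only; they are not taken from the released preprocessing code.

```python
TILE = 448  # block side in pixels, taken from the "448px" in the model name


def choose_grid(width, height, n_max):
    """Return (cols, rows) with cols * rows <= n_max closest to the image aspect ratio.

    Hypothetical, simplified helper: when two grids match equally well,
    the candidate found earlier in the search is kept.
    """
    if width <= 0 or height <= 0 or n_max < 1:
        raise ValueError("width, height and n_max must be positive")
    target = width / height
    best = (1, 1)
    best_diff = abs(target - 1.0)
    for cols in range(1, n_max + 1):
        # rows is bounded so that the grid never exceeds n_max blocks
        for rows in range(1, n_max // cols + 1):
            diff = abs(target - cols / rows)
            if diff < best_diff:
                best, best_diff = (cols, rows), diff
    return best


def tile_boxes(cols, rows):
    """Return crop boxes (left, top, right, bottom) for an image resized to the grid."""
    return [
        (c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE)
        for r in range(rows)
        for c in range(cols)
    ]


if __name__ == "__main__":
    # A wide single image with a generous block budget (n_max=12 is an assumed value).
    cols, rows = choose_grid(1792, 896, n_max=12)
    print(cols, rows, len(tile_boxes(cols, rows)))
    # The same picture treated as one video frame with a budget of a single block.
    print(choose_grid(1792, 896, n_max=1))
```

Passing a smaller n_max for multi-image or video inputs keeps the total number of blocks bounded, which is the effect the per-input-type allocation is meant to achieve.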

Model Capabilities

Image Feature Extraction
Multimodal Alignment
Dynamic Resolution Processing
Multi-Image Analysis
Video Frame Processing

Use Cases

Multimodal Applications
Multilingual OCR
Recognizes multilingual text that is underrepresented in web datasets.
Improves feature extraction in multilingual scenarios.
Mathematical Chart Understanding
Parses complex mathematical formulas and charts.
Strengthens visual representation capabilities in specialized domains.
Computer Vision
Image Classification
Performs image classification on datasets such as ImageNet.
Shows strong performance on the ImageNet-1K (IN-1K) validation set and several of its variants.
Semantic Segmentation
Conducts semantic segmentation on ADE20K and COCO-Stuff-164K.
Supports three configurations: linear probing, head tuning, and full tuning.
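
The three segmentation configurations differ mainly in which parameters are updated. The sketch below encodes the usual convention, stated here as an assumption rather than something this page confirms: linear probing trains only a linear head on a frozen backbone, head tuning trains a full decoder head on a frozen backbone, and full tuning trains both. The names TuningConfig, CONFIGS, and trainable_parts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TuningConfig:
    backbone_trainable: bool  # whether the ViT backbone weights are updated
    head: str                 # "linear" or "decoder" (assumed head types)


# Assumed mapping of configuration names to training setups.
CONFIGS = {
    "linear_probing": TuningConfig(backbone_trainable=False, head="linear"),
    "head_tuning": TuningConfig(backbone_trainable=False, head="decoder"),
    "full_tuning": TuningConfig(backbone_trainable=True, head="decoder"),
}


def trainable_parts(mode):
    """List the model parts that receive gradient updates in the given mode."""
    cfg = CONFIGS[mode]
    parts = ["backbone"] if cfg.backbone_trainable else []
    parts.append("head")
    return parts


if __name__ == "__main__":
    for name in CONFIGS:
        print(name, trainable_parts(name), CONFIGS[name].head)
```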