
InternVL3-2B-Instruct

Developed by OpenGVLab
InternVL3-2B-Instruct is a supervised fine-tuned (SFT) version of InternVL3-2B. It combines native multimodal pretraining with SFT and offers strong multimodal perception and reasoning capabilities.
Downloads 1,345
Release Time: 4/16/2025

Model Overview

InternVL3-2B-Instruct is a multimodal large language model with strong multimodal perception and reasoning capabilities. It supports tasks such as tool usage, GUI agents, industrial image analysis, and 3D visual perception.

Model Features

Native Multimodal Pretraining
Integrates language and visual learning into a single pretraining phase, enhancing multimodal processing capabilities.
Variable Visual Position Encoding (V2PE)
Uses smaller, more flexible position increments for visual tokens to improve long-context understanding.
Dynamic Resolution Strategy
Divides images into 448×448 pixel blocks, supporting multi-image and video data.
Supervised Fine-Tuning
Utilizes high-quality and diverse training data, extending to various tasks such as tool usage and 3D scene understanding.
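
Two of the features above, the dynamic resolution strategy and V2PE, can be illustrated with a small sketch. This is a simplified illustration and not the model's actual preprocessing code: the function names, the `max_tiles` cap, the tie-breaking rule, and the `visual_step` value are all assumptions made for the example. Only the 448-pixel block size and the idea of smaller position increments come from the description above.

```python
TILE = 448  # block side length in pixels, from the feature description


def choose_grid(width, height, max_tiles=12):
    """Pick a (cols, rows) grid whose aspect ratio is closest to the image's.

    max_tiles is an illustrative cap, not a value stated on this page.
    """
    target = width / height
    candidates = [
        (cols, rows)
        for cols in range(1, max_tiles + 1)
        for rows in range(1, max_tiles + 1)
        if cols * rows <= max_tiles
    ]
    # Closest aspect ratio wins; ties go to the grid with fewer blocks
    # (an assumed tie-break chosen to keep the sketch simple).
    return min(
        candidates,
        key=lambda g: (abs(target - g[0] / g[1]), g[0] * g[1]),
    )


def tile_boxes(cols, rows):
    """Return (left, top, right, bottom) crop boxes, row by row, for an
    image that has already been resized to (cols * TILE, rows * TILE)."""
    return [
        (c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE)
        for r in range(rows)
        for c in range(cols)
    ]


def position_ids(token_kinds, visual_step=0.25):
    """Assign a position to each token in sequence order.

    The first token sits at position 0. Each later text token advances the
    position by 1, and each later visual token advances it by visual_step,
    a hypothetical fractional increment smaller than 1.
    """
    positions = []
    current = 0.0
    for i, kind in enumerate(token_kinds):
        if i > 0:
            current += 1.0 if kind == "text" else visual_step
        positions.append(current)
    return positions


if __name__ == "__main__":
    cols, rows = choose_grid(896, 448)
    print(cols, rows, tile_boxes(cols, rows))
    print(position_ids(["text", "image", "image", "image", "image", "text"]))
```

In this sketch a long run of visual tokens consumes far less of the position range than the same number of text tokens would, which is the intuition behind using smaller increments for long multimodal contexts.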

Model Capabilities

Multimodal Reasoning
OCR Recognition
Chart Understanding
Document Understanding
Multi-Image Understanding
Video Understanding
GUI Localization
Spatial Reasoning
Multilingual Understanding

Use Cases

Industrial Image Analysis
Defect Detection
Identifies defects and anomalies in industrial images.
Improves detection accuracy and efficiency.
3D Visual Perception
3D Scene Understanding
Analyzes and understands objects and relationships in 3D scenes.
Enhances semantic understanding of 3D scenes.
GUI Operation
Automated Testing
Automatically identifies and operates GUI elements.
Raises the level of automation in GUI testing.
© 2025 AIbase