
InternVL3-2B AWQ

Developed by OpenGVLab
InternVL3-2B is a Multimodal Large Language Model (MLLM) from OpenGVLab with multimodal perception and reasoning capabilities. It supports tool usage, GUI agents, industrial image analysis, 3D visual perception, and more.
Downloads 677
Release Time: 4/17/2025

Model Overview

InternVL3-2B is a multimodal large language model that combines vision and language processing capabilities, suitable for various multimodal tasks.

Model Features

Native Multimodal Pretraining
Integrates language and visual learning into a single pretraining phase, enhancing multimodal processing capabilities.
Variable Visual Position Encoding (V2PE)
Assigns visual tokens smaller, more flexible position increments than text tokens, which improves long-context understanding.
Mixed Preference Optimization (MPO)
Enhances model reasoning performance through supervision with positive and negative samples.
Test-Time Scaling
Adopts a Best-of-N evaluation strategy, with VisualPRM-8B as the critic model that selects the best response, for reasoning and mathematics evaluation.
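
The Best-of-N idea under Test-Time Scaling can be illustrated with a minimal Python sketch: generate several candidate responses, score each with a critic, and keep the highest-scoring one. The names `best_of_n` and `toy_critic` are hypothetical and not part of any InternVL API. The toy critic is only a stand-in for a real critic model such as VisualPRM-8B, whose actual interface is not described here.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest critic score (first one wins ties)."""
    if not candidates:
        raise ValueError("need at least one candidate response")
    return max(candidates, key=score)


def toy_critic(response: str) -> float:
    # Hypothetical stand-in for a critic model: rewards responses containing "42".
    return 1.0 if "42" in response else 0.0


# In practice the candidates would be N sampled responses from the model.
candidates = ["The answer is 41.", "The answer is 42.", "I am not sure."]
best = best_of_n(candidates, toy_critic)
print(best)
```

In a real setup, `score` would wrap a call to the critic model, and a larger N trades extra inference cost for a better chance that at least one candidate is correct.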

Model Capabilities

Multimodal Reasoning
OCR
Chart Understanding
Document Understanding
Multi-Image Understanding
Video Understanding
GUI Localization
Spatial Reasoning

Use Cases

Industrial Image Analysis
Industrial Defect Detection
Detects defects in industrial products through image analysis.
High-precision defect identification
3D Visual Perception
3D Scene Understanding
Understands and analyzes objects and relationships in 3D scenes.
Enhanced 3D scene understanding
GUI Operations
Automated GUI Testing
Interprets GUI screens and uses that understanding to drive automated tests.
Improved testing efficiency