
InternVL3 1B AWQ

Developed by OpenGVLab
InternVL3-1B is a multimodal large language model in the InternVL3 series, featuring exceptional multimodal perception and reasoning capabilities.
Downloads 303
Release Time: 4/17/2025

Model Overview

InternVL3-1B is a multimodal large language model (MLLM) with strong multimodal perception and reasoning abilities. Beyond standard vision-language tasks, it supports tool usage, GUI agents, industrial image analysis, 3D visual perception, and more.

Model Features

Native Multimodal Pretraining
Integrates language and visual learning into a single pretraining phase, enhancing the ability to handle multimodal tasks.
Variable Visual Position Encoding (V2PE)
Uses smaller, more flexible position increments to encode visual tokens, improving long-context understanding.
Mixed Preference Optimization (MPO)
Enhances the model's reasoning performance through additional supervision from both positive and negative samples.
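
The V2PE idea described above can be illustrated with a minimal sketch. It assumes text tokens advance the position index by 1 while visual tokens advance it by a smaller fractional step. The function name v2pe_positions and the step value 0.25 are hypothetical choices for illustration only, not the actual InternVL3 implementation.

```python
from typing import List


def v2pe_positions(token_types: List[str], visual_step: float = 0.25) -> List[float]:
    """Assign a position index to every token in a mixed text/image sequence.

    Text tokens advance the position counter by 1.0; visual tokens advance it
    by `visual_step` (assumed to be < 1), so a long run of image tokens uses
    up far less of the position range than the same number of text tokens.
    """
    positions: List[float] = []
    current = 0.0
    for kind in token_types:
        positions.append(current)
        if kind == "text":
            current += 1.0
        elif kind == "image":
            current += visual_step
        else:
            raise ValueError(f"unknown token type: {kind!r}")
    return positions


# Two text tokens, four visual tokens, then one more text token.
tokens = ["text", "text", "image", "image", "image", "image", "text"]
print(v2pe_positions(tokens))  # [0.0, 1.0, 2.0, 2.25, 2.5, 2.75, 3.0]
```

With a step of 0.25, the four visual tokens together consume one unit of position range, the same as a single text token. This is the intuition behind the claim that smaller increments help with long multimodal contexts.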

Model Capabilities

Multimodal Reasoning
OCR
Chart Understanding
Document Understanding
Multi-Image Understanding
Video Understanding
GUI Localization
Spatial Reasoning

Use Cases

Industrial Image Analysis
Industrial Defect Detection
Detects defects in industrial products through image analysis.
High-precision defect identification
3D Visual Perception
3D Scene Understanding
Understands and analyzes objects and relationships in 3D scenes.
Improved 3D scene understanding
© 2025 AIbase