
InternVL3 1B Pretrained

Developed by OpenGVLab
InternVL3-1B is a multimodal large language model developed by OpenGVLab. This release has completed native multimodal pretraining but has not undergone post-training.
Downloads 18
Release Time: 4/17/2025

Model Overview

InternVL3-1B is a multimodal large language model that combines an InternViT vision encoder with a Qwen2.5 language model. It supports joint understanding of images and text and generates text output.

Model Features

Native multimodal pretraining
Uses a unified training scheme that learns language and multimodal representations jointly, which improves performance on vision-language tasks.
Variable visual position encoding (V2PE)
Improves long-context understanding by giving visual tokens smaller, more flexible position increments than text tokens.
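
The incremental position encoding described above can be illustrated with a minimal sketch. It assumes that text tokens advance the position index by 1 while visual tokens advance it by a smaller step. The function name `v2pe_position_ids` and the step value of 1/4 are hypothetical choices for illustration, not values taken from the model.

```python
from fractions import Fraction


def v2pe_position_ids(token_types, visual_step=Fraction(1, 4)):
    """Assign a position index to each token.

    Text tokens advance the running position by 1; visual tokens advance
    it by `visual_step` (an illustrative value smaller than 1).
    """
    positions = []
    current = Fraction(0)
    for kind in token_types:
        if kind not in ("text", "image"):
            raise ValueError(f"unknown token type: {kind!r}")
        positions.append(current)
        current += 1 if kind == "text" else visual_step
    return positions


# Two text tokens, four visual tokens, then one more text token.
tokens = ["text", "text", "image", "image", "image", "image", "text"]
positions = v2pe_position_ids(tokens)
print([float(p) for p in positions])
# prints [0.0, 1.0, 2.0, 2.25, 2.5, 2.75, 3.0]
```

In this sketch the four visual tokens together consume one unit of position range, so long runs of visual tokens use up the context's position budget more slowly than text does.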
Dynamic resolution processing
Supports 448×448 pixel tile segmentation to accommodate inputs of varying sizes.
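
The tile segmentation above can be sketched as follows. This is a simplified illustration, not the model's actual preprocessing: it picks the tile grid whose aspect ratio is closest to the image's, breaks ties by the closest total area, and lists the crop boxes after the image is resized to fit that grid. The function names, the tie-breaking rule, and the `max_tiles=12` limit are assumptions made for this example.

```python
TILE = 448  # tile side length in pixels, as stated above


def choose_tile_grid(width, height, max_tiles=12, tile=TILE):
    """Return (cols, rows) with cols * rows <= max_tiles.

    The grid with the aspect ratio closest to the image's is chosen;
    ties go to the grid whose total area is closest to the image area.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target_ratio = width / height
    image_area = width * height
    candidates = [
        (cols, rows)
        for cols in range(1, max_tiles + 1)
        for rows in range(1, max_tiles + 1)
        if cols * rows <= max_tiles
    ]
    return min(
        candidates,
        key=lambda g: (
            abs(target_ratio - g[0] / g[1]),
            abs(g[0] * g[1] * tile * tile - image_area),
        ),
    )


def tile_boxes(cols, rows, tile=TILE):
    """List (left, top, right, bottom) crop boxes in row-major order
    for an image that has been resized to (cols * tile, rows * tile)."""
    return [
        (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
        for r in range(rows)
        for c in range(cols)
    ]


grid = choose_tile_grid(1344, 448)
print(grid)               # prints (3, 1)
print(tile_boxes(*grid))  # prints [(0, 0, 448, 448), (448, 0, 896, 448), (896, 0, 1344, 448)]
```

A wide 1344×448 input maps to three tiles side by side, while a square 896×896 input maps to a 2×2 grid under the same rule.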

Model Capabilities

Image understanding
Text generation
Multimodal reasoning
Multilingual support
Multi-image processing
Video understanding

Use Cases

Visual question answering
Image caption generation
Generates detailed descriptions based on input images.
Multimodal dialogue
Image-based dialogue system
Supports multi-turn dialogue interactions based on images.
© 2025 AIbase