I

Internvl3 38B Hf

Developed by OpenGVLab
InternVL3-38B is an advanced multimodal large language model (MLLM) with significant improvements in multimodal perception and reasoning abilities, supporting areas such as tool use, GUI agents, industrial image analysis, and 3D visual perception.
Downloads 2,226
Release Time : 4/18/2025

Model Overview

InternVL3-38B is a multimodal large language model that supports the joint processing of images, videos, and text and has powerful multimodal reasoning abilities.

Model Features

Advanced multimodal capabilities
Compared with previous models, there are significant improvements in multimodal perception and reasoning abilities, supporting areas such as tool use, GUI agents, industrial image analysis, and 3D visual perception.
Efficient batch reasoning
As a native Transformers model, it supports the implementation of multiple attention mechanisms (including SDPA and FA2) and can efficiently process batch inputs containing images, videos, and text.
Multilingual support
Supports multiple languages and is suitable for users in different regions.

Model Capabilities

Image description generation
Video content understanding
Multimodal reasoning
Tool use
GUI agent
Industrial image analysis
3D visual perception
Text generation

Use Cases

Image understanding
Image description generation
Generate a detailed description of the input image.
Generate accurate and detailed image descriptions.
Video understanding
Video content analysis
Analyze and describe the content of the input video.
Accurately identify actions and content in the video.
Multimodal interaction
Multimodal chat
Supports the joint input and interaction of images, videos, and text.
Enable natural multimodal conversations.
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
Š 2025AIbase