AIbase

# Multimodal large language model

Internvl3 8B Hf
License: Other
InternVL3 is a series of multimodal large language models with strong multimodal perception and reasoning capabilities; it accepts image, video, and text inputs.
Tags: Image-to-Text, Transformers, Other
OpenGVLab
454
1

Minimax VL 01
MiniMax-VL-01 is a multimodal large language model built on the "ViT-MLP-LLM" framework (a vision transformer encoder, an MLP projector, and a large language model) with dynamic-resolution image processing, and it is reported to perform strongly across a range of vision-language tasks.
Tags: Image-to-Text
MiniMaxAI
237
253
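
In broad terms, "dynamic resolution processing" means fitting each image to a grid of fixed-size tiles instead of squashing it into one square input. The sketch below shows one way to pick such a grid: enumerate every (columns, rows) layout within a tile budget and keep the one whose aspect ratio is closest to the image's. The 336-pixel tile size, the 36-tile budget, and the helper names `choose_grid` and `tile_boxes` are assumptions for illustration only, not values or APIs taken from the MiniMax-VL-01 release.

```python
from __future__ import annotations

from itertools import product


def choose_grid(width: int, height: int, tile: int = 336, max_tiles: int = 36) -> tuple[int, int]:
    """Return the (cols, rows) tile layout that best matches the image shape.

    Hypothetical helper: `tile` and `max_tiles` are illustrative assumptions.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    target_ratio = width / height
    image_area = width * height
    best, best_key = (1, 1), None
    for cols, rows in product(range(1, max_tiles + 1), repeat=2):
        if cols * rows > max_tiles:
            continue  # layout exceeds the tile budget
        # Primary criterion: aspect-ratio mismatch.
        # Tie-break: resized canvas area closest to the original image area.
        key = (abs(cols / rows - target_ratio), abs(cols * rows * tile * tile - image_area))
        if best_key is None or key < best_key:
            best, best_key = (cols, rows), key
    return best


def tile_boxes(cols: int, rows: int, tile: int = 336) -> list[tuple[int, int, int, int]]:
    """Crop boxes (left, top, right, bottom), row-major, on a canvas resized to cols*tile x rows*tile."""
    return [
        (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
        for r in range(rows)
        for c in range(cols)
    ]


cols, rows = choose_grid(672, 336)
print(cols, rows)  # 2 1
print(tile_boxes(cols, rows))  # [(0, 0, 336, 336), (336, 0, 672, 336)]
```

Matching aspect ratio first keeps tiles from being distorted; many tiling schemes also keep a downscaled thumbnail of the whole image, a step omitted here.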

Llava UHD V2 Vicuna 7B
LLaVA-UHD v2 is a multimodal large language model built around a hierarchical window transformer, which captures visual detail at several granularities by constructing a high-resolution feature pyramid.
Tags: Multimodal Fusion, Transformers
YipengZhang
103
6
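
The LLaVA-UHD v2 description combines two ideas: a feature pyramid (the same visual features at several resolutions) and window-based aggregation, in which one coarse visual token draws on the spatially matching region at every finer level. The toy sketch below illustrates only that data flow on a plain 2-D grid of numbers. Building the pyramid by 2×2 average pooling, replacing learned attention with plain averaging, and the helper names `build_pyramid`, `gather_windows`, and `hierarchical_token` are all simplifying assumptions; the model itself uses learned components that are not reproduced here.

```python
from __future__ import annotations

Grid = list[list[float]]


def avg_pool_2x2(grid: Grid) -> Grid:
    """Halve each spatial dimension by averaging non-overlapping 2x2 blocks."""
    h, w = len(grid), len(grid[0])
    if h % 2 or w % 2:
        raise ValueError("grid dimensions must be even")
    return [
        [
            (grid[2 * i][2 * j] + grid[2 * i][2 * j + 1]
             + grid[2 * i + 1][2 * j] + grid[2 * i + 1][2 * j + 1]) / 4
            for j in range(w // 2)
        ]
        for i in range(h // 2)
    ]


def build_pyramid(grid: Grid, levels: int) -> list[Grid]:
    """Return [finest, ..., coarsest]; each level is half the size of the previous one."""
    pyramid = [grid]
    for _ in range(levels - 1):
        pyramid.append(avg_pool_2x2(pyramid[-1]))
    return pyramid


def gather_windows(pyramid: list[Grid], row: int, col: int) -> list[list[float]]:
    """For cell (row, col) of the coarsest level, collect the aligned window at every level.

    At level k (0 = finest), the window is 2 ** (coarsest - k) cells on each side.
    """
    coarsest = len(pyramid) - 1
    windows = []
    for k, level in enumerate(pyramid):
        size = 2 ** (coarsest - k)
        r0, c0 = row * size, col * size
        windows.append([level[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)])
    return windows


def hierarchical_token(pyramid: list[Grid], row: int, col: int) -> float:
    """Stand-in for learned window attention: average each level's window, then average the levels."""
    windows = gather_windows(pyramid, row, col)
    per_level = [sum(w) / len(w) for w in windows]
    return sum(per_level) / len(per_level)


grid = [[4 * r + c for c in range(4)] for r in range(4)]  # 4x4 toy feature map
pyramid = build_pyramid(grid, levels=2)
print(pyramid[1])  # [[2.5, 4.5], [10.5, 12.5]]
print(gather_windows(pyramid, 0, 0))  # [[0, 1, 4, 5], [2.5]]
print(hierarchical_token(pyramid, 0, 0))  # 2.5
```

The point of the window alignment is that each coarse token only looks at its own spatial region at each resolution, which keeps the cost bounded even when the finest level is large.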

Auroracap 7B VID Xtuner
License: Apache-2.0
AuroraCap is a multimodal large language model for image and video captioning, designed to generate detailed video captions efficiently.
Tags: Video-to-Text
wchai
31
5
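
One general way to make video captioning cheaper is to reduce the number of visual tokens before they reach the language model, for example by merging tokens that are nearly identical, as often happens across neighboring video frames. The sketch below is a simplified bipartite token-merging step on plain Python vectors. It is not presented as AuroraCap's implementation; the alternating A/B split, cosine similarity, plain averaging, and the helper name `merge_tokens` are assumptions for illustration.

```python
from __future__ import annotations

import math

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_tokens(tokens: list[Vector], r: int) -> list[Vector]:
    """Reduce len(tokens) by up to r with a simplified bipartite merge (hypothetical helper).

    Even-indexed tokens form set A, odd-indexed tokens form set B. Each A token is
    matched to its most similar B token; the r best-matching A tokens are averaged
    into their B partners, and all other tokens are kept in their original order.
    """
    a_idx = list(range(0, len(tokens), 2))
    b_idx = list(range(1, len(tokens), 2))
    if not b_idx or r <= 0:
        return [list(t) for t in tokens]
    # Best B partner (and its similarity) for every A token.
    matches = []
    for i in a_idx:
        j = max(b_idx, key=lambda b: cosine(tokens[i], tokens[b]))
        matches.append((cosine(tokens[i], tokens[j]), i, j))
    matches.sort(reverse=True)
    merged = {i: j for _, i, j in matches[:r]}
    # Accumulate sums and counts on B tokens so each merge becomes an average.
    sums = {j: list(tokens[j]) for j in b_idx}
    counts = {j: 1 for j in b_idx}
    for i, j in merged.items():
        sums[j] = [s + x for s, x in zip(sums[j], tokens[i])]
        counts[j] += 1
    out = []
    for k in range(len(tokens)):
        if k in merged:
            continue  # absorbed into its B partner
        if k in sums:
            out.append([s / counts[k] for s in sums[k]])
        else:
            out.append(list(tokens[k]))
    return out


tokens = [[2.0, 0.0], [4.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(merge_tokens(tokens, r=1))  # [[3.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
```

The bipartite split keeps matching cheap (each A token is compared only against B tokens), which is why this family of methods is attractive for long frame sequences.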

© 2025 AIbase