# Multimodal large language model

InternVL3 8B HF

InternVL3 is a series of multimodal large language models for multimodal perception and reasoning that accepts image, video, and text inputs.

Transformers Other

MiniMax-VL-01 is a multimodal large language model that adopts the 'ViT-MLP-LLM' framework with dynamic-resolution processing and performs well across a range of vision-language tasks.

LLaVA-UHD v2 Vicuna 7B

LLaVA-UHD v2 is a multimodal large language model built around a hierarchical window transformer that captures multiple visual granularities through a high-resolution feature pyramid.

Multimodal Fusion

AuroraCap 7B VID Xtuner

AuroraCap is a multimodal large language model for image and video captioning, focusing on efficient and detailed video caption generation.

© 2025 AIbase