# Dynamic resolution processing

InternVL3 1B Pretrained

InternVL3-1B is an advanced multimodal large language model developed by OpenGVLab, which has completed native multimodal pretraining but has not undergone post-training.

Transformers Other

InternVL3-38B is an advanced multimodal large language model that excels in multimodal perception, reasoning, and other capabilities. It shows significant improvements compared to previous models and also expands multimodal capabilities such as tool use and GUI agents.

Transformers Other

UGround V1 72B Preview

Qwen2-VL is the latest iteration of the Qwen-VL model series, featuring full-resolution image understanding, ultra-long video parsing, and multilingual text and image recognition capabilities.

Transformers English

UGround is a powerful GUI visual positioning model trained with a simple recipe, developed in collaboration by OSU NLP Group and Orby AI.

Transformers English

ColQwen2 2B V1.0

A visual retrieval model based on Qwen2-VL-2B-Instruct and the ColBERT strategy, capable of generating multi-vector representations of text and images.

Text-to-Image Supports Multiple Languages

© 2025 AIbase