# Vision-Language Large Model

Qwen.qwen2.5 VL 32B Instruct GGUF

Qwen2.5-VL-32B-Instruct is a 32-billion-parameter multimodal vision-language model that supports joint image-text understanding and generation tasks.

Cephalo LaTeX Phi 3 Vision 128k 4b Beta

Cephalo is a series of large vision-language models focused on multimodal materials science. The current version specializes in converting images of mathematical formulas into LaTeX code.

SoMeLVLM is a large-scale vision-language model designed for social media processing.

CogVLM is a powerful open-source vision-language model that achieves leading performance on multiple cross-modal benchmarks.
