# Efficient Visual Question Answering

Qwen2.5 VL 3B Instruct GPTQ Int3

A GPTQ Int3-quantized version of Qwen2.5-VL-3B-Instruct for multimodal image-text tasks. Quantization reduces VRAM usage and speeds up inference.

Tags: Transformers, multilingual

nanoLLaVA-1.5 is a compact vision-language model with under 1 billion parameters, designed specifically for edge devices.

Tags: Transformers, English

Imp V1.5 4B Phi3

Imp-v1.5-4B-Phi3 is a lightweight, high-performance large multimodal model with only 4 billion parameters, built on the Phi-3 language model and the SigLIP visual encoder.

© 2025 AIbase