# Efficient Visual Question Answering
Qwen2.5 VL 3B Instruct GPTQ Int3
Apache-2.0
A GPTQ Int3-quantized version of Qwen2.5-VL-3B-Instruct for multimodal image-text tasks, with lower VRAM usage and faster inference than the unquantized model.
Image-to-Text
Transformers, Multilingual

hfl
60
1
nanoLLaVA 1.5
Apache-2.0
nanoLLaVA-1.5 is a compact vision-language model with under 1 billion parameters, designed to run on edge devices.
Image-to-Text
Transformers, English

qnguyen3
442
109
Imp V1.5 4B Phi3
Apache-2.0
Imp-v1.5-4B-Phi3 is a lightweight multimodal model with about 4 billion parameters, built on the Phi-3 language model and the SigLIP visual encoder.
Text-to-Image
Transformers

MILVLG
140
7
Featured Recommended AI Models