# Visual Grounding
Qwen2.5 VL 3B Instruct 4bit
Qwen2.5-VL is the latest vision-language model in the Qwen family, featuring enhanced visual understanding, agent capabilities, and long video processing.
Text-to-Image
Transformers English

jarvisvasu
174
3
VARCO VISION 14B
VARCO-VISION-14B is a powerful English-Korean Vision-Language Model (VLM) that accepts image and text input, generates text output, and supports grounding, referring, and OCR.
Image-to-Text
Transformers Supports Multiple Languages

NCSOFT
1,022
28
Cogvlm Grounding Generalist Hf Quant4
Apache-2.0
CogVLM is a powerful open-source vision-language model that supports tasks such as object detection and visual question answering. This variant is quantized to 4-bit precision.
Image-to-Text
Transformers

Rodeszones
50
9