# Vision-Language Large Model
Qwen.qwen2.5 VL 32B Instruct GGUF
Qwen2.5-VL-32B-Instruct is a 32B-parameter multimodal vision-language model that supports joint understanding of images and text and generates text output.
Image-to-Text
DevQuasar
27.50k
1
Cephalo LaTeX Phi 3 Vision 128k 4b Beta
Apache-2.0
Cephalo is a series of large vision-language models focused on multimodal materials science. The current version specializes in converting images of mathematical formulas into LaTeX code.
Image-to-Text
Transformers

lamm-mit
16
0
Somelvlm
Apache-2.0
SoMeLVLM is a large-scale vision-language model designed for social media processing.
Multimodal Fusion
Transformers English

Lishi0905
25
2
Cogvlm Chat Hf
Apache-2.0
CogVLM is a powerful open-source vision-language model that achieves leading performance on multiple cross-modal benchmarks.
Text-to-Image
Transformers English

THUDM
4,816
193