# Low video memory optimization
Smolvlm Instruct GGUF
Apache-2.0
SmolVLM is a compact open-source multimodal model that accepts image and text inputs and generates text output. It is designed for efficiency and is well suited to on-device applications.
Image-to-Text
Transformers English

Mungert
1,023
2
Llama Joycaption Beta One Hf Llava GGUF
A free, open image-captioning vision-language model (VLM) that can be used when training diffusion models and handles diverse image styles and content.
Image-to-Text
Transformers

Mungert
2,968
2
Featured Recommended AI Models