# Compact Vision-Language Models
## LLaVA-Gemma-7b
LLaVA-Gemma-7b is a large multimodal model trained with the LLaVA-v1.5 framework, using google/gemma-7b-it as the language backbone together with a CLIP visual encoder. It is suited to multimodal understanding and generation tasks.
Image-to-Text · Transformers · English · Intel
## LLaVA-Gemma-2b
LLaVA-Gemma-2b is a large multimodal model trained with the LLaVA-v1.5 framework, using the 2-billion-parameter google/gemma-2b-it as the language backbone together with a CLIP visual encoder.
Image-to-Text · Transformers · English · Intel
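
Both checkpoints follow the standard LLaVA recipe (a CLIP vision tower feeding a Gemma language model), so they can be driven through the usual Transformers image-to-text workflow. Below is a minimal sketch for the 2b variant, assuming the checkpoint is published under the `Intel/llava-gemma-2b` repo id and is compatible with the generic `LlavaForConditionalGeneration`/`AutoProcessor` classes; the exact class names, prompt template, and example image URL are assumptions, not taken from the listing above.

```python
# Minimal sketch: image-to-text with a LLaVA-Gemma checkpoint via Transformers.
# Assumes the generic LLaVA classes accept this checkpoint; the published model
# card may prescribe a model-specific class instead.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

checkpoint = "Intel/llava-gemma-2b"  # assumed repo id; swap for the 7b variant if desired

processor = AutoProcessor.from_pretrained(checkpoint)
model = LlavaForConditionalGeneration.from_pretrained(checkpoint)

# Build a chat-style prompt containing the image placeholder token.
prompt = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": "<image>\nWhat is shown in this image?"}],
    tokenize=False,
    add_generation_prompt=True,
)

# Any RGB image works here; this URL is only an illustrative example.
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=prompt, images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))
```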