# Image-text understanding
Gemma 3 27B IT QAT 8bit
Other
Gemma 3 27B IT QAT 8bit is an MLX-format model converted from Google's Gemma 3 27B model, supporting image-to-text tasks.
Image-to-Text
Transformers Other

mlx-community
422
2
Gemma 3 4B IT QAT AutoAWQ
Gemma 3 is a lightweight open-source multimodal model launched by Google, built on Gemini technology, supporting text and image input and generating text output.
Image-to-Text
Safetensors
gaunernst
503
1
Gemma 3 1B PT Unsloth BNB 4bit
Gemma 3 is a series of lightweight open models launched by Google, supporting multimodal input (text and images) and a large 128K context window, suitable for tasks such as question answering and summarization.
Image-to-Text
Transformers English

unsloth
4,481
3
Qwen2.5 VL 7B Instruct GPTQ Int4
Apache-2.0
Qwen2.5-VL-7B-Instruct-GPTQ-Int4 is an unofficial GPTQ Int4 quantization of the Qwen2.5-VL-7B-Instruct model, supporting multimodal image-text-to-text tasks.
Image-to-Text
Transformers Supports Multiple Languages

hfl
872
3
GLM Edge V 2B
Other
GLM-Edge-V-2B is an image-text-to-text model built on the PyTorch framework, with support for Chinese.
Image-to-Text
THUDM
23.43k
11
Florence 2 DocVQA
This is a version of Microsoft's Florence-2 model fine-tuned for one day on 5% of the Docmatix dataset with a learning rate of 1e-6.
Image-to-Text
Transformers

HuggingFaceM4
3,096
60
Featured Recommended AI Models