# Image-text understanding

Gemma 3 27b It Qat 8bit

Gemma 3 27B IT QAT 8bit is an MLX-format model converted from Google's Gemma 3 27B model, supporting image-to-text tasks.

Tags: Transformers, Other

Gemma 3 4b It Qat Autoawq

Gemma 3 is a lightweight open multimodal model from Google, built on the same technology as Gemini. It accepts text and image input and generates text output.

Gemma 3 1b Pt Unsloth Bnb 4bit

Gemma 3 is a series of lightweight open models from Google that supports multimodal input (text and images) and a 128K-token context window, suitable for tasks such as question answering and summarization.

Tags: Transformers, English

Qwen2.5 VL 7B Instruct GPTQ Int4

Qwen2.5-VL-7B-Instruct-GPTQ-Int4 is an unofficial GPTQ Int4 quantization of the Qwen2.5-VL-7B-Instruct model, supporting multimodal image-text-to-text tasks.

Tags: Transformers, Supports Multiple Languages

GLM-Edge-V-2B

GLM-Edge-V-2B is an image-text-to-text model built on the PyTorch framework, with support for Chinese.

Florence 2 DocVQA

This is a version of Microsoft's Florence-2 model fine-tuned for one day on 5% of the Docmatix dataset, with a learning rate of 1e-6.

© 2025 AIbase