# Image Text Generation

Mistral Community Pixtral 12b GGUF

A quantized version of the pixtral-12b model, produced with llama.cpp, that supports image-text-to-text tasks.

Vitucano 2b8 V1

ViTucano is the first natively Portuguese pre-trained visual assistant, combining visual understanding and language capabilities, suitable for multimodal tasks such as image captioning and visual question answering.

Transformers Other

GIT is a Transformer decoder-based vision-language model conditioned on both CLIP image tokens and text tokens, suitable for tasks such as image captioning and visual question answering.

Transformers Supports Multiple Languages

© 2025 AIbase