# Vision-language generation

Clip Flant5 Xxl

A vision-language generation model fine-tuned from google/flan-t5-xxl and designed specifically for image-text retrieval tasks.

Tags: Transformers, English

Blip2 Opt 6.7b Coco

BLIP-2 is a vision-language model that combines an image encoder with a large language model for image-to-text generation and visual question answering tasks.

Tags: Transformers, English
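
As a rough illustration of the two tasks mentioned for BLIP-2 (image-to-text generation and visual question answering), the sketch below uses the Hugging Face Transformers classes `Blip2Processor` and `Blip2ForConditionalGeneration`. The checkpoint ID `Salesforce/blip2-opt-6.7b-coco`, the helper names `build_prompt` and `generate_text`, and the "Question: ... Answer:" prompt template are assumptions made for this example and are not stated on this page. Loading a model of this size downloads many gigabytes of weights and in practice needs a GPU.

```python
from typing import Optional

# Assumed Hub ID for the model listed above; this page does not state it.
ASSUMED_CHECKPOINT = "Salesforce/blip2-opt-6.7b-coco"


def build_prompt(question: Optional[str] = None) -> str:
    """Return an empty prompt for captioning, or a Q/A template for VQA."""
    if question is None or not question.strip():
        return ""
    return f"Question: {question.strip()} Answer:"


def generate_text(image_path: str,
                  question: Optional[str] = None,
                  checkpoint: str = ASSUMED_CHECKPOINT) -> str:
    """Caption an image, or answer a question about it (hypothetical helper)."""
    # Heavy dependencies are imported lazily so build_prompt works without them.
    import torch
    from PIL import Image
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    use_cuda = torch.cuda.is_available()
    device = "cuda" if use_cuda else "cpu"
    dtype = torch.float16 if use_cuda else torch.float32

    processor = Blip2Processor.from_pretrained(checkpoint)
    model = Blip2ForConditionalGeneration.from_pretrained(
        checkpoint, torch_dtype=dtype
    )
    model.to(device)

    image = Image.open(image_path).convert("RGB")
    prompt = build_prompt(question)
    if prompt:
        inputs = processor(images=image, text=prompt, return_tensors="pt")
    else:
        inputs = processor(images=image, return_tensors="pt")
    # Moves tensors to the device and casts floating-point inputs to dtype.
    inputs = inputs.to(device, dtype)

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
```

Calling `generate_text("photo.jpg")` would request a caption, while `generate_text("photo.jpg", "What is in the picture?")` would pose a question; `photo.jpg` is a placeholder path. Because this checkpoint name indicates COCO captioning fine-tuning, question answering quality may differ from captioning quality.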
