AIbase

# Vision-language generation

Clip Flant5 Xxl
Apache-2.0
A vision-language generation model fine-tuned from google/flan-t5-xxl and designed for image-text retrieval tasks.
Image-to-Text, Transformers, English
zhiqiulin
86.23k
2
Blip2 Opt 6.7b Coco
MIT
BLIP-2 is a vision-language model that combines an image encoder with a large language model for image-to-text generation and visual question answering tasks.
Image-to-Text, Transformers, English
Salesforce
216.79k
33
© 2025 AIbase