Clip Flant5 Xl
A vision-language generative model, fine-tuned from google/flan-t5-xl for image-text retrieval tasks
Downloads 13.44k
Release Time: 12/13/2023
Model Overview
This model is a fine-tuned version of google/flan-t5-xl. It is used mainly for image-text retrieval tasks and is demonstrated in the VQAScore paper.
Model Features
Vision-language generation
Performs cross-modal retrieval and generation by combining image and text information
Fine-tuned from Flan-T5-XL
Adapts a powerful language model to visual tasks
Open-source license
Released under the Apache-2.0 license, which permits both commercial and research use
Model Capabilities
Image-text matching
Cross-modal retrieval
Visual question answering (VQA) related tasks
Use Cases
Information retrieval
Image search
Retrieve relevant images based on text descriptions
Text search
Retrieve relevant text descriptions based on image content
Research support
Visual question answering research
Used in VQAScore-related research
Results are demonstrated in the VQAScore paper
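The image search and text search use cases above both reduce to ranking candidates by an image-text matching score. The following is a minimal sketch of that ranking step only, not the model's actual API: `rank_images`, `rank_texts`, and `toy_score` are hypothetical names introduced for illustration, and `toy_score` is a stand-in that would be replaced by a real image-text matching score from the model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer signature: (image, text) -> float, higher means a better match.
# In practice this would wrap the model's image-text matching score.
ScoreFn = Callable[[str, str], float]


def rank_images(query: str, images: Sequence[str], score_fn: ScoreFn,
                top_k: int = 3) -> List[Tuple[str, float]]:
    """Image search: rank candidate images for one text query."""
    scored = [(image, score_fn(image, query)) for image in images]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def rank_texts(image: str, texts: Sequence[str], score_fn: ScoreFn,
               top_k: int = 3) -> List[Tuple[str, float]]:
    """Text search: rank candidate descriptions for one image."""
    scored = [(text, score_fn(image, text)) for text in texts]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def toy_score(image: str, text: str) -> float:
    """Stand-in scorer (assumption): fraction of query words found in the file name."""
    words = text.lower().split()
    if not words:
        return 0.0
    name = image.lower()
    return sum(word in name for word in words) / len(words)


if __name__ == "__main__":
    candidates = ["red_car.png", "blue_car.png", "dog.png"]
    print(rank_images("red car", candidates, toy_score, top_k=2))
    print(rank_texts("red_car.png", ["blue sky", "red car"], toy_score, top_k=1))
```

Keeping the scorer as a plain callable means the same ranking code serves both retrieval directions, and the toy scorer can be swapped for a model-backed one without changing the callers.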