CLIP-FlanT5-XL

Developed by zhiqiulin
A vision-language generation model built on google/flan-t5-xl and fine-tuned for image-text retrieval tasks
Downloads 13.44k
Release Time: 12/13/2023

Model Overview

This model is a fine-tuned version of google/flan-t5-xl, used mainly for image-text retrieval tasks. It is demonstrated in the VQAScore paper.

Model Features

Vision-language generation ability: Performs cross-modal retrieval and generation by combining image and text information.
Fine-tuned from Flan-T5-XL: Adapts a powerful language model to visual tasks.
Open-source license: Released under the Apache-2.0 license, which allows commercial and research use.

Model Capabilities

Image-text matching
Cross-modal retrieval
Visual question answering (VQA) related tasks

Use Cases

Information retrieval
Image search: Retrieves relevant images based on text descriptions.
Text search: Retrieves relevant text descriptions based on image content.
Auxiliary research
Visual question answering research: Used for VQAScore-related research, with results demonstrated in the paper.
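
The two retrieval use cases above both reduce to ranking candidates by an image-text match score. The sketch below illustrates only that ranking step. It does not load or call the model: the `Scorer` signature, the function names, and `toy_score` are hypothetical stand-ins introduced for illustration, and in practice the scorer would be whatever function returns this model's match score for an (image, text) pair.

```python
from typing import Callable, List, Tuple

# Hypothetical signature (an assumption, not the model's API):
# score(image, text) -> float, where a higher value means a better match.
Scorer = Callable[[str, str], float]


def rank_images(query: str, images: List[str], score: Scorer) -> List[Tuple[str, float]]:
    """Image search: order candidate images by match score for one text query."""
    scored = [(image, score(image, query)) for image in images]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def rank_texts(image: str, texts: List[str], score: Scorer) -> List[Tuple[str, float]]:
    """Text search: order candidate descriptions by match score for one image."""
    scored = [(text, score(image, text)) for text in texts]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(image: str, text: str) -> float:
    """Stand-in scorer (NOT the model): fraction of the text's words
    that appear as substrings of the image's file name."""
    words = text.lower().split()
    name = image.lower()
    return sum(w in name for w in words) / len(words) if words else 0.0


if __name__ == "__main__":
    candidates = ["red_car.png", "blue_car.png", "dog.png"]
    for image, value in rank_images("red car", candidates, toy_score):
        print(f"{value:.2f}  {image}")
```

Passing the scorer in as a parameter keeps the ranking logic independent of how scores are produced, so the toy scorer can be swapped for a real one without changing `rank_images` or `rank_texts`.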