clip-flant5-xl: an open-source image-text retrieval model fine-tuned from google/flan-t5-xl

clip-flant5-xl

Developed by zhiqiulin

A vision-language generative model fine-tuned from google/flan-t5-xl for image-text retrieval tasks.

Downloads 13.44k

Release date: December 13, 2023

Model Overview

This model is a fine-tuned version of google/flan-t5-xl, intended mainly for image-text retrieval tasks. It is presented in the VQAScore paper.

Vision-language generation

Performs cross-modal retrieval and generation by combining image and text information.

Fine-tuned from Flan-T5-XL

Adapts a strong language model to visual tasks.

Open-source license

Released under the Apache-2.0 license, which permits both commercial and research use.

Image-text matching

Cross-modal retrieval

Visual question answering (VQA) tasks
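As a rough illustration of how image-text matching can be framed as a VQA task, the sketch below turns a caption into a yes/no question and reads the match score as the probability of the answer "Yes". The question template follows the wording described in the VQAScore paper as best recalled here, so treat it as an assumption and check the paper for the exact phrasing. The `answer_logits` mapping is hypothetical: it stands in for whatever answer-token logits the model produces, and the softmax over only the supplied answers is a simplification.

```python
import math
from typing import Mapping

# Assumed template, modeled on the VQAScore paper's yes/no question format.
QUESTION_TEMPLATE = 'Does this figure show "{}"? Please answer yes or no.'


def build_question(caption: str) -> str:
    """Wrap a caption in the yes/no question that is asked about the image."""
    return QUESTION_TEMPLATE.format(caption)


def yes_probability(answer_logits: Mapping[str, float]) -> float:
    """Return P("Yes") from a softmax over the supplied answer logits.

    `answer_logits` is a hypothetical stand-in for model output; it must
    contain a "Yes" key.
    """
    # Subtract the maximum logit for numerical stability.
    peak = max(answer_logits.values())
    exps = {token: math.exp(value - peak) for token, value in answer_logits.items()}
    return exps["Yes"] / sum(exps.values())
```

A higher value means the image is judged more likely to match the caption. In a real pipeline the logits would come from running the model on the image and the question built by `build_question`.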

Information retrieval

Image search

Retrieve relevant images based on text descriptions

Text search

Retrieve relevant text descriptions based on image content
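Both retrieval directions reduce to ranking candidates by an image-text match score. The sketch below shows that ranking step only; `score_fn` is a hypothetical callable standing in for the model's scoring of an (image, text) pair, and the function names are illustrative rather than part of any published API.

```python
from typing import Callable, List, Sequence

# Hypothetical scoring interface: (image_id, text) -> match score, higher is better.
ScoreFn = Callable[[str, str], float]


def search_images(query: str, image_ids: Sequence[str],
                  score_fn: ScoreFn, top_k: int = 3) -> List[str]:
    """Image search: rank candidate images against one text query."""
    ranked = sorted(image_ids, key=lambda image_id: score_fn(image_id, query),
                    reverse=True)
    return list(ranked[:top_k])


def search_texts(image_id: str, texts: Sequence[str],
                 score_fn: ScoreFn, top_k: int = 3) -> List[str]:
    """Text search: rank candidate descriptions against one image."""
    ranked = sorted(texts, key=lambda text: score_fn(image_id, text),
                    reverse=True)
    return list(ranked[:top_k])
```

Passing the scorer in as a parameter keeps the ranking logic independent of how scores are produced, so the same functions can be exercised with a small lookup table before a real model is plugged in.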

Research support

Visual question answering research

Supports VQAScore-related research.

Results are demonstrated in the VQAScore paper.
