InstructCIR: Open-source Compositional Image Retrieval Model

Instructcir Llava Phi35 Clip224 Lp

Developed by uta-smile

InstructCIR is a compositional image retrieval model trained with instruction-aware contrastive learning. It builds on the ViT-L-224 and Phi-3.5-Mini architectures and is listed under image-text-to-text tasks.

Image-to-Text

PyTorch

Open Source License: Apache-2.0

Tags: #Instruction-aware Image Retrieval, #Compositional Image Search, #Contrastive Learning Optimization

Downloads: 15

Release Time: 12/16/2024

Model Overview

The model performs compositional image retrieval using instruction-aware contrastive learning: given a textual instruction, it retrieves the images most relevant to that instruction. It is suited to multimodal information retrieval scenarios.

Model Features

Instruction-aware Contrastive Learning

Trains with an instruction-aware contrastive objective to improve the model's understanding of complex instructions.
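
The listing does not say which contrastive objective InstructCIR uses, so the following is only a minimal sketch of one common choice, the InfoNCE loss with in-batch negatives. The function name `info_nce_loss`, the temperature of 0.07, and the toy 2-dimensional embeddings are all illustrative assumptions, not details taken from the model.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce_loss(query_embs, target_embs, temperature=0.07):
    """Mean InfoNCE loss over a batch (hypothetical helper).

    Query i is paired with target i as its positive; every other
    target in the batch acts as a negative.
    """
    queries = [l2_normalize(q) for q in query_embs]
    targets = [l2_normalize(t) for t in target_embs]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, t) / temperature for t in targets]
        # Numerically stable log-sum-exp over all targets.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(queries)


# Toy batch: queries aligned with their targets versus deliberately mismatched.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1], [0.1, 1.0]])
swapped = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 1.0], [1.0, 0.1]])
print(aligned < swapped)
```

In an instruction-aware setup, each query embedding would presumably encode both the input and its instruction, so minimizing this loss pulls an instruction-conditioned query toward its matching target and away from the other targets in the batch.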

Compositional Image Retrieval

Handles compositional queries, which combine several conditions in a single request, for more precise image retrieval.
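
Once a compositional query has been encoded, retrieval typically reduces to ranking gallery images by embedding similarity. The sketch below illustrates that ranking step only. It assumes the query and gallery embeddings have already been computed; the helper `rank_gallery`, the file names, and the 3-dimensional vectors are hypothetical placeholders, not output from InstructCIR.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_gallery(query_emb, gallery, top_k=3):
    """Return the top_k (name, score) pairs, highest similarity first."""
    scored = [(name, cosine(query_emb, emb)) for name, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical precomputed gallery embeddings (toy values).
gallery = {
    "red_dress.jpg": [0.9, 0.1, 0.0],
    "blue_dress.jpg": [0.1, 0.9, 0.0],
    "blue_shirt.jpg": [0.0, 0.6, 0.8],
}

# Hypothetical embedding of a compositional query, for example a
# reference image combined with an instruction such as "same dress, but in blue".
query_emb = [0.2, 0.95, 0.1]

results = rank_gallery(query_emb, gallery, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

For large galleries, the exhaustive loop would normally be replaced by an approximate nearest-neighbor index, but the scoring logic stays the same.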

Multimodal Architecture

Combines a vision Transformer with a language model to achieve cross-modal understanding of images and text.

Model Capabilities

Image retrieval

Text generation

Multimodal understanding

Instruction following

Use Cases

E-commerce

Product Image Retrieval

Retrieve relevant product images based on user descriptions

Improves product search accuracy

Content Management

Media Library Retrieval

Retrieve images from media libraries based on complex descriptions

Enhances content management efficiency

