Instructcir Llava Phi35 Clip224 Lp

Developed by uta-smile
InstructCIR is a compositional image retrieval model trained with instruction-aware contrastive learning. It is built on the ViT-L-224 and Phi-3.5-Mini architectures and targets image-text-to-text tasks.
Downloads 15
Release Time: 12/16/2024

Model Overview

This model performs compositional image retrieval through instruction-aware contrastive learning: given a textual instruction, it retrieves the images that match it. It is suited to multimodal information retrieval scenarios.
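As a rough illustration of the retrieval step described above, the sketch below ranks candidate images by cosine similarity to a query embedding. It does not use the model's actual API, which this page does not document. The function names `cosine` and `rank_images`, the toy file names, and the 3-dimensional vectors are all hypothetical. The sketch also assumes that the query (the instruction, possibly combined with a reference image) and every candidate image have already been encoded into vectors of the same size.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_images(query_embedding, gallery, top_k=3):
    """Return the top_k (image_id, score) pairs, best match first.

    `gallery` maps a hypothetical image id to its precomputed embedding.
    """
    scored = [(image_id, cosine(query_embedding, emb)) for image_id, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings standing in for real encoder outputs (assumed values).
query = [0.9, 0.1, 0.0]
gallery = {
    "red_dress.jpg": [1.0, 0.0, 0.0],
    "blue_dress.jpg": [0.0, 1.0, 0.0],
    "red_shoes.jpg": [0.7, 0.0, 0.7],
}
results = rank_images(query, gallery, top_k=2)
print([image_id for image_id, _ in results])
```

In a real deployment the gallery embeddings would usually be computed once and stored in a vector index, so only the query needs to be encoded at search time.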

Model Features

Instruction-aware Contrastive Learning
Employs instruction-aware contrastive learning methods to enhance the model's understanding of complex instructions.
Compositional Image Retrieval
Capable of handling compositional queries for more precise image retrieval.
Multimodal Architecture
Combines a Vision Transformer with a language model to achieve cross-modal understanding of images and text.
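This page does not specify the training objective beyond "contrastive learning", so the following is only a generic sketch of an in-batch contrastive (InfoNCE-style) loss, not the authors' training code. It assumes each query embedding (for example, an instruction fused with a reference image) is paired with exactly one target image embedding at the same index, and that the other targets in the batch act as negatives. The name `info_nce_loss` and the temperature value 0.07 are illustrative assumptions.

```python
import math


def info_nce_loss(query_embs, target_embs, temperature=0.07):
    """Mean cross-entropy where query i's positive is target i.

    All other targets in the batch serve as negatives. Assumes non-zero
    vectors of equal length; temperature is an assumed hyperparameter.
    """
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    q = [normalize(v) for v in query_embs]
    t = [normalize(v) for v in target_embs]

    total = 0.0
    for i, qi in enumerate(q):
        # Scaled cosine similarity of query i against every target in the batch.
        logits = [sum(a * b for a, b in zip(qi, tj)) / temperature for tj in t]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(q)


# Toy batch: queries that match their targets vs. queries paired with the wrong targets.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
mismatched_loss = info_nce_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss < mismatched_loss)
```

The loss is small when each query sits closest to its own target and grows when a query is more similar to another item in the batch, which is the behavior a contrastive retrieval objective relies on.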

Model Capabilities

Image retrieval
Text generation
Multimodal understanding
Instruction following

Use Cases

E-commerce
Product Image Retrieval
Retrieve relevant product images based on user descriptions
Improves product search accuracy
Content Management
Media Library Retrieval
Retrieve images from media libraries based on complex descriptions
Enhances content management efficiency