Instructcir Llava Phi35 Clip224 Lp
Developed by uta-smile
InstructCIR is a compositional image retrieval model trained with instruction-aware contrastive learning. It is built on the ViT-L-224 and Phi-3.5-Mini architectures and is categorized under image-text-to-text tasks.
Downloads 15
Release Time: 12/16/2024
Model Overview
This model performs compositional image retrieval using instruction-aware contrastive learning: given a textual instruction, it retrieves the images that match it. It is suited to multimodal information retrieval scenarios.
Model Features
Instruction-aware Contrastive Learning
Employs instruction-aware contrastive learning methods to enhance the model's understanding of complex instructions.
Compositional Image Retrieval
Handles compositional queries, in which a reference image is combined with a text instruction describing the desired change, for more precise image retrieval.
Multimodal Architecture
Combines a Vision Transformer image encoder (ViT-L-224) with a language model (Phi-3.5-Mini) to achieve cross-modal understanding of images and text.
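As a rough illustration of the contrastive objective named under Model Features, the sketch below computes a generic InfoNCE-style loss over a small batch. Each query embedding (standing in for a composed reference image plus instruction) is pulled toward its paired target image embedding and pushed away from the other targets in the batch. This is a textbook formulation, not the authors' training code; the function names, the temperature value, and the toy vectors are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce_loss(queries, targets, temperature=0.07):
    """Mean InfoNCE loss, where queries[i] is paired with targets[i].

    The temperature of 0.07 is an assumed, commonly used default,
    not a value taken from the model card.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, t) / temperature for t in targets]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the correct (paired) target.
        total += log_denominator - logits[i]
    return total / len(queries)

# Toy 2-D embeddings (hypothetical values).
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy case the loss is close to zero when each query matches its own target (`aligned`) and large when the pairs are swapped (`swapped`), which is the behavior a contrastive objective is meant to reward.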
Model Capabilities
Image retrieval
Text generation
Multimodal understanding
Instruction following
Use Cases
E-commerce
Product Image Retrieval
Retrieve relevant product images based on user descriptions
Improves product search accuracy
Content Management
Media Library Retrieval
Retrieve images from media libraries based on complex descriptions
Enhances content management efficiency
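The retrieval use cases above generally reduce to a nearest-neighbor search in embedding space. The sketch below assumes the query and every gallery image have already been embedded into vectors of the same size, and ranks the gallery by cosine similarity. The `retrieve` helper, the file names, and the embedding values are hypothetical; a real deployment would obtain embeddings from the model and would likely use a vector index rather than a linear scan.

```python
import math

def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def retrieve(query_embedding, gallery, top_k=2):
    """Return the top_k gallery ids ranked by cosine similarity to the query."""
    q = normalize(query_embedding)
    scored = []
    for image_id, embedding in gallery.items():
        g = normalize(embedding)
        similarity = sum(a * b for a, b in zip(q, g))
        scored.append((similarity, image_id))
    # Highest similarity first.
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored[:top_k]]

# Hypothetical precomputed embeddings for a small product gallery.
gallery = {
    "red_dress.jpg": [0.9, 0.1, 0.0],
    "blue_dress.jpg": [0.1, 0.9, 0.0],
    "red_shoes.jpg": [0.7, 0.0, 0.7],
}
# Hypothetical embedding of a composed query.
query = [0.2, 1.0, 0.0]
results = retrieve(query, gallery, top_k=2)
```

With these toy values, `results` lists `blue_dress.jpg` first and `red_dress.jpg` second, since the query vector points mostly along the second dimension.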
© 2025 AIbase