
ReT OpenCLIP ViT G 14

Developed by aimagelab
ReT is an innovative method supporting multimodal query and document retrieval, achieving fine-grained retrieval by integrating multi-level representations from visual and textual backbone networks.
Downloads 77
Release Time: 3/25/2025

Model Overview

ReT uses Transformer-based recurrent units with sigmoid gating and accepts mixed image and text inputs for visual document retrieval tasks.

Model Features

Multi-level Feature Integration
Unlike traditional methods that use only final-layer features, ReT integrates multi-level representations from visual and textual backbone networks.
Sigmoid Gating Mechanism
A gating mechanism inspired by LSTM selectively regulates information flow across levels and modalities.
Hybrid Modality Processing
Handles image-only, text-only, or mixed-modality inputs, for both queries and documents.
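
The multi-level integration and sigmoid gating described above can be illustrated with a toy sketch. This is not the ReT implementation: the function names (gated_merge, fuse_layers) and the per-dimension scalar weights w_forget and w_input are hypothetical, and the real model uses learned Transformer-based recurrent cells rather than this element-wise update. The sketch only shows the general idea of walking through backbone layers and letting sigmoid gates decide how much of the running state to keep and how much of each layer's features to admit.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_merge(state, layer_feat, w_forget, w_input):
    """One recurrent step: blend the running state with one layer's features.

    w_forget and w_input are hypothetical per-dimension gate weights,
    standing in for learned parameters.
    """
    merged = []
    for s, f, wf, wi in zip(state, layer_feat, w_forget, w_input):
        forget = sigmoid(wf * s)  # share of the previous state to keep
        admit = sigmoid(wi * f)   # share of the new layer feature to let in
        merged.append(forget * s + admit * f)
    return merged


def fuse_layers(layer_feats, w_forget, w_input):
    """Fold features from several backbone layers into one vector."""
    state = [0.0] * len(layer_feats[0])  # start from an empty state
    for feat in layer_feats:             # shallow layers first, deep layers last
        state = gated_merge(state, feat, w_forget, w_input)
    return state


if __name__ == "__main__":
    # Two toy "layers" with two feature dimensions each.
    feats = [[1.0, 2.0], [3.0, 4.0]]
    print(fuse_layers(feats, w_forget=[0.0, 0.0], w_input=[0.0, 0.0]))
```

With all gate weights at zero, every gate evaluates to 0.5, so each step averages the previous state with the new layer; non-zero weights would bias the blend toward shallower or deeper layers.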

Model Capabilities

Multimodal Document Retrieval
Image-Text Joint Feature Extraction
Fine-grained Similarity Calculation
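
The page does not specify how the fine-grained similarity is computed. As one plausible illustration, the sketch below assumes a late-interaction score: each query token vector is matched to its most similar document token vector, and the per-token maxima are summed. The function names and the toy vectors are hypothetical and are not taken from the ReT code base.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def fine_grained_score(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best cosine match among document tokens."""
    return sum(
        max(cosine(q, d) for d in doc_tokens)
        for q in query_tokens
    )


def rank_documents(query_tokens, docs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, fine_grained_score(query_tokens, tokens))
        for name, tokens in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    docs = {
        "doc_a": [[1.0, 0.0], [0.0, 1.0]],  # covers both query tokens
        "doc_b": [[1.0, 0.0]],              # covers only the first one
    }
    for name, score in rank_documents(query, docs):
        print(name, score)
```

Scoring at the token level rather than comparing a single pooled vector per item is what makes this kind of matching fine-grained: a document that covers only part of the query receives a proportionally lower score.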

Use Cases

Information Retrieval
Visual Question Answering Document Retrieval
Retrieve relevant documents containing answers based on question text and reference images.
Effectiveness was validated on a customized M2KR benchmark.
Cross-modal Retrieval
Use text queries to retrieve relevant image documents or use image queries to retrieve relevant text documents.