
ColQwen2.5 v0.1

Developed by vidore
A visual retrieval model built on Qwen2.5-VL-3B-Instruct and the ColBERT strategy. It generates multi-vector representations of text and images to enable efficient document retrieval.
Downloads 985
Release Time: 1/30/2025

Model Overview

ColQwen2.5 is a vision-language model that indexes documents efficiently from their visual features. It supports dynamic input image resolution and is suited to document retrieval tasks.

Model Features

Dynamic Input Image Resolution
Accepts input images at dynamic resolution without altering their aspect ratio, up to a maximum of 768 image patches.
Multi-Vector Representation
Generates ColBERT-style multi-vector representations for both text and images, enhancing retrieval efficiency.
Efficient Training
Trained with LoRA adapters and the paged_adamw_8bit optimizer, using data parallelism across 8 GPUs, with a learning rate of 5e-5 and a batch size of 32.
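
The ColBERT-style multi-vector representation listed above is usually scored with "late interaction": each query vector is matched to its most similar document vector, and those maxima are summed. The following is a minimal sketch of that scoring rule in plain Python, not the model's own code. The function names (l2_normalize, maxsim_score, rank_documents) and the toy 2-dimensional vectors are assumptions made for illustration; real embeddings would come from the model and have many more dimensions.

```python
from math import sqrt


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query vector take the highest
    dot product against any document vector, then sum those maxima."""
    total = 0.0
    for q in query_vecs:
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc_vecs)
    return total


def rank_documents(query_vecs, docs):
    """Return (name, score) pairs sorted from best to worst match.
    `docs` maps a hypothetical document name to its list of vectors."""
    scores = {name: maxsim_score(query_vecs, vecs) for name, vecs in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy example: two query vectors, two candidate pages.
query = [l2_normalize([1.0, 0.0]), l2_normalize([0.0, 1.0])]
pages = {
    "page_a": [l2_normalize([1.0, 0.0])],
    "page_b": [l2_normalize([1.0, 0.0]), l2_normalize([0.0, 1.0])],
}
ranking = rank_documents(query, pages)
```

Because each page keeps one vector per image patch, a query term can match a small region such as a single chart label. That is the motivation usually given for multi-vector retrieval over a single pooled embedding.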

Model Capabilities

Visual Document Retrieval
Multi-Vector Representation Generation
Dynamic Image Processing
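
To make the 768-patch limit concrete, the sketch below shows one way an image could be resized to fit a patch budget while roughly preserving its aspect ratio. It illustrates the idea only and is not the model's actual preprocessing. The function name fit_to_patch_budget and the 28-pixel patch side are assumptions; only the 768-patch cap comes from the description above.

```python
from math import floor, sqrt


def fit_to_patch_budget(width, height, patch_size=28, max_patches=768):
    """Return (new_width, new_height), both multiples of patch_size,
    covering at most max_patches patches.

    patch_size=28 is an assumed value for illustration only.
    """
    # Express the image as a grid of whole patches (at least 1 x 1).
    cols = max(1, width // patch_size)
    rows = max(1, height // patch_size)

    if cols * rows > max_patches:
        # Shrink both sides by the same factor so the aspect ratio is kept.
        scale = sqrt(max_patches / (cols * rows))
        cols = max(1, floor(cols * scale))
        rows = max(1, floor(rows * scale))
        # Extreme aspect ratios: one side was clamped up to 1 patch,
        # so cap the longer side directly.
        if cols * rows > max_patches:
            if cols >= rows:
                cols = max_patches // rows
            else:
                rows = max_patches // cols

    return cols * patch_size, rows * patch_size


# Example: a large landscape page is scaled down to fit the budget.
new_w, new_h = fit_to_patch_budget(2800, 1400)
patches_used = (new_w // 28) * (new_h // 28)
```

Rounding down to whole patches means the aspect ratio is preserved only approximately. A real processor may round differently or enforce a minimum size as well.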

Use Cases

Document Retrieval
Academic Literature Retrieval
Used to retrieve specific content in academic literature, such as data in charts or specific text paragraphs.
Experiments show that increasing the number of image patches significantly improves retrieval performance.
PDF Document Retrieval
Retrieves specific information from PDF documents, such as tables, charts, or text content.
Performs well on the ViDoRe evaluation set, which shares no documents with the training set.