Colqwenstella 2b Multilingual
A multilingual visual document retriever that combines the Qwen2 vision model with stella_en_1.5B_v5. It ranks first among models with ≤2B parameters on the ViDoRe benchmark
Downloads 175
Release Time: 2/11/2025
Model Overview
A multilingual visual document retrieval model that integrates Qwen2's vision component with stella_en_1.5B_v5 as the text embedding model. It supports multiple languages and cross-modal retrieval tasks
Model Features
Multilingual Support
Supports visual document retrieval in five languages: English, French, Spanish, Italian, and German
Efficient Training
Uses LoRA for parameter-efficient fine-tuning, enabling efficient training on 4×A100 GPUs
High Performance
Ranked first among models with ≤2B parameters and eighth overall on the ViDoRe benchmark
Multimodal Fusion
Combines a vision model with a text embedding model to enable cross-modal retrieval
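To make the LoRA idea under Efficient Training concrete, here is a minimal sketch of how a low-rank adapter modifies a frozen weight matrix. It is illustrative only: the matrix sizes, the rank, the alpha value, and the helper names (matmul, lora_effective_weight, trainable_params) are assumptions chosen for the example, not settings or code from this model.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the adapter rank.

    W is the frozen weight of shape (d_out, d_in).
    A has shape (r, d_in) and B has shape (d_out, r); only A and B are trained.
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d_out, d_in, r):
    """Number of parameters in the adapter matrices A and B."""
    return r * (d_in + d_out)


# Toy values, invented for illustration (hypothetical rank 1, alpha 2.0).
w = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]                                # frozen 3x4 weight
a = [[1.0, 2.0, 0.0, 0.0]]       # adapter A, shape (1, 4)
b = [[0.5], [0.0], [-0.5]]       # adapter B, shape (3, 1)

w_eff = lora_effective_weight(w, a, b, alpha=2.0)
adapter_count = trainable_params(d_out=3, d_in=4, r=1)
full_count = 3 * 4

print(w_eff)
print(adapter_count, "adapter parameters vs", full_count, "in the full matrix")
```

The saving is negligible at this toy size, but the adapter grows with r * (d_in + d_out) while the full matrix grows with d_in * d_out, which is why the approach reduces the training footprint of large layers.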
Model Capabilities
Multilingual text understanding
Visual document analysis
Cross-modal retrieval
Multimodal embedding
Multilingual embedding
Use Cases
Document Retrieval
Cross-language Document Retrieval
Retrieve relevant visual documents using queries in different languages
Ranked first among models with ≤2B parameters on the ViDoRe benchmark
Visual Question Answering System
Question answering over document images
Enterprise Applications
Enterprise Knowledge Base Retrieval
Retrieve relevant visual content from corporate document libraries
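The retrieval use cases above all reduce to ranking document pages by their similarity to a query embedding. The sketch below shows that ranking step under a simplifying assumption: each query and each page is represented by a single vector and scored with cosine similarity. The actual model may use a different scoring scheme (for example, multi-vector scoring), and the file names, vectors, and function names here are invented for illustration, not output of this model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_pages(query_vec, page_vecs):
    """Return page ids sorted from most to least similar to the query."""
    scores = {pid: cosine_similarity(query_vec, vec) for pid, vec in page_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy 3-dimensional embeddings, invented for illustration (not model output).
page_vecs = {
    "invoice_fr.png": [0.9, 0.1, 0.0],
    "report_de.png": [0.1, 0.8, 0.3],
    "manual_it.png": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # stands in for an embedded text query

ranking = rank_pages(query_vec, page_vecs)
print(ranking)
```

Because queries and page images share one embedding space, the language of the query does not change the ranking logic; only the encoder that produces query_vec has to handle it.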