MEXMA-SigLIP2
MEXMA-SigLIP2 is a CLIP-style model that pairs the MEXMA multilingual text encoder with the SigLIP2 image encoder and supports 80 languages.
Downloads 224
Release Time: 3/2/2025
Model Overview
The model encodes text with the MEXMA multilingual text encoder and images with the SigLIP2 image encoder, so that text and images can be compared directly for cross-modal retrieval. It is particularly strong at zero-shot image classification.
Model Features
Multilingual support
Supports 80 languages, including various Asian, European, and African languages
High-performance cross-modal retrieval
Achieves new state-of-the-art results on the Crossmodal-3600 dataset
Zero-shot learning capability
Performs image classification tasks without task-specific fine-tuning
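As a rough illustration of how zero-shot classification works with a CLIP-style model, the sketch below picks the label whose text embedding is most similar to the image embedding. It does not call the real model: the 3-dimensional vectors are made-up stand-ins for encoder outputs, and the function names (l2_normalize, cosine_scores, zero_shot_classify) are hypothetical helpers, not part of any MEXMA-SigLIP2 API.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_scores(image_emb, text_embs):
    """Cosine similarity between one image embedding and each text embedding."""
    img = l2_normalize(image_emb)
    return [
        sum(a * b for a, b in zip(img, l2_normalize(txt)))
        for txt in text_embs
    ]


def zero_shot_classify(image_emb, text_embs, labels):
    """Return the label with the highest similarity, plus all scores."""
    scores = cosine_scores(image_emb, text_embs)
    best = max(range(len(labels)), key=lambda i: scores[i])
    return labels[best], scores


# Toy data (assumption): in practice these vectors would come from the
# image encoder and the multilingual text encoder.
labels = ["a photo of a cat", "una foto de un perro", "ein Foto von einem Auto"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.9, 0.1, 0.0]

predicted, scores = zero_shot_classify(image_emb, text_embs, labels)
print(predicted, [round(s, 3) for s in scores])
```

Because the labels are just text prompts, they can be written in any supported language, and no task-specific fine-tuning is involved: changing the label set only means encoding new prompts.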
Model Capabilities
Zero-shot image classification
Cross-modal retrieval
Multilingual text understanding
Image-text matching
Use Cases
Image retrieval
Multilingual image search
Retrieve relevant images using queries in different languages
Achieves 62.54% image retrieval accuracy on the Crossmodal-3600 dataset
Text retrieval
Image-to-text retrieval
Retrieve relevant text descriptions based on image content
Achieves 59.99% text retrieval accuracy on the Crossmodal-3600 dataset
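The page does not say exactly how the retrieval accuracy figures are computed. A common way to score retrieval is recall at k: for each query, rank all candidates by similarity and check whether the correct one appears in the top k. The sketch below shows that calculation under two assumptions: each query has exactly one correct candidate (query i matches candidate i), and the similarity matrix is toy data rather than real model output. The function name recall_at_k is a hypothetical helper.

```python
def recall_at_k(similarity, k=1):
    """Fraction of queries whose correct candidate is ranked in the top k.

    similarity[i][j] is the score between query i and candidate j.
    Assumption: the correct candidate for query i is candidate i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Candidate indices sorted from most to least similar.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy similarity matrix (assumption): rows are text queries, columns are images.
similarity = [
    [0.9, 0.2, 0.1],  # query 0: correct image ranked first
    [0.3, 0.8, 0.4],  # query 1: correct image ranked first
    [0.7, 0.1, 0.6],  # query 2: correct image ranked second
]

r1 = recall_at_k(similarity, k=1)
r2 = recall_at_k(similarity, k=2)
print(f"recall@1 = {r1:.2%}, recall@2 = {r2:.2%}")
```

Transposing the matrix gives the opposite direction, so the same function covers both image retrieval from text queries and text retrieval from image queries.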