# Arabic CLIP
An adaptation of CLIP for the Arabic language, aiming to improve the understanding and interpretation of visual information in the Arabic context.
## Quick Start
Arabic CLIP is an adaptation of Contrastive Language-Image Pre-training (CLIP) to the Arabic language. CLIP, developed by OpenAI, learns visual concepts from images and relates them to textual descriptions. This work aims to improve the model's understanding and interpretation of visual information in an Arabic-language context.
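Concretely, CLIP-style models train an image encoder and a text encoder jointly so that matched image-text pairs score higher than all mismatched pairs in a batch. Here is a minimal sketch of that symmetric contrastive (InfoNCE) objective in JAX; it is illustrative only, not this repository's training code:

```python
import jax
import jax.numpy as jnp
import optax

def clip_contrastive_loss(image_embeds, text_embeds, logit_scale):
    """Symmetric InfoNCE loss over a batch of matched image-text pairs."""
    # L2-normalize both towers so the dot product is a cosine similarity.
    image_embeds = image_embeds / jnp.linalg.norm(image_embeds, axis=-1, keepdims=True)
    text_embeds = text_embeds / jnp.linalg.norm(text_embeds, axis=-1, keepdims=True)
    # (batch, batch) similarity matrix; the diagonal holds the true pairs.
    logits = logit_scale * image_embeds @ text_embeds.T
    labels = jnp.arange(logits.shape[0])
    loss_i = optax.softmax_cross_entropy_with_integer_labels(logits, labels)
    loss_t = optax.softmax_cross_entropy_with_integer_labels(logits.T, labels)
    return (loss_i.mean() + loss_t.mean()) / 2

# Toy usage with random embeddings:
img = jax.random.normal(jax.random.PRNGKey(0), (4, 512))
txt = jax.random.normal(jax.random.PRNGKey(1), (4, 512))
print(clip_contrastive_loss(img, txt, logit_scale=100.0))
```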
## Usage Examples

### Basic Usage
```python
from transformers import AutoTokenizer, FlaxVisionTextDualEncoderModel

# Load the Arabic CLIP checkpoint, converting from the PyTorch weights.
model = FlaxVisionTextDualEncoderModel.from_pretrained(
    "LinaAlhuri/Arabic-clip-vit-base-patch32", logit_scale_init_value=1, from_pt=True
)
model.save_pretrained("arabic_clip")

# Tokenizer of the Arabic BERT text encoder.
tokenizer = AutoTokenizer.from_pretrained("asafaya/bert-base-arabic", cache_dir=None, use_fast=True)
```
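A fuller inference sketch: encode an image and candidate Arabic captions, then score them against each other. This is illustrative; it assumes the vision tower follows standard CLIP ViT-B/32 preprocessing, so the `openai/clip-vit-base-patch32` image processor is borrowed here, and the sample image URL is likewise just an example.

```python
import jax
import requests
from PIL import Image
from transformers import AutoTokenizer, CLIPImageProcessor, FlaxVisionTextDualEncoderModel

model = FlaxVisionTextDualEncoderModel.from_pretrained(
    "LinaAlhuri/Arabic-clip-vit-base-patch32", logit_scale_init_value=1, from_pt=True
)
tokenizer = AutoTokenizer.from_pretrained("asafaya/bert-base-arabic", use_fast=True)
# Assumption: standard CLIP ViT-B/32 image preprocessing.
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True
).raw)
texts = ["قطتان على أريكة", "طائرة في السماء"]  # "two cats on a couch", "a plane in the sky"

text_inputs = tokenizer(texts, padding=True, return_tensors="np")
pixel_values = image_processor(images=image, return_tensors="np").pixel_values

outputs = model(
    input_ids=text_inputs.input_ids,
    attention_mask=text_inputs.attention_mask,
    pixel_values=pixel_values,
)
# logits_per_image: (num_images, num_texts); higher means a better match.
probs = jax.nn.softmax(outputs.logits_per_image, axis=-1)
print(probs)
```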
## Documentation

### Training Data
The goal was to create a comprehensive Arabic image-text dataset by combining multiple data sources, given the scarcity of Arabic resources. The main challenges were the limited amount of genuine Arabic data and the variable quality of translated datasets. The approach merges genuine datasets, which carry rich information, with translated datasets that cover diverse domains, scenarios, and objects, balancing the strengths and weaknesses of each.
| Dataset name | Images |
| --- | --- |
| Arabic Conceptual Captions | 1,427,210 |
| Arabic COCO 2014 | 414,113 |
| Arabic WIT | 109,366 |
| Arabic Flickr8K | 24,272 |
| Proposed (WAP) dataset | 151,252 |
| **Total** | **2,126,213** |
## Technical Details

### Performance and Limitations

We evaluated Arabic CLIP on several benchmarks tailored to tasks such as zero-shot learning, image retrieval, localization, and image search:
- Conceptual Captions
- COCO
- ImageNet
- Unsplash
#### Zero-shot Learning

| Multilingual CLIP | Top 1 | Top 5 | Top 10 | Top 100 |
| --- | --- | --- | --- | --- |
| Short translation | 10.10 | 21.99 | 26.70 | 47.57 |
| Long translation | 9.518 | 20.942 | 25.54 | 45.59 |

| Arabic Baseline Patch 32 | Top 1 | Top 5 | Top 10 | Top 100 |
| --- | --- | --- | --- | --- |
| Short translation | 17.58 | 37.15 | 45.60 | 73.02 |
| Long translation | 16.94 | 37.12 | 45.44 | 72.94 |
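For reference, the Top-k numbers above measure how often the correct class appears among the model's k highest-scoring candidates. A minimal sketch of that metric (function name and toy data are illustrative):

```python
import numpy as np

def top_k_accuracy(logits: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    # Indices of the k largest scores per row (order within the top k is irrelevant).
    top_k = np.argpartition(logits, -k, axis=-1)[:, -k:]
    return float((top_k == labels[:, None]).any(axis=-1).mean())

# Toy example: 3 samples, 5 classes.
logits = np.array([[0.1, 0.9, 0.0, 0.3, 0.2],
                   [0.8, 0.1, 0.05, 0.02, 0.03],
                   [0.2, 0.3, 0.9, 0.1, 0.0]])
labels = np.array([1, 2, 2])
print(top_k_accuracy(logits, labels, k=1))  # 2 of 3 top-1 hits -> ~0.667
```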
#### Image Retrieval

##### Conceptual Captions Evaluation

| Metric | MCLIP | Baseline Patch 32 |
| --- | --- | --- |
| MRR@1 | 0.064 | 0.165 |
| MRR@5 | 0.093 | 0.231 |
| MRR@10 | 0.100 | 0.244 |

##### COCO Evaluation

| Metric | MCLIP | Baseline Patch 32 |
| --- | --- | --- |
| MRR@1 | 0.043 | 0.082 |
| MRR@5 | 0.068 | 0.127 |
| MRR@10 | 0.074 | 0.138 |
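MRR@k averages, over all queries, the reciprocal of the rank at which the relevant item appears, counting only the top k results (contributing 0 if it is not among them). A minimal sketch, assuming one relevant item per query (names and toy data are illustrative):

```python
import numpy as np

def mrr_at_k(ranked_ids: np.ndarray, relevant_ids: np.ndarray, k: int) -> float:
    """Mean reciprocal rank truncated at k; one relevant item per query."""
    reciprocal_ranks = []
    for ranking, target in zip(ranked_ids, relevant_ids):
        # 1-based position of the relevant item within the top k, else 0.
        hits = np.where(ranking[:k] == target)[0]
        reciprocal_ranks.append(1.0 / (hits[0] + 1) if hits.size else 0.0)
    return float(np.mean(reciprocal_ranks))

# Toy example: 2 queries over 4 candidates, sorted by decreasing similarity.
ranked = np.array([[2, 0, 1, 3],   # relevant item 0 ranked 2nd -> RR = 1/2
                   [1, 3, 0, 2]])  # relevant item 1 ranked 1st -> RR = 1
print(mrr_at_k(ranked, np.array([0, 1]), k=10))  # (0.5 + 1.0) / 2 = 0.75
```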
#### Limitations

- Arabic CLIP struggles to count objects beyond three.
- Genuine (non-translated) Arabic samples are limited.
- Various noises and biases may have been introduced into Arabic CLIP, since no studies have yet examined these issues in the published Arabic datasets or Arabic language models.
### Bias

Regarding gender bias, note that Arabic uses a two-gender system in which every noun is classified as masculine or feminine, whereas English does not. Translating text from English to Arabic can therefore lose information or even introduce gender bias.