# Arabic CLIP
An adaptation of CLIP for the Arabic language, aiming to improve the understanding and interpretation of visual information in the Arabic context.
## Quick Start
Arabic CLIP is an adaptation of Contrastive Language-Image Pre-training (CLIP) to the Arabic language. CLIP, developed by OpenAI, learns visual concepts from images and relates them to textual descriptions. This work aims to improve the model's understanding and interpretation of visual information in an Arabic-language context.
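Concretely, CLIP-style models train an image encoder and a text encoder jointly so that matched image-text pairs score higher than all mismatched pairs in a batch. Here is a minimal sketch of that symmetric contrastive (InfoNCE) objective in JAX; it is illustrative only, not this repository's training code:

```python
import jax
import jax.numpy as jnp
import optax

def clip_contrastive_loss(image_embeds, text_embeds, logit_scale):
    """Symmetric InfoNCE loss over a batch of matched image-text pairs."""
    # L2-normalize both towers so the dot product is a cosine similarity.
    image_embeds = image_embeds / jnp.linalg.norm(image_embeds, axis=-1, keepdims=True)
    text_embeds = text_embeds / jnp.linalg.norm(text_embeds, axis=-1, keepdims=True)
    # (batch, batch) similarity matrix; the diagonal holds the true pairs.
    logits = logit_scale * image_embeds @ text_embeds.T
    labels = jnp.arange(logits.shape[0])
    loss_i = optax.softmax_cross_entropy_with_integer_labels(logits, labels)
    loss_t = optax.softmax_cross_entropy_with_integer_labels(logits.T, labels)
    return (loss_i.mean() + loss_t.mean()) / 2

# Toy usage with random embeddings:
img = jax.random.normal(jax.random.PRNGKey(0), (4, 512))
txt = jax.random.normal(jax.random.PRNGKey(1), (4, 512))
print(clip_contrastive_loss(img, txt, logit_scale=100.0))
```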
## Usage Examples

### Basic Usage
```python
from transformers import AutoTokenizer, FlaxVisionTextDualEncoderModel

# Load the Arabic CLIP checkpoint, converting from the PyTorch weights.
model = FlaxVisionTextDualEncoderModel.from_pretrained(
    "LinaAlhuri/Arabic-clip-vit-base-patch32", logit_scale_init_value=1, from_pt=True
)
model.save_pretrained("arabic_clip")

# Tokenizer of the Arabic BERT text encoder.
tokenizer = AutoTokenizer.from_pretrained("asafaya/bert-base-arabic", cache_dir=None, use_fast=True)
```
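A fuller inference sketch: encode an image and candidate Arabic captions, then score them against each other. This is illustrative; it assumes the vision tower follows standard CLIP ViT-B/32 preprocessing, so the `openai/clip-vit-base-patch32` image processor is borrowed here, and the sample image URL is likewise just an example.

```python
import jax
import requests
from PIL import Image
from transformers import AutoTokenizer, CLIPImageProcessor, FlaxVisionTextDualEncoderModel

model = FlaxVisionTextDualEncoderModel.from_pretrained(
    "LinaAlhuri/Arabic-clip-vit-base-patch32", logit_scale_init_value=1, from_pt=True
)
tokenizer = AutoTokenizer.from_pretrained("asafaya/bert-base-arabic", use_fast=True)
# Assumption: standard CLIP ViT-B/32 image preprocessing.
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True
).raw)
texts = ["قطتان على أريكة", "طائرة في السماء"]  # "two cats on a couch", "a plane in the sky"

text_inputs = tokenizer(texts, padding=True, return_tensors="np")
pixel_values = image_processor(images=image, return_tensors="np").pixel_values

outputs = model(
    input_ids=text_inputs.input_ids,
    attention_mask=text_inputs.attention_mask,
    pixel_values=pixel_values,
)
# logits_per_image: (num_images, num_texts); higher means a better match.
probs = jax.nn.softmax(outputs.logits_per_image, axis=-1)
print(probs)
```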
## Documentation

### Training Data
The goal was to create a comprehensive Arabic image-text dataset by combining multiple data sources, given the scarcity of Arabic resources. The main challenges were the limited amount of genuine Arabic data and the variable quality of translated datasets. The approach merges genuine datasets, which carry rich information, with translated datasets that cover diverse domains, scenarios, and objects, balancing the strengths and weaknesses of each.
| Dataset name | Images |
| --- | --- |
| Arabic Conceptual Captions | 1,427,210 |
| Arabic COCO 2014 | 414,113 |
| Arabic WIT | 109,366 |
| Arabic Flickr8K | 24,272 |
| Proposed (WAP) dataset | 151,252 |
| **Total** | **2,126,213** |
## Technical Details

### Performance and Limitations

We evaluated Arabic CLIP on several benchmarks tailored to tasks such as zero-shot learning, image retrieval, localization, and image search:
- Conceptual Captions
- COCO
- ImageNet
- Unsplash
#### Zero-shot Learning

| Multilingual CLIP | Top 1 | Top 5 | Top 10 | Top 100 |
| --- | --- | --- | --- | --- |
| Short translation | 10.10 | 21.99 | 26.70 | 47.57 |
| Long translation | 9.518 | 20.942 | 25.54 | 45.59 |

| Arabic Baseline Patch 32 | Top 1 | Top 5 | Top 10 | Top 100 |
| --- | --- | --- | --- | --- |
| Short translation | 17.58 | 37.15 | 45.60 | 73.02 |
| Long translation | 16.94 | 37.12 | 45.44 | 72.94 |
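For reference, the Top-k numbers above measure how often the correct class appears among the model's k highest-scoring candidates. A minimal sketch of that metric (function name and toy data are illustrative):

```python
import numpy as np

def top_k_accuracy(logits: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    # Indices of the k largest scores per row (order within the top k is irrelevant).
    top_k = np.argpartition(logits, -k, axis=-1)[:, -k:]
    return float((top_k == labels[:, None]).any(axis=-1).mean())

# Toy example: 3 samples, 5 classes.
logits = np.array([[0.1, 0.9, 0.0, 0.3, 0.2],
                   [0.8, 0.1, 0.05, 0.02, 0.03],
                   [0.2, 0.3, 0.9, 0.1, 0.0]])
labels = np.array([1, 2, 2])
print(top_k_accuracy(logits, labels, k=1))  # 2 of 3 top-1 hits -> ~0.667
```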
#### Image Retrieval

##### Conceptual Captions Evaluation

| Metric | MCLIP | Baseline Patch 32 |
| --- | --- | --- |
| MRR@1 | 0.064 | 0.165 |
| MRR@5 | 0.093 | 0.231 |
| MRR@10 | 0.100 | 0.244 |

##### COCO Evaluation

| Metric | MCLIP | Baseline Patch 32 |
| --- | --- | --- |
| MRR@1 | 0.043 | 0.082 |
| MRR@5 | 0.068 | 0.127 |
| MRR@10 | 0.074 | 0.138 |
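MRR@k averages, over all queries, the reciprocal of the rank at which the relevant item appears, counting only the top k results (contributing 0 if it is not among them). A minimal sketch, assuming one relevant item per query (names and toy data are illustrative):

```python
import numpy as np

def mrr_at_k(ranked_ids: np.ndarray, relevant_ids: np.ndarray, k: int) -> float:
    """Mean reciprocal rank truncated at k; one relevant item per query."""
    reciprocal_ranks = []
    for ranking, target in zip(ranked_ids, relevant_ids):
        # 1-based position of the relevant item within the top k, else 0.
        hits = np.where(ranking[:k] == target)[0]
        reciprocal_ranks.append(1.0 / (hits[0] + 1) if hits.size else 0.0)
    return float(np.mean(reciprocal_ranks))

# Toy example: 2 queries over 4 candidates, sorted by decreasing similarity.
ranked = np.array([[2, 0, 1, 3],   # relevant item 0 ranked 2nd -> RR = 1/2
                   [1, 3, 0, 2]])  # relevant item 1 ranked 1st -> RR = 1
print(mrr_at_k(ranked, np.array([0, 1]), k=10))  # (0.5 + 1.0) / 2 = 0.75
```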
#### Limitations

- Arabic CLIP struggles to count objects beyond three.
- Genuine (non-translated) Arabic samples are limited.
- Various noises and biases may have been introduced into Arabic CLIP, since no studies have yet examined these issues in the published Arabic datasets or Arabic language models.
### Bias

Regarding gender bias, note that Arabic uses a two-gender system in which every noun is classified as masculine or feminine, whereas English does not. Translating text from English to Arabic can therefore lose information or even introduce gender bias.