🚀 AraModernBert For Topic Classification
This is an experimental Arabic model that showcases how ModernBERT can be adapted for Arabic in tasks such as topic classification.
🚀 Quick Start
The model can be used for text classification with the transformers library. Below is an example:
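from transformers import pipeline

# Load the fine-tuned topic classifier from the Hugging Face Hub.
classifier = pipeline(
    task="text-classification",
    model="Omartificial-Intelligence-Space/AraModernBert-Topic-Classifier",
)

# Replace the placeholder with the Arabic text to classify.
sample = '''
PUT SOME TEXT HERE TO CLASSIFY ITS TOPIC
'''

# The result is a list of dicts, each with a "label" and a "score".
print(classifier(sample))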
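The pipeline result is a list of dictionaries with `label` and `score` keys. As a minimal sketch, assuming that output shape, the helper below picks the highest-scoring topic when several scored labels are returned. The names `top_prediction` and `classify_texts` are hypothetical and not part of the model's API, and the label strings in the usage lines are placeholders, since the actual labels depend on the model's configuration.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def top_prediction(output: List) -> Prediction:
    """Return the highest-scoring entry from a text-classification result.

    Accepts a flat list of {"label", "score"} dicts, or the same list
    nested one level deep (the shape can vary with how the pipeline is called).
    """
    # Flatten one level of nesting if present.
    if output and isinstance(output[0], list):
        output = output[0]
    return max(output, key=lambda p: p["score"])


def classify_texts(texts: List[str], model_id: str) -> List[Prediction]:
    """Classify several texts in one call (hypothetical wrapper; downloads the model)."""
    # Imported lazily so that top_prediction has no third-party dependency.
    from transformers import pipeline

    classifier = pipeline(task="text-classification", model=model_id)
    # truncation=True guards against inputs longer than the model's maximum length.
    return classifier(texts, truncation=True)


# Usage with made-up scores (illustration only, not real model output).
scores = [
    {"label": "Sport", "score": 0.91},
    {"label": "Politics", "score": 0.06},
]
print(top_prediction(scores)["label"])
```

Keeping the selection logic separate from the model call makes it easy to check without downloading the model.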
✨ Features
This is an experimental Arabic version of ModernBERT-base. It was fine-tuned only on the topic classification task, starting from the original ModernBERT base model and using a custom tokenizer trained on Arabic text. The tokenizer's training corpus is as follows:
- Dataset: Arabic Wikipedia
- Size: 1.8 GB
- Tokens: 228,788,529 tokens
This model demonstrates how ModernBERT can be adapted to Arabic for tasks like topic classification.
📚 Documentation
Model Eval Details
- Epochs: 3
- Evaluation Metrics:
  - F1 Score: 0.95
  - Loss: 0.1998
- Training Step: 47,862
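The reported F1 score does not state which averaging scheme was used. As an illustration only, the sketch below computes a macro-averaged F1 (the unweighted mean of per-class F1) in plain Python. The function name `macro_f1` and the toy labels are assumptions made for this example; this is not the author's evaluation code.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    classes = sorted(set(y_true) | set(y_pred), key=str)
    f1_scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        # F1 = 2*TP / (2*TP + FP + FN); the guard avoids division by zero.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)

    return sum(f1_scores) / len(f1_scores) if f1_scores else 0.0


# Toy labels for illustration only.
gold = ["Sport", "Sport", "Politics", "Politics"]
pred = ["Sport", "Politics", "Politics", "Politics"]
print(round(macro_f1(gold, pred), 4))
```

With seven topics, macro averaging weights each topic equally regardless of how many test examples it has; a weighted or micro average would give different numbers on an imbalanced test set.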
Dataset Used For Training
- The SANAD dataset was used for training and testing. It covers 7 topics: Politics, Finance, Medical, Culture, Sport, Tech, and Religion.
Test Phase Results
- The model was evaluated on a test set of 14,181 examples of different topics. The distribution of these topics is:
  [Figure: distribution of topics in the test set]
- The model achieved the following accuracy for prediction on this test set:
  [Figure: prediction accuracy on the test set]
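A topic distribution and per-topic accuracy like those described in this section can be derived from gold and predicted labels. The following is a minimal sketch with made-up toy labels; `topic_distribution` and `per_topic_accuracy` are hypothetical helper names, and the numbers it prints are not the model's results.

```python
from collections import Counter
from typing import Dict, Sequence


def topic_distribution(labels: Sequence[str]) -> Dict[str, int]:
    """Count how many examples belong to each topic."""
    return dict(Counter(labels))


def per_topic_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Fraction of correctly predicted examples for each gold topic."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)
    # Count only the examples whose prediction matches the gold label.
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {topic: correct[topic] / n for topic, n in totals.items()}


# Toy labels for illustration only.
gold = ["Sport", "Sport", "Politics", "Tech"]
pred = ["Sport", "Politics", "Politics", "Tech"]
print(topic_distribution(gold))        # {'Sport': 2, 'Politics': 1, 'Tech': 1}
print(per_topic_accuracy(gold, pred))  # {'Sport': 0.5, 'Politics': 1.0, 'Tech': 1.0}
```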
📄 License
This project is licensed under the Apache-2.0 license.
📚 Citation
@misc{modernbert,
  title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference},
  author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
  year={2024},
  eprint={2412.13663},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2412.13663},
}
⚠️ Important Note
This is an experimental Arabic model that demonstrates how ModernBERT can be adapted to Arabic for tasks like topic classification.
| Property | Details |
|---|---|
| Model Type | AraModernBert for Topic Classification |
| Training Data | SANAD DATASET |
| Base Model | ModernBERT-base |
| Pipeline Tag | text-classification |
| Library Name | transformers |
| Tags | modernbert, arabic |