🚀 ALBERT Persian
A Lite BERT for Self-supervised Learning of Language Representations for the Persian Language
ALBERT-Persian is the first attempt to apply ALBERT to the Persian language. Based on Google's ALBERT BASE Version 2.0, this model was trained on over 3.9 million documents, 73 million sentences, and 1.3 billion words covering various writing styles and subjects (such as scientific texts, novels, and news), similar to the approach used for ParsBERT.
Please follow the ALBERT-Persian repository for the latest information on previous and current models.
🚀 Quick Start
Persian Text Classification [DigiMag, Persian News]
The goal of this task is to label texts in a supervised manner for both the DigiMag and Persian News datasets.
Persian News
This dataset consists of various news articles scraped from different online news agencies' websites. There are a total of 16,438 articles, divided into eight different classes:
- Social
- Economic
- International
- Political
- Science Technology
- Cultural Art
- Sport
- Medical
| Label | # of articles |
|---|---|
| Social | 2170 |
| Economic | 1564 |
| International | 1975 |
| Political | 2269 |
| Science Technology | 2436 |
| Cultural Art | 2558 |
| Sport | 1381 |
| Medical | 2085 |
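
The per-class counts above are uneven (Sport has roughly half as many articles as Cultural Art), so it can help to inspect the class balance before fine-tuning. The following is a minimal sketch in plain Python. The helper names `class_shares` and `inverse_frequency_weights` are hypothetical, and the inverse-frequency weighting is an illustrative assumption. This README does not say whether any class weighting was used for the reported results.

```python
# Article counts per class, copied from the Persian News table above.
CLASS_COUNTS = {
    "Social": 2170,
    "Economic": 1564,
    "International": 1975,
    "Political": 2269,
    "Science Technology": 2436,
    "Cultural Art": 2558,
    "Sport": 1381,
    "Medical": 2085,
}


def class_shares(counts):
    """Return each class's fraction of the whole dataset."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def inverse_frequency_weights(counts):
    """Return total / (num_classes * count) for each class.

    Rarer classes get weights above 1, more common ones below 1.
    This weighting scheme is an assumption made for illustration only.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


if __name__ == "__main__":
    shares = class_shares(CLASS_COUNTS)
    weights = inverse_frequency_weights(CLASS_COUNTS)
    print(f"total articles: {sum(CLASS_COUNTS.values())}")
    for label, n in CLASS_COUNTS.items():
        print(f"{label:<20} {n:>5}  share={shares[label]:.3f}  weight={weights[label]:.3f}")
```

The counts sum to the 16,438 articles stated above. Weights of this kind could be passed to a weighted loss during fine-tuning if the imbalance turns out to matter in practice.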
Download
You can download the dataset from here
✨ Features
Results
The following table compares the F1 score of ALBERT-fa-base-v2 with those of other models and architectures:
| Dataset | ALBERT-fa-base-v2 | ParsBERT-v1 | mBERT |
|---|---|---|---|
| Persian News | 97.01 | 97.19 | 95.79 |
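
For readers who want to compute a comparable metric on their own predictions, here is a minimal sketch of multi-class F1 in plain Python. The function names and the toy labels are hypothetical. This README does not state which averaging scheme (macro, weighted, or other) produced the numbers above, so both common variants are shown as assumptions rather than as the author's evaluation code.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's count in y_true."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


if __name__ == "__main__":
    # Toy labels for illustration only; not taken from the dataset.
    y_true = ["Sport", "Sport", "Medical", "Medical", "Economic"]
    y_pred = ["Sport", "Medical", "Medical", "Medical", "Economic"]
    print(f"macro F1:    {100 * macro_f1(y_true, y_pred):.2f}")
    print(f"weighted F1: {100 * weighted_f1(y_true, y_pred):.2f}")
```

The scores are multiplied by 100 when printed to match the percentage scale used in the table.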
📄 License
This project is licensed under the Apache-2.0 license.
BibTeX entry and citation info
Please cite this work in publications as follows:
@misc{ALBERTPersian,
author = {Mehrdad Farahani},
title = {ALBERT-Persian: A Lite BERT for Self-supervised Learning of Language Representations for the Persian Language},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/m3hrdadfi/albert-persian}},
}
@article{ParsBERT,
title={ParsBERT: Transformer-based Model for Persian Language Understanding},
author={Mehrdad Farahani and Mohammad Gharachorloo and Marzieh Farahani and Mohammad Manthouri},
journal={ArXiv},
year={2020},
volume={abs/2005.12515}
}
💡 Questions?
Post a GitHub issue on the ALBERT-Persian repo.