🚀 GPT2-Small-Arabic
This project presents a GPT2 model trained on the Arabic Wikipedia dataset, leveraging the gpt2-small architecture with the Fastai2 library. It offers text and poetry generation capabilities, though it has certain limitations.
🚀 Quick Start
An example is provided in this Colab notebook; both text generation and poetry generation (using the fine-tuned model) are included.
✨ Features
- Generate both text and poetry (poetry via a fine-tuned model).
- Apply the GPT2 architecture to Arabic-language data.
📦 Installation
No dedicated installation steps are documented for this project; the Colab notebook linked above is the simplest way to get a working environment.
💻 Usage Examples
The original model card does not include inline code examples; the Colab notebook remains the reference for end-to-end usage. An illustrative sketch of typical text generation is shown below.
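A minimal generation sketch using the Hugging Face transformers pipeline. The hub repo id "akhooli/gpt2-small-arabic", the prompt, and the sampling parameters are assumptions for illustration only; confirm them against the model page and the Colab notebook.

```python
# Minimal sketch: generate Arabic text with the transformers text-generation pipeline.
# The repo id "akhooli/gpt2-small-arabic" is assumed here, not confirmed by this card.
from transformers import pipeline

generator = pipeline("text-generation", model="akhooli/gpt2-small-arabic")

prompt = "كان يا ما كان في قديم الزمان"  # an "Once upon a time"-style Arabic prompt
outputs = generator(prompt, max_new_tokens=40, do_sample=True, top_p=0.95, num_return_sequences=1)
print(outputs[0]["generated_text"])
```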
📚 Documentation
Model description
The GPT2 model is based on the gpt2-small architecture and is trained on the Arabic Wikipedia dataset using Fastai2.
Intended uses & limitations
How to use
Refer to the Colab notebook for usage examples.
Limitations and bias
GPT2-small-arabic (trained on Arabic Wikipedia) has limited coverage (it inherits Arabic Wikipedia's quality issues and lacks diacritics) and limited training performance. It should be used as a demonstration or proof of concept rather than production code.
Training data
This model was pretrained on an Arabic Wikipedia dump (around 900 MB).
Training procedure
Training was done with the Fastai2 library on Kaggle, using a free GPU.
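The exact training notebook is not reproduced here. The sketch below shows the general fastai-v2 + Hugging Face GPT-2 language-model training pattern (as in the public fastai "Transformers" tutorial), not the author's actual pipeline: the placeholder corpus, the reuse of the stock GPT-2 tokenizer, and all hyperparameters are assumptions.

```python
# Minimal sketch of training a GPT-2 language model with fastai v2 on free-GPU hardware.
# Follows the public fastai Transformers tutorial pattern; data, tokenizer, and
# hyperparameters below are placeholder assumptions, not the author's settings.
from fastai.text.all import *
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Placeholder corpus: in practice, the cleaned Arabic Wikipedia articles (~900 MB).
all_texts = ["نص تجريبي من ويكيبيديا العربية."] * 1000

# gpt2-small architecture; an Arabic tokenizer would normally be trained or adapted
# separately (assumption: the stock GPT-2 byte-level BPE tokenizer is reused here).
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

class TransformersTokenizer(Transform):
    "Wrap the Hugging Face tokenizer as a fastai Transform."
    def __init__(self, tokenizer): self.tokenizer = tokenizer
    def encodes(self, x): return tensor(self.tokenizer.encode(x))
    def decodes(self, x): return TitledStr(self.tokenizer.decode(x.cpu().numpy()))

class DropOutput(Callback):
    "Keep only the logits from the Hugging Face model output."
    def after_pred(self): self.learn.pred = self.pred[0]

# 90/10 train/validation split, streamed as a language-modeling dataset.
cut = int(len(all_texts) * 0.9)
splits = [list(range(cut)), list(range(cut, len(all_texts)))]
tls = TfmdLists(all_texts, TransformersTokenizer(tokenizer), splits=splits, dl_type=LMDataLoader)
dls = tls.dataloaders(bs=8, seq_len=256)  # batch size / sequence length sized for a free GPU

learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(),
                cbs=[DropOutput], metrics=[accuracy, Perplexity()])
learn.fit_one_cycle(1, 1e-4)  # illustrative schedule only
```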
Eval results
Final evaluation results: perplexity 72.19, loss 4.28, and accuracy 0.307.
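For context, perplexity is the exponential of the cross-entropy loss, so the two reported numbers are consistent with each other:

```python
# Perplexity is exp(cross-entropy loss); quick consistency check of the reported metrics.
import math
print(math.exp(4.28))  # ≈ 72.2, in line with the reported perplexity of 72.19
```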
Information Table
| Property | Details |
|----------|---------|
| Model Type | GPT2 model based on gpt2-small, trained on the Arabic Wikipedia dataset using Fastai2 |
| Training Data | Arabic Wikipedia dump (around 900 MB) |
BibTeX entry and citation info
@inproceedings{khooli2020,
  author = {Abed Khooli},
  year   = {2020}
}
⚠️ Important Note
GPT2-small-arabic has limitations in terms of coverage and training performance. Use it as a demonstration or proof of concept, not as production code.
💡 Usage Tip
Refer to the provided Colab notebook for practical examples of text and poetry generation.