🚀 Arabic BERT Model
AraBERTMo is an Arabic pre-trained language model based on Google's BERT architecture, offering multiple variants and achieving good results in tasks like Fill-Mask.
🚀 Quick Start
You can use this model by installing torch or tensorflow and the Hugging Face transformers library. You can then use it directly by initializing it like this:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("Ebtihal/AraBertMo_base_V2")
model = AutoModelForMaskedLM.from_pretrained("Ebtihal/AraBertMo_base_V2")
```
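Once loaded, the model can also be exercised through the standard transformers fill-mask pipeline. A minimal sketch (the pipeline call is generic transformers usage, not something documented in the original card; the prompt is taken from the widget examples below):

```python
from transformers import pipeline

# Build a fill-mask pipeline directly from the model name.
unmasker = pipeline("fill-mask", model="Ebtihal/AraBertMo_base_V2")

# Predict the masked token in one of the widget prompts below.
for prediction in unmasker("السلام عليكم ورحمة[MASK] وبركاتة"):
    print(prediction["token_str"], prediction["score"])
```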
✨ Features
- Multiple Variants: AraBERTMo_base now comes in 10 new variants.
- Available on HuggingFace: All models are available on the HuggingFace model page under the Ebtihal name.
- PyTorch Format: Checkpoints are available in PyTorch format.
📦 Installation
To use this model, you need to install torch or tensorflow and the Hugging Face transformers library.
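A typical setup, assuming pip and the PyTorch backend (the exact versions used with this model are not documented here), might look like:

```bash
pip install torch transformers
```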
📚 Documentation
Pretraining Corpus
The AraBertMo_base_V2 model was pre-trained on ~3 million words from:
- OSCAR: Arabic version "unshuffled_deduplicated_ar".
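For reference, a corpus like this can be streamed with the Hugging Face datasets library. This is a minimal sketch under the assumption that the standard "oscar" dataset script and its "unshuffled_deduplicated_ar" configuration are used; the exact corpus preparation for pretraining is not documented in this card:

```python
from itertools import islice

from datasets import load_dataset

# Stream the Arabic OSCAR split so the full corpus is not downloaded to disk.
dataset = load_dataset("oscar", "unshuffled_deduplicated_ar", split="train", streaming=True)

# Preview a few documents.
for example in islice(dataset, 3):
    print(example["text"][:100])
```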
Training Results
This model achieves the following results:
| Task | Num examples | Num Epochs | Batch Size | Steps | Wall time | Training loss |
|------|--------------|------------|------------|-------|-----------|---------------|
| Fill-Mask | 20020 | 2 | 64 | 626 | 19m 2s | 8.437 |
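For context, the reported hyperparameters map onto a transformers TrainingArguments configuration roughly as follows. This is a hypothetical sketch, not the actual training script (which is not published here); the output_dir name is made up:

```python
from transformers import TrainingArguments

# Hypothetical mapping of the reported hyperparameters (2 epochs,
# batch size 64) onto TrainingArguments.
args = TrainingArguments(
    output_dir="arabertmo-fill-mask",  # illustrative name only
    num_train_epochs=2,
    per_device_train_batch_size=64,
)
```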
Model Usage
You can use this model by initializing it as shown in the code example in the "Quick Start" section.
Research Background
This model was built for master's degree research within an organization.
💻 Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("Ebtihal/AraBertMo_base_V2")
model = AutoModelForMaskedLM.from_pretrained("Ebtihal/AraBertMo_base_V2")
```
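For lower-level control, the same Fill-Mask prediction can be run manually with PyTorch. A sketch assuming the torch backend; the top-1 decoding is illustrative, and the prompt again comes from the widget examples below:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("Ebtihal/AraBertMo_base_V2")
model = AutoModelForMaskedLM.from_pretrained("Ebtihal/AraBertMo_base_V2")

text = "السلام عليكم ورحمة[MASK] وبركاتة"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring token.
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```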
📄 License
No license information provided in the original document.
Additional Information
- Language: Arabic
- Tags: Fill-Mask
- Datasets: OSCAR
- Widget Examples:
- " السلام عليكم ورحمة[MASK] وبركاتة"
- " اهلا وسهلا بكم في [MASK] من سيربح المليون"
- " مرحبا بك عزيزي الزائر [MASK] موقعنا "