🚀 AraElectra for Question Answering on Arabic-SQuADv2
This is an AraElectra model fine-tuned on the Arabic-SQuADv2.0 dataset for question answering tasks, including handling unanswerable questions.
🚀 Quick Start
This is the AraElectra model fine-tuned on the Arabic-SQuADv2.0 dataset for extractive question answering. It was trained on question-answer pairs, including unanswerable questions, and is paired with an AraElectra classifier (CLS) that predicts whether a question is answerable from the given context.
✨ Features
- Language model: AraElectra
- Language: Arabic
- Downstream-task: Extractive QA
- Training data: Arabic-SQuADv2.0
- Eval data: Arabic-SQuADv2.0
- Test data: Arabic-SQuADv2.0
- Code: see More Info on GitHub
- Infrastructure: 1x Tesla K80
| Property | Details |
|----------|---------|
| Model Type | AraElectra |
| Training Data | Arabic-SQuADv2.0 |
📦 Installation
The model runs on 🤗 Transformers; the usage example below also relies on the AraBERT preprocessor by aub-mind, published on PyPI as `arabert` (e.g. `pip install transformers arabert`).
💻 Usage Examples
Basic Usage
For best results, use the AraBERT preprocessor by aub-mind:
```python
from transformers import ElectraForQuestionAnswering, ElectraForSequenceClassification, AutoTokenizer, pipeline
from arabert.preprocess import ArabertPreprocessor  # pip install arabert

prep_object = ArabertPreprocessor("araelectra-base-discriminator")
question = prep_object.preprocess('ما هي جامعة الدول العربية ؟')  # "What is the Arab League?"
# Context: an Arabic Wikipedia passage about the Arab League
context = prep_object.preprocess('''
جامعة الدول العربية هي منظمة إقليمية تضم دولاً عربية في آسيا وأفريقيا.
ينص ميثاقها على التنسيق بين الدول الأعضاء في الشؤون الاقتصادية، ومن ضمنها العلاقات التجارية، الاتصالات، العلاقات الثقافية، الجنسيات ووثائق وأذونات السفر والعلاقات الاجتماعية والصحة. المقر الدائم لجامعة الدول العربية يقع في القاهرة، عاصمة مصر (تونس من 1979 إلى 1990).
''')

qa_modelname = 'ZeyadAhmed/AraElectra-Arabic-SQuADv2-QA'
cls_modelname = 'ZeyadAhmed/AraElectra-Arabic-SQuADv2-CLS'

# The QA pipeline extracts an answer span; the CLS pipeline predicts answerability
qa_pipe = pipeline('question-answering', model=qa_modelname, tokenizer=qa_modelname)
cls_pipe = pipeline('text-classification', model=cls_modelname, tokenizer=cls_modelname)

QA_input = {
    'question': question,
    'context': context
}
CLS_input = {
    'text': question,
    'text_pair': context
}

qa_res = qa_pipe(QA_input)
cls_res = cls_pipe(CLS_input)
threshold = 0.5  # classifier decision threshold; see the sketch below

# The models can also be loaded directly for custom inference code:
qa_model = ElectraForQuestionAnswering.from_pretrained(qa_modelname)
cls_model = ElectraForSequenceClassification.from_pretrained(cls_modelname)
tokenizer = AutoTokenizer.from_pretrained(qa_modelname)
```
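The snippet above defines `threshold` but never uses it. Below is a minimal sketch of how the two models can be combined, assuming the classifier's `LABEL_1` marks an unanswerable question; that label mapping is an assumption, so verify it against `cls_model.config.id2label` before relying on it:

```python
# Hypothetical answerability gate: the LABEL_1 == "unanswerable" mapping is an
# assumption — confirm it via cls_model.config.id2label.
pred = cls_res[0] if isinstance(cls_res, list) else cls_res  # pipeline may return a list

if pred['label'] == 'LABEL_1' and pred['score'] >= threshold:
    print('The question appears unanswerable from the given context.')
else:
    print(f"Answer: {qa_res['answer']} (score: {qa_res['score']:.3f})")
```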
📚 Documentation
Hyperparameters
```
batch_size = 8
n_epochs = 4
base_LM_model = "AraElectra"
learning_rate = 3e-5
optimizer = AdamW
padding = dynamic
```
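As an illustration of how these settings map onto the 🤗 `Trainer` stack; this is a sketch under assumptions, not the authors' actual training script, and the base checkpoint name and output path are assumptions:

```python
from transformers import AutoTokenizer, DataCollatorWithPadding, TrainingArguments

# Assumed base checkpoint; the card only says "AraElectra".
tokenizer = AutoTokenizer.from_pretrained("aubmindlab/araelectra-base-discriminator")

args = TrainingArguments(
    output_dir="araelectra-arabic-squadv2-qa",  # hypothetical output path
    per_device_train_batch_size=8,              # batch_size = 8
    num_train_epochs=4,                         # n_epochs = 4
    learning_rate=3e-5,                         # learning_rate = 3e-5
)                                               # Trainer's default optimizer is AdamW

# padding = dynamic: pad each batch to its longest sequence rather than a fixed length
data_collator = DataCollatorWithPadding(tokenizer)
```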
Online Demo on Arabic Wikipedia and User Provided Contexts
See the model in action in an online demo hosted on Streamlit.
Performance
Evaluated on the Arabic-SQuADv2.0 test set with the official SQuAD v2.0 evaluation script, with minor preprocessing changes to fit the Arabic language (see the modified eval script).
"exact": 65.11555277951281,
"f1": 71.49042547237256,
"total": 9606,
"HasAns_exact": 56.14535768645358,
"HasAns_f1": 67.79623803036668,
"HasAns_total": 5256,
"NoAns_exact": 75.95402298850574,
"NoAns_f1": 75.95402298850574,
"NoAns_total": 4350