🚀 AraElectra for Question Answering on Arabic-SQuADv2
This is an AraElectra model fine-tuned on the Arabic-SQuADv2.0 dataset for question answering tasks, including handling unanswerable questions.
🚀 Quick Start
This is the AraElectra model fine-tuned on the Arabic-SQuADv2.0 dataset for extractive question answering. It was trained on question-answer pairs, including unanswerable questions, and is paired with an AraElectra classifier (CLS) that predicts whether a question is answerable from the given context.
✨ Features
- Language model: AraElectra
- Language: Arabic
- Downstream-task: Extractive QA
- Training data: Arabic-SQuADv2.0
- Eval data: Arabic-SQuADv2.0
- Test data: Arabic-SQuADv2.0
- Code: see More Info on GitHub
- Infrastructure: 1x Tesla K80
| Property | Details |
|----------|---------|
| Model Type | AraElectra |
| Training Data | Arabic-SQuADv2.0 |
📦 Installation
The model runs on 🤗 Transformers; the usage example below also relies on the AraBERT preprocessor by aub-mind, published on PyPI as `arabert` (e.g. `pip install transformers arabert`).
💻 Usage Examples
Basic Usage
For best results, use the AraBERT preprocessor by aub-mind:
```python
from transformers import ElectraForQuestionAnswering, ElectraForSequenceClassification, AutoTokenizer, pipeline
from arabert.preprocess import ArabertPreprocessor  # pip install arabert

prep_object = ArabertPreprocessor("araelectra-base-discriminator")
question = prep_object.preprocess('ما هي جامعة الدول العربية ؟')  # "What is the Arab League?"
# Context: an Arabic Wikipedia passage about the Arab League
context = prep_object.preprocess('''
جامعة الدول العربية هي منظمة إقليمية تضم دولاً عربية في آسيا وأفريقيا.
ينص ميثاقها على التنسيق بين الدول الأعضاء في الشؤون الاقتصادية، ومن ضمنها العلاقات التجارية، الاتصالات، العلاقات الثقافية، الجنسيات ووثائق وأذونات السفر والعلاقات الاجتماعية والصحة. المقر الدائم لجامعة الدول العربية يقع في القاهرة، عاصمة مصر (تونس من 1979 إلى 1990).
''')

qa_modelname = 'ZeyadAhmed/AraElectra-Arabic-SQuADv2-QA'
cls_modelname = 'ZeyadAhmed/AraElectra-Arabic-SQuADv2-CLS'

# The QA pipeline extracts an answer span; the CLS pipeline predicts answerability
qa_pipe = pipeline('question-answering', model=qa_modelname, tokenizer=qa_modelname)
cls_pipe = pipeline('text-classification', model=cls_modelname, tokenizer=cls_modelname)

QA_input = {
    'question': question,
    'context': context
}
CLS_input = {
    'text': question,
    'text_pair': context
}

qa_res = qa_pipe(QA_input)
cls_res = cls_pipe(CLS_input)
threshold = 0.5  # classifier decision threshold; see the sketch below

# The models can also be loaded directly for custom inference code:
qa_model = ElectraForQuestionAnswering.from_pretrained(qa_modelname)
cls_model = ElectraForSequenceClassification.from_pretrained(cls_modelname)
tokenizer = AutoTokenizer.from_pretrained(qa_modelname)
```
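The snippet above defines `threshold` but never uses it. Below is a minimal sketch of how the two models can be combined, assuming the classifier's `LABEL_1` marks an unanswerable question; that label mapping is an assumption, so verify it against `cls_model.config.id2label` before relying on it:

```python
# Hypothetical answerability gate: the LABEL_1 == "unanswerable" mapping is an
# assumption — confirm it via cls_model.config.id2label.
pred = cls_res[0] if isinstance(cls_res, list) else cls_res  # pipeline may return a list

if pred['label'] == 'LABEL_1' and pred['score'] >= threshold:
    print('The question appears unanswerable from the given context.')
else:
    print(f"Answer: {qa_res['answer']} (score: {qa_res['score']:.3f})")
```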
📚 Documentation
Hyperparameters
```
batch_size = 8
n_epochs = 4
base_LM_model = "AraElectra"
learning_rate = 3e-5
optimizer = AdamW
padding = dynamic
```
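As an illustration of how these settings map onto the 🤗 `Trainer` stack; this is a sketch under assumptions, not the authors' actual training script, and the base checkpoint name and output path are assumptions:

```python
from transformers import AutoTokenizer, DataCollatorWithPadding, TrainingArguments

# Assumed base checkpoint; the card only says "AraElectra".
tokenizer = AutoTokenizer.from_pretrained("aubmindlab/araelectra-base-discriminator")

args = TrainingArguments(
    output_dir="araelectra-arabic-squadv2-qa",  # hypothetical output path
    per_device_train_batch_size=8,              # batch_size = 8
    num_train_epochs=4,                         # n_epochs = 4
    learning_rate=3e-5,                         # learning_rate = 3e-5
)                                               # Trainer's default optimizer is AdamW

# padding = dynamic: pad each batch to its longest sequence rather than a fixed length
data_collator = DataCollatorWithPadding(tokenizer)
```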
Online Demo on Arabic Wikipedia and User Provided Contexts
See the model in action in an online demo hosted on Streamlit.
Performance
Evaluated on the Arabic-SQuADv2.0 test set with the official SQuAD v2.0 evaluation script, with minor preprocessing changes to fit the Arabic language (see the modified eval script).
"exact": 65.11555277951281,
"f1": 71.49042547237256,
"total": 9606,
"HasAns_exact": 56.14535768645358,
"HasAns_f1": 67.79623803036668,
"HasAns_total": 5256,
"NoAns_exact": 75.95402298850574,
"NoAns_f1": 75.95402298850574,
"NoAns_total": 4350