# mBERT Bengali Question Answering
`mBERT-Bengali-Tydiqa-QA` is a question answering model created by fine-tuning `bert-base-multilingual-uncased` on the TyDi QA Bengali dataset. It provides an effective solution for Bengali question answering tasks.
## Quick Start
You can use the `bntransformer` package to interact with this question answering model.
### Installation

```bash
pip install bntransformer
```
## Usage Examples
### Basic Usage

```python
from bntransformer import BanglaQA

bnqa = BanglaQA()
context = "সূর্য সেন ১৮৯৪ সালের ২২ মার্চ চট্টগ্রামের রাউজান থানার নোয়াপাড়ায় অর্থনৈতিক ভাবে অস্বচ্ছল পরিবারে জন্মগ্রহণ করেন। তাঁর পিতার নাম রাজমনি সেন এবং মাতার নাম শশী বালা সেন। রাজমনি সেনের দুই ছেলে আর চার মেয়ে। সূর্য সেন তাঁদের পরিবারের চতুর্থ সন্তান। দুই ছেলের নাম সূর্য ও কমল। চার মেয়ের নাম বরদাসুন্দরী, সাবিত্রী, ভানুমতী ও প্রমিলা। শৈশবে পিতা মাতাকে হারানো সূর্য সেন কাকা গৌরমনি সেনের কাছে মানুষ হয়েছেন। সূর্য সেন ছেলেবেলা থেকেই খুব মনোযোগী ভালো ছাত্র ছিলেন এবং ধর্মভাবাপন্ন গম্ভীর প্রকৃতির ছিলেন।"
question = "মাস্টারদা সূর্যকুমার সেনের বাবার নাম কী ছিল ?"
answers = bnqa.find_answer(context, question)
print(answers)
```
### Advanced Usage

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "sagorsarker/mbert-bengali-tydiqa-qa"
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Reuse the loaded model and tokenizer instead of loading them again by name.
nlp = pipeline('question-answering', model=model, tokenizer=tokenizer)
qa_input = {
    'question': 'মাস্টারদা সূর্যকুমার সেনের বাবার নাম কী ছিল ?',
    'context': 'সূর্য সেন ১৮৯৪ সালের ২২ মার্চ চট্টগ্রামের রাউজান থানার নোয়াপাড়ায় অর্থনৈতিক ভাবে অস্বচ্ছল পরিবারে জন্মগ্রহণ করেন। তাঁর পিতার নাম রাজমনি সেন এবং মাতার নাম শশী বালা সেন। রাজমনি সেনের দুই ছেলে আর চার মেয়ে। সূর্য সেন তাঁদের পরিবারের চতুর্থ সন্তান। দুই ছেলের নাম সূর্য ও কমল। চার মেয়ের নাম বরদাসুন্দরী, সাবিত্রী, ভানুমতী ও প্রমিলা। শৈশবে পিতা মাতাকে হারানো সূর্য সেন কাকা গৌরমনি সেনের কাছে মানুষ হয়েছেন। সূর্য সেন ছেলেবেলা থেকেই খুব মনোযোগী ভালো ছাত্র ছিলেন এবং ধর্মভাবাপন্ন গম্ভীর প্রকৃতির ছিলেন।'
}
result = nlp(qa_input)
print(result)
```
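The `question-answering` pipeline returns a dict with `score`, `start`, `end`, and `answer` keys, where `start`/`end` are character offsets into the context. A minimal sketch of consuming that output (the `extract_answer` helper and the confidence threshold are illustrative, not part of the model card):

```python
def extract_answer(result, context, min_score=0.1):
    """Return the answer span only when the pipeline is confident enough.

    `result` has the shape of a Hugging Face question-answering pipeline
    output: {'score', 'start', 'end', 'answer'}; start/end are character
    offsets into the original context string.
    """
    if result["score"] < min_score:
        return None
    # The offsets index directly into the context string.
    assert context[result["start"]:result["end"]] == result["answer"]
    return result["answer"]

# Toy example with a hand-written result dict (English for readability).
context = "Rajmoni Sen was the father of Surya Sen."
result = {"score": 0.93, "start": 0, "end": 11, "answer": "Rajmoni Sen"}
print(extract_answer(result, context))  # Rajmoni Sen
```

Filtering on `score` like this is a common way to suppress low-confidence answers in downstream applications.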
## Technical Details
- The `mBERT-Bengali-Tydiqa-QA` model is built on `bert-base-multilingual-uncased`.
- It is fine-tuned on the TyDi QA Bengali dataset.
- The TyDi QA Bengali data includes 2,390 training samples and 113 validation samples.
- The model was trained on a Kaggle GPU.
- It was trained for a total of 5 epochs.
- Training used the `transformers/examples/question-answering` notebook with all default settings, except for the pre-trained model and dataset.
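Under the hood, the question-answering example post-processes the model's start/end logits into an answer span. A simplified sketch of that selection step (the function name and toy values are illustrative; the actual notebook additionally handles token-to-character offset mapping and n-best lists):

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Pick the (start, end) token pair with the highest combined logit.

    Mirrors SQuAD-style postprocessing: the span must be well-formed
    (start <= end) and no longer than `max_answer_len` tokens.
    """
    best = (0, 0)
    best_score = float("-inf")
    for i, s in enumerate(start_logits):
        for j, e in enumerate(end_logits):
            if j < i or j - i + 1 > max_answer_len:
                continue  # skip ill-formed or over-long spans
            if s + e > best_score:
                best_score = s + e
                best = (i, j)
    return best

# Toy logits for a 5-token context: the model favors tokens 1..2.
print(best_span([0.1, 2.0, 0.3, 0.2, 0.1], [0.2, 0.5, 1.8, 0.1, 0.3]))  # (1, 2)
```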
## Documentation

### Evaluation Results
Here are the evaluation results:

| Metric | Score |
| --- | --- |
| Exact Match | 57.5221 |
| F1 | 68.6618 |
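Exact Match and F1 here are the standard SQuAD-style metrics: EM is normalized string equality, while F1 measures token-level overlap between the prediction and the gold answer. A simplified sketch of how they are computed (the official SQuAD script additionally strips punctuation and English articles):

```python
from collections import Counter

def normalize(text):
    # Simplified normalization: lowercase and collapse whitespace.
    return " ".join(text.lower().split())

def exact_match(prediction, gold):
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction, gold):
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("রাজমনি সেন", "রাজমনি সেন"))      # 1.0
print(round(token_f1("রাজমনি", "রাজমনি সেন"), 2))  # 0.67
```

Both scores are averaged over the validation set and reported on a 0–100 scale above.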
## License
This project is licensed under the MIT license.
## Authors