# mBERT Bengali Question Answering
`mBERT-Bengali-Tydiqa-QA` is a question answering model created by fine-tuning `bert-base-multilingual-uncased` on the TyDi QA Bengali dataset. It provides an effective solution for Bengali question answering tasks.
## Quick Start
You can use the `bntransformer` package to interact with this question answering model.
### Installation

```bash
pip install bntransformer
```
## Usage Examples
### Basic Usage

```python
from bntransformer import BanglaQA

bnqa = BanglaQA()
context = "সূর্য সেন ১৮৯৪ সালের ২২ মার্চ চট্টগ্রামের রাউজান থানার নোয়াপাড়ায় অর্থনৈতিক ভাবে অস্বচ্ছল পরিবারে জন্মগ্রহণ করেন। তাঁর পিতার নাম রাজমনি সেন এবং মাতার নাম শশী বালা সেন। রাজমনি সেনের দুই ছেলে আর চার মেয়ে। সূর্য সেন তাঁদের পরিবারের চতুর্থ সন্তান। দুই ছেলের নাম সূর্য ও কমল। চার মেয়ের নাম বরদাসুন্দরী, সাবিত্রী, ভানুমতী ও প্রমিলা। শৈশবে পিতা মাতাকে হারানো সূর্য সেন কাকা গৌরমনি সেনের কাছে মানুষ হয়েছেন। সূর্য সেন ছেলেবেলা থেকেই খুব মনোযোগী ভালো ছাত্র ছিলেন এবং ধর্মভাবাপন্ন গম্ভীর প্রকৃতির ছিলেন।"
question = "মাস্টারদা সূর্যকুমার সেনের বাবার নাম কী ছিল ?"
answers = bnqa.find_answer(context, question)
print(answers)
```
### Advanced Usage

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "sagorsarker/mbert-bengali-tydiqa-qa"
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Reuse the loaded model and tokenizer instead of loading them again by name.
nlp = pipeline('question-answering', model=model, tokenizer=tokenizer)
qa_input = {
    'question': 'মাস্টারদা সূর্যকুমার সেনের বাবার নাম কী ছিল ?',
    'context': 'সূর্য সেন ১৮৯৪ সালের ২২ মার্চ চট্টগ্রামের রাউজান থানার নোয়াপাড়ায় অর্থনৈতিক ভাবে অস্বচ্ছল পরিবারে জন্মগ্রহণ করেন। তাঁর পিতার নাম রাজমনি সেন এবং মাতার নাম শশী বালা সেন। রাজমনি সেনের দুই ছেলে আর চার মেয়ে। সূর্য সেন তাঁদের পরিবারের চতুর্থ সন্তান। দুই ছেলের নাম সূর্য ও কমল। চার মেয়ের নাম বরদাসুন্দরী, সাবিত্রী, ভানুমতী ও প্রমিলা। শৈশবে পিতা মাতাকে হারানো সূর্য সেন কাকা গৌরমনি সেনের কাছে মানুষ হয়েছেন। সূর্য সেন ছেলেবেলা থেকেই খুব মনোযোগী ভালো ছাত্র ছিলেন এবং ধর্মভাবাপন্ন গম্ভীর প্রকৃতির ছিলেন।'
}
result = nlp(qa_input)
print(result)
```
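The `question-answering` pipeline returns a dict with `score`, `start`, `end`, and `answer` keys, where `start`/`end` are character offsets into the context. A minimal sketch of consuming that output (the `extract_answer` helper and the confidence threshold are illustrative, not part of the model card):

```python
def extract_answer(result, context, min_score=0.1):
    """Return the answer span only when the pipeline is confident enough.

    `result` has the shape of a Hugging Face question-answering pipeline
    output: {'score', 'start', 'end', 'answer'}; start/end are character
    offsets into the original context string.
    """
    if result["score"] < min_score:
        return None
    # The offsets index directly into the context string.
    assert context[result["start"]:result["end"]] == result["answer"]
    return result["answer"]

# Toy example with a hand-written result dict (English for readability).
context = "Rajmoni Sen was the father of Surya Sen."
result = {"score": 0.93, "start": 0, "end": 11, "answer": "Rajmoni Sen"}
print(extract_answer(result, context))  # Rajmoni Sen
```

Filtering on `score` like this is a common way to suppress low-confidence answers in downstream applications.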
## Technical Details
- The `mBERT-Bengali-Tydiqa-QA` model is built on `bert-base-multilingual-uncased`.
- It is fine-tuned on the TyDi QA Bengali dataset.
- The TyDi QA Bengali data includes 2,390 training samples and 113 validation samples.
- The model was trained on a Kaggle GPU.
- It was trained for a total of 5 epochs.
- Training used the `transformers/examples/question-answering` notebook with all default settings, except for the pre-trained model and dataset.
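Under the hood, the question-answering example post-processes the model's start/end logits into an answer span. A simplified sketch of that selection step (the function name and toy values are illustrative; the actual notebook additionally handles token-to-character offset mapping and n-best lists):

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Pick the (start, end) token pair with the highest combined logit.

    Mirrors SQuAD-style postprocessing: the span must be well-formed
    (start <= end) and no longer than `max_answer_len` tokens.
    """
    best = (0, 0)
    best_score = float("-inf")
    for i, s in enumerate(start_logits):
        for j, e in enumerate(end_logits):
            if j < i or j - i + 1 > max_answer_len:
                continue  # skip ill-formed or over-long spans
            if s + e > best_score:
                best_score = s + e
                best = (i, j)
    return best

# Toy logits for a 5-token context: the model favors tokens 1..2.
print(best_span([0.1, 2.0, 0.3, 0.2, 0.1], [0.2, 0.5, 1.8, 0.1, 0.3]))  # (1, 2)
```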
## Documentation

### Evaluation Results
Here are the evaluation results:

| Metric | Score |
| --- | --- |
| Exact Match | 57.5221 |
| F1 | 68.6618 |
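Exact Match and F1 here are the standard SQuAD-style metrics: EM is normalized string equality, while F1 measures token-level overlap between the prediction and the gold answer. A simplified sketch of how they are computed (the official SQuAD script additionally strips punctuation and English articles):

```python
from collections import Counter

def normalize(text):
    # Simplified normalization: lowercase and collapse whitespace.
    return " ".join(text.lower().split())

def exact_match(prediction, gold):
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction, gold):
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("রাজমনি সেন", "রাজমনি সেন"))      # 1.0
print(round(token_f1("রাজমনি", "রাজমনি সেন"), 2))  # 0.67
```

Both scores are averaged over the validation set and reported on a 0–100 scale above.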
## License
This project is licensed under the MIT license.
## Authors