🚀 Question Answering Model
This is a question answering model supporting Vietnamese and English, fine-tuned on multiple datasets for extractive QA tasks.
🚀 Quick Start
You can quickly start using this pre-trained model in either of the following ways:
Using Hugging Face Pipeline
```python
from transformers import pipeline

model_checkpoint = "nguyenvulebinh/vi-mrc-base"
nlp = pipeline('question-answering', model=model_checkpoint,
               tokenizer=model_checkpoint)
QA_input = {
    'question': "Bình là chuyên gia về gì ?",
    'context': "Bình Nguyễn là một người đam mê với lĩnh vực xử lý ngôn ngữ tự nhiên . Anh nhận chứng chỉ Google Developer Expert năm 2020"
}
res = nlp(QA_input)
print('pipeline: {}'.format(res))
```
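The pipeline returns a dictionary with the standard `transformers` question-answering output fields, so the pieces of the prediction can be read out directly (a small illustrative sketch; the exact score and span depend on the model and input):

```python
# Standard question-answering pipeline output fields:
print(res['answer'])             # the extracted answer span from the context
print(res['score'])              # the model's confidence for that span
print(res['start'], res['end'])  # character offsets of the span in the context
```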
More Accurate Inference Process
```python
from infer import tokenize_function, data_collator, extract_answer
from model.mrc_model import MRCQuestionAnswering
from transformers import AutoTokenizer

model_checkpoint = "nguyenvulebinh/vi-mrc-base"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = MRCQuestionAnswering.from_pretrained(model_checkpoint)

QA_input = {
    'question': "Bình được công nhận với danh hiệu gì ?",
    'context': "Bình Nguyễn là một người đam mê với lĩnh vực xử lý ngôn ngữ tự nhiên . Anh nhận chứng chỉ Google Developer Expert năm 2020"
}

inputs = [tokenize_function(*QA_input)]
inputs_ids = data_collator(inputs)
outputs = model(**inputs_ids)
answer = extract_answer(inputs, outputs, tokenizer)
print(answer)
```
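The `infer` and `model.mrc_model` imports above are local modules shipped with the project repository (not part of `transformers`), so this snippet assumes it is run from a checkout of that repository. For pure inference, it is also standard PyTorch practice to put the model in eval mode and disable gradient tracking; a minimal sketch of that variation, not something the original example requires:

```python
import torch

# Optional: eval mode disables dropout and no_grad skips the autograd graph,
# both standard for inference (not specific to this model).
model.eval()
with torch.no_grad():
    outputs = model(**inputs_ids)
answer = extract_answer(inputs, outputs, tokenizer)
print(answer)
```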
✨ Features
- Multi-language Support: Supports both Vietnamese and English for question-answering tasks.
- Fine-tuned on Multiple Datasets: Fine-tuned on a combination of English and Vietnamese datasets, including SQuAD 2.0, mailong25, UIT-ViQuAD, and MultiLingual Question Answering.
- High Performance: Achieves strong evaluation results on the Vietnamese validation set (see below).
📚 Documentation
Model Description
- Language model: XLM-RoBERTa
- Fine-tune: MRCQuestionAnswering
- Languages: Vietnamese, English
- Downstream task: Extractive QA
- Datasets (combined English and Vietnamese): SQuAD 2.0, mailong25, UIT-ViQuAD, MultiLingual Question Answering
This model is intended for QA in Vietnamese, so the validation set is Vietnamese only (although English works fine as well). The evaluation results below use 10% of the Vietnamese dataset.
| Model | EM    | F1    |
|-------|-------|-------|
| base  | 76.43 | 84.16 |
| large | 77.32 | 85.46 |
Model Evaluation
The evaluation results are based on 10% of the Vietnamese dataset, showing the model's performance in terms of Exact Match (EM) and F1 score.
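For reference, here is a minimal sketch of how EM and token-level F1 are conventionally computed for extractive QA (standard SQuAD-style definitions without the usual answer normalization; this is not the project's own evaluation script):

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    # 1.0 if the predicted span matches the reference exactly, else 0.0.
    return float(prediction.strip() == reference.strip())

def f1(prediction: str, reference: str) -> float:
    # Token-level F1 between the predicted and the reference answer span.
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```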
📄 License
This model is released under the CC BY-NC 4.0 license.
About
Built by Binh Nguyen
For more details, visit the project repository.
