🚀 T5 Question Generation and Question Answering
This is a T5-based model for French question generation, question answering, and answer extraction. It was fine-tuned on FQuAD and PIAF and evaluated on the FQuAD validation set.
🚀 Quick Start
from transformers import T5ForConditionalGeneration, T5Tokenizer
model = T5ForConditionalGeneration.from_pretrained("JDBN/t5-base-fr-qg-fquad")
tokenizer = T5Tokenizer.from_pretrained("JDBN/t5-base-fr-qg-fquad")
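Loading the checkpoint is only the first step: to get an output, the input string has to be tokenized, passed through `generate`, and decoded. The following is a minimal sketch of that loop, not the authors' reference code. The helper names `load_model` and `generate_text`, the beam size, and the length limits are assumptions chosen for illustration; the input string is expected to follow the task-prefix format described under Training data.

```python
MODEL_NAME = "JDBN/t5-base-fr-qg-fquad"


def load_model(model_name: str = MODEL_NAME):
    """Load the checkpoint and its tokenizer.

    Requires the transformers, torch, and sentencepiece packages.
    """
    # Imported lazily so the helpers can be defined without the library installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    model = T5ForConditionalGeneration.from_pretrained(model_name)
    tokenizer = T5Tokenizer.from_pretrained(model_name)
    return model, tokenizer


def generate_text(model, tokenizer, text: str,
                  max_length: int = 64, num_beams: int = 4) -> str:
    """Run one task-prefixed input string through the model and decode the result."""
    # Assumed limits (not from the model card): 512 input tokens, 4 beams.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_length,
        num_beams=num_beams,
    )
    # Decode only the first (best) sequence and drop special tokens such as </s>.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call could look like `model, tokenizer = load_model()` followed by `generate_text(model, tokenizer, "question: ... context: ...")`. The first call downloads the checkpoint from the Hugging Face Hub.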
✨ Features
Model description
This model is a T5 Transformers model (airKlizz/t5-base-multi-fr-wiki-news) that was fine-tuned on French data for three tasks:
- Question generation
- Question answering
- Answer extraction
It is evaluated on the FQuAD validation set, where it reaches a BLEU_4 of 0.111 and a ROUGE_L of 0.284 (full results are listed under Eval results).
Intended uses & limitations
This model is designed for the three tasks mentioned earlier and has not been tested on other tasks.
📚 Documentation
Training data
The base model is https://huggingface.co/airKlizz/t5-base-multi-fr-wiki-news. It was fine-tuned on a dataset combining FQuAD and PIAF for the three tasks listed above.
Each input was preprocessed into a string that starts with a task prefix, as in these examples:
- Question generation: "generate question: Barack Hussein Obama, né le 4 aout 1961, est un homme politique américain et avocat. Il a été élu en 2009 pour devenir le 44ème président des Etats-Unis d'Amérique."
- Question answering: "question: Quand Barack Hussein Obama a-t-il été élu président des Etats-Unis d’Amérique? context: Barack Hussein Obama, né le 4 aout 1961, est un homme politique américain et avocat. Il a été élu en 2009 pour devenir le 44ème président des Etats-Unis d’Amérique."
- Answer extraction: "extract_answers: Barack Hussein Obama, né le 4 aout 1961, est un homme politique américain et avocat. Il a été élu en 2009 pour devenir le 44ème président des Etats-Unis d’Amérique ."
The preprocessing is implemented in https://github.com/patil-suraj/question_generation.
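The three input formats above can be sketched as small string-building helpers. This is an illustration only: the function names are hypothetical, and the upstream preprocessing repository may add further markers or special tokens that do not appear in the examples shown here.

```python
def build_qg_input(context: str) -> str:
    """Question generation: prefix the passage with 'generate question:'."""
    return f"generate question: {context}"


def build_qa_input(question: str, context: str) -> str:
    """Question answering: the question first, then the passage as context."""
    return f"question: {question} context: {context}"


def build_ae_input(context: str) -> str:
    """Answer extraction: prefix the passage with 'extract_answers:'."""
    return f"extract_answers: {context}"


# Example passage and question taken from the model card.
context = (
    "Barack Hussein Obama, né le 4 aout 1961, est un homme politique "
    "américain et avocat. Il a été élu en 2009 pour devenir le 44ème "
    "président des Etats-Unis d'Amérique."
)
question = (
    "Quand Barack Hussein Obama a-t-il été élu président des "
    "Etats-Unis d'Amérique?"
)

print(build_qg_input(context))
print(build_qa_input(question, context))
print(build_ae_input(context))
```

Keeping the prefixes in one place makes it harder to send the model a string whose format differs from what it saw during fine-tuning.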
Eval results
On FQuAD validation set
| Metric | Score |
| --- | --- |
| BLEU_1 | 0.290 |
| BLEU_2 | 0.203 |
| BLEU_3 | 0.149 |
| BLEU_4 | 0.111 |
| METEOR | 0.197 |
| ROUGE_L | 0.284 |
| CIDEr | 1.038 |
Question Answering metrics
These metrics compare the performance of an existing question answering model (https://huggingface.co/illuin/camembert-base-fquad) on the original FQuAD questions and on the T5-generated questions.
| Questions | Exact Match | F1 Score |
| --- | --- | --- |
| Original FQuAD | 54.015 | 77.466 |
| Generated | 45.765 | 67.306 |
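For readers unfamiliar with these two metrics, here is a minimal sketch of how Exact Match and F1 are commonly computed for extractive question answering, assuming a SQuAD-style token-overlap definition. The normalization used for the numbers above is not stated in this document, so the `normalize` function below is an assumption, and all function names are illustrative.

```python
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace.

    Assumed normalization; typographic quotes such as ’ are left untouched.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this definition, per-example scores are averaged over the evaluation set and multiplied by 100 to obtain values on the scale used in the table.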
BibTeX entry and citation info
@misc{githubPatil,
author = {Patil Suraj},
title = {question generation GitHub repository},
year = {2020},
howpublished={\url{https://github.com/patil-suraj/question_generation}}
}
@article{T5,
title={Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
author={Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
year={2019},
eprint={1910.10683},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{dhoffschmidt2020fquad,
title={FQuAD: French Question Answering Dataset},
author={Martin d'Hoffschmidt and Wacim Belblidia and Tom Brendlé and Quentin Heinrich and Maxime Vidal},
year={2020},
eprint={2002.06071},
archivePrefix={arXiv},
primaryClass={cs.CL}
}