flan-t5-base for Extractive QA
This is the flan-t5-base model fine-tuned on the SQuAD2.0 dataset for Extractive Question Answering, capable of handling unanswerable questions.
Important Note
The <cls> token must be manually added to the beginning of the question for this model to work properly, because the model uses it to make "no answer" predictions. The T5 tokenizer does not add this special token automatically, so you must prepend it yourself.
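Since the model relies on this prefix, a small idempotent helper can make it harder to forget. The sketch below is illustrative only: `with_cls_prefix` is a hypothetical name, and the literal `<cls>` default is an assumption taken from the note above. In practice you would pass `tokenizer.cls_token` rather than hard-coding the string.

```python
def with_cls_prefix(question: str, cls_token: str = "<cls>") -> str:
    """Return the question with cls_token prepended exactly once.

    Hypothetical helper: the default "<cls>" is an assumption; prefer
    passing tokenizer.cls_token from the loaded tokenizer.
    """
    if question.startswith(cls_token):
        # Already prefixed, so leave the question untouched.
        return question
    return f"{cls_token}{question}"


print(with_cls_prefix("Where do I live?"))
```

Calling the helper twice on the same question leaves it unchanged, so it is safe to apply at any point in a preprocessing chain.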
Usage Tip
As of transformers version 4.31.0, passing trust_remote_code=True is no longer necessary.
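If the same script has to run against several transformers releases, the version threshold above can be checked programmatically. This is a minimal sketch using only the standard library: `parse_version` and `needs_remote_code` are hypothetical helpers, and the assumption that releases before 4.31.0 need the remote-code flag is inferred from the tip, not verified here.

```python
import re


def parse_version(version: str) -> tuple:
    """Extract the leading (major, minor, patch) numbers.

    Suffixes such as ".dev0" are ignored, so "4.31.0.dev0" is treated
    the same as "4.31.0" in this simplified sketch.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def needs_remote_code(version: str) -> bool:
    """True when the given transformers version predates 4.31.0 (assumed threshold)."""
    return parse_version(version) < (4, 31, 0)


print(needs_remote_code("4.30.0.dev0"))
print(needs_remote_code("4.31.0"))
```

In a real script the argument would be `transformers.__version__`.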
Quick Start
This section provides a quick guide to using the sjrhuschlee/flan-t5-base-squad2 model for question-answering tasks.
⨠Features
- Fine-tuned Model: Based on the flan-t5-base model, fine-tuned on the SQuAD 2.0 dataset.
- Support for Unanswerable Questions: Capable of making "no answer" predictions using the <cls> token.
- Multiple Usage Methods: Can be used with pipelines or by loading the model and tokenizer separately.
Installation
Ensure you have the necessary libraries installed:
pip install torch transformers
Usage Examples
Basic Usage
import torch
from transformers import (
    AutoModelForQuestionAnswering,
    AutoTokenizer,
    pipeline,
)

model_name = "sjrhuschlee/flan-t5-base-squad2"

# a) Using a pipeline
nlp = pipeline(
    'question-answering',
    model=model_name,
    tokenizer=model_name,
)
qa_input = {
    # The <cls> token has to be prepended to the question manually.
    'question': f'{nlp.tokenizer.cls_token}Where do I live?',
    'context': 'My name is Sarah and I live in London',
}
res = nlp(qa_input)

# b) Loading the model and tokenizer separately
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

question = f'{tokenizer.cls_token}Where do I live?'
context = 'My name is Sarah and I live in London'
encoding = tokenizer(question, context, return_tensors="pt")
# Inference only, so gradient tracking is not needed.
with torch.no_grad():
    output = model(
        encoding["input_ids"],
        attention_mask=encoding["attention_mask"],
    )

# Take the most likely start and end positions and decode the tokens between them.
all_tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"][0].tolist())
start = int(torch.argmax(output["start_logits"]))
end = int(torch.argmax(output["end_logits"]))
answer_tokens = all_tokens[start:end + 1]
answer = tokenizer.decode(tokenizer.convert_tokens_to_ids(answer_tokens))
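The manual decoding above always returns some span, even when the model is signalling that the question is unanswerable. The sketch below shows one way to detect that case. It assumes the <cls> token sits at position 0 of the encoded input (it is prepended to the question) and that a prediction whose start and end both point at that position means "no answer"; `best_span` and `argmax` are hypothetical helpers written in plain Python, not part of transformers.

```python
from typing import Optional, Sequence, Tuple


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def best_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    cls_index: int = 0,  # assumption: <cls> is the first input token
) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices, or None for "no answer"."""
    start = argmax(start_logits)
    end = argmax(end_logits)
    # Assumption: a span that points at <cls> means the model predicts no answer.
    if start == cls_index and end == cls_index:
        return None
    # An end before the start is not a valid span; this sketch treats it as no answer.
    if end < start:
        return None
    return start, end


print(best_span([5.0, 1.0, 0.2], [4.0, 0.1, 0.3]))
print(best_span([0.1, 0.2, 3.0, 0.5], [0.1, 0.2, 0.4, 2.5]))
```

With the example above, the inputs would be `output["start_logits"][0].tolist()` and `output["end_logits"][0].tolist()`. Picking the two argmax positions independently is a simplification; the pipeline scores start/end pairs jointly.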
Documentation
Overview
| Property | Details |
|---|---|
| Model Type | flan-t5-base |
| Language | English |
| Downstream-task | Extractive QA |
| Training Data | SQuAD 2.0 |
| Eval Data | SQuAD 2.0 |
| Infrastructure | 1x NVIDIA 3070 |
Metrics
{
    "eval_HasAns_exact": 79.97638326585695,
    "eval_HasAns_f1": 86.1444296592862,
    "eval_HasAns_total": 5928,
    "eval_NoAns_exact": 84.42388561816652,
    "eval_NoAns_f1": 84.42388561816652,
    "eval_NoAns_total": 5945,
    "eval_best_exact": 82.2033184536343,
    "eval_best_exact_thresh": 0.0,
    "eval_best_f1": 85.28292588395921,
    "eval_best_f1_thresh": 0.0,
    "eval_exact": 82.2033184536343,
    "eval_f1": 85.28292588395928,
    "eval_runtime": 522.0299,
    "eval_samples": 12001,
    "eval_samples_per_second": 22.989,
    "eval_steps_per_second": 0.96,
    "eval_total": 11873
}
{
    "eval_exact_match": 86.3197729422895,
    "eval_f1": 92.94686836210295,
    "eval_runtime": 442.1088,
    "eval_samples": 10657,
    "eval_samples_per_second": 24.105,
    "eval_steps_per_second": 1.007
}
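In the first metrics block, the overall eval_exact and eval_f1 scores are the example-weighted averages of the HasAns and NoAns scores. The short check below reproduces them from the reported values; the variable and function names are illustrative only.

```python
# Values copied from the first metrics block above.
has_ans = {"exact": 79.97638326585695, "f1": 86.1444296592862, "total": 5928}
no_ans = {"exact": 84.42388561816652, "f1": 84.42388561816652, "total": 5945}


def weighted_average(metric: str) -> float:
    """Average a metric over both subsets, weighted by number of questions."""
    total = has_ans["total"] + no_ans["total"]
    weighted_sum = (
        has_ans[metric] * has_ans["total"] + no_ans[metric] * no_ans["total"]
    )
    return weighted_sum / total


overall_exact = weighted_average("exact")
overall_f1 = weighted_average("f1")
print(f"exact: {overall_exact:.4f}, f1: {overall_f1:.4f}")
```

For unanswerable questions the exact and F1 scores coincide, because a prediction is either correctly empty or not, which is why the two NoAns values are identical.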
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 6
- total_train_batch_size: 96
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4.0
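The hyperparameters above are related by simple arithmetic, sketched here. The total batch size is the per-device batch size times the gradient accumulation steps times the number of devices (one GPU, per the infrastructure entry). The `linear_lr` function is a hypothetical re-implementation of a linear schedule with warmup, not the library's code, and `total_steps=1000` is an arbitrary illustrative value because the real step count is not stated.

```python
train_batch_size = 16
gradient_accumulation_steps = 6
num_devices = 1  # from the "1x NVIDIA 3070" infrastructure entry

# Effective number of examples per optimizer update.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def linear_lr(step: int, total_steps: int, base_lr: float = 2e-05,
              warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given step: linear warmup, then linear decay to zero.

    Simplified sketch; rounding of the warmup step count is an assumption.
    """
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print(total_train_batch_size)
for step in (0, 50, 100, 550, 1000):  # hypothetical 1000-step run
    print(step, linear_lr(step, total_steps=1000))
```

The learning rate peaks at 2e-05 once warmup ends (10% of the run) and falls linearly to zero at the final step.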
Framework versions
- Transformers 4.30.0.dev0
- Pytorch 2.0.1+cu117
- Datasets 2.12.0
- Tokenizers 0.13.3
License
This project is licensed under the MIT License.