🚀 BART-LARGE finetuned on SQuADv1
This is a bart-large model fine-tuned on the SQuADv1 dataset for extractive question answering.
🚀 Quick Start
The model answers a question by selecting the answer span from a given context and can process input sequences of up to 1024 tokens. The quickest way to try it is through the `transformers` question-answering pipeline, as shown below.
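A minimal sketch using the standard `question-answering` pipeline from `transformers`; the pipeline handles tokenization, span selection and decoding, and the exact score it returns will depend on your library version.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint into the question-answering pipeline
qa = pipeline(
    "question-answering",
    model="valhalla/bart-large-finetuned-squadv1",
    tokenizer="valhalla/bart-large-finetuned-squadv1",
)

result = qa(
    question="Who was Jim Henson?",
    context="Jim Henson was a nice puppet",
)
print(result["answer"])  # expected: 'a nice puppet'
```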
✨ Features
- Powerful Architecture: BART is a seq2seq model suitable for both NLG and NLU tasks, proposed in the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension.
- Effective for QA: the complete input is fed into the encoder and decoder, and the top hidden state of the decoder is used as a representation for each token, which is then classified as the start or end of the answer span.
- Long Sequence Handling: it can handle input sequences of up to 1024 tokens.
📦 Installation
The model runs on the Hugging Face `transformers` library together with PyTorch; if they are not already available, install them with `pip install torch transformers`.
💻 Usage Examples
Basic Usage
```python
from transformers import BartTokenizer, BartForQuestionAnswering
import torch

tokenizer = BartTokenizer.from_pretrained('valhalla/bart-large-finetuned-squadv1')
model = BartForQuestionAnswering.from_pretrained('valhalla/bart-large-finetuned-squadv1')

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

# Encode the question and context together as one input sequence
encoding = tokenizer(question, text, return_tensors='pt')
input_ids = encoding['input_ids']
attention_mask = encoding['attention_mask']

# The model returns one start logit and one end logit per input token
with torch.no_grad():
    outputs = model(input_ids, attention_mask=attention_mask)
start_scores, end_scores = outputs.start_logits, outputs.end_logits

# Take the most likely start/end positions and decode that span back to text
start_index = int(torch.argmax(start_scores))
end_index = int(torch.argmax(end_scores))
answer = tokenizer.decode(input_ids[0, start_index:end_index + 1], skip_special_tokens=True).strip()
# answer => 'a nice puppet'
```
Advanced Usage
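The original card does not include an advanced example. The following is a sketch of batched inference, assuming the same checkpoint as above: it answers several questions in one forward pass and masks out padding before picking each answer span.

```python
import torch
from transformers import BartTokenizer, BartForQuestionAnswering

tokenizer = BartTokenizer.from_pretrained('valhalla/bart-large-finetuned-squadv1')
model = BartForQuestionAnswering.from_pretrained('valhalla/bart-large-finetuned-squadv1')
model.eval()

questions = ["Who was Jim Henson?", "What was Jim Henson?"]
contexts = ["Jim Henson was a nice puppet"] * len(questions)

# Pad the batch to a common length and truncate anything beyond the 1024-token limit
enc = tokenizer(questions, contexts, padding=True, truncation=True,
                max_length=1024, return_tensors='pt')

with torch.no_grad():
    out = model(**enc)

for i, question in enumerate(questions):
    # Ignore padded positions so they can never be selected as part of the answer span
    mask = enc['attention_mask'][i].bool()
    start_logits = out.start_logits[i].masked_fill(~mask, float('-inf'))
    end_logits = out.end_logits[i].masked_fill(~mask, float('-inf'))
    start, end = int(start_logits.argmax()), int(end_logits.argmax())
    answer = tokenizer.decode(enc['input_ids'][i, start:end + 1], skip_special_tokens=True).strip()
    print(f"{question} -> {answer}")
```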
📚 Documentation
Model details
BART was proposed in the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. BART is a seq2seq model intended for both NLG and NLU tasks.
To use BART for question answering, the complete document is fed into the encoder and decoder, and the top hidden state of the decoder is used as a representation for each token. This representation is then classified to mark the answer span. As reported in the paper, bart-large achieves results comparable to RoBERTa on SQuAD. Another notable property of BART is that it can handle sequences of up to 1024 tokens.
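As a small illustration of the span-classification setup described above (using the same checkpoint as in the usage examples), the QA head produces one start logit and one end logit for every input token:

```python
import torch
from transformers import BartTokenizer, BartForQuestionAnswering

tokenizer = BartTokenizer.from_pretrained('valhalla/bart-large-finetuned-squadv1')
model = BartForQuestionAnswering.from_pretrained('valhalla/bart-large-finetuned-squadv1')

enc = tokenizer("Who was Jim Henson?", "Jim Henson was a nice puppet", return_tensors='pt')
with torch.no_grad():
    out = model(**enc)

print(enc['input_ids'].shape)   # (1, sequence_length)
print(out.start_logits.shape)   # (1, sequence_length) -- one start score per token
print(out.end_logits.shape)     # (1, sequence_length) -- one end score per token
```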
| Property | Value |
| --- | --- |
| encoder layers | 12 |
| decoder layers | 12 |
| hidden size | 4096 |
| num attention heads | 16 |
| on-disk size | 1.63 GB |
Model training
This model was trained on a Google Colab V100 GPU.
You can find the fine-tuning Colab here.
Results
These results are slightly worse than those reported in the paper, where the authors state that bart-large achieves 88.8 EM and 94.6 F1.

| Metric | Value |
| --- | --- |
| EM | 86.8022 |
| F1 | 92.7342 |
🔧 Technical Details
For question answering, the complete input (question plus context) is fed into both the encoder and the decoder, and the top hidden state of the decoder is used as a per-token representation that is classified as the start or end of the answer span. The model supports input sequences of up to 1024 tokens.
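Contexts longer than the 1024-token limit must be truncated (or split) before being fed to the model. A minimal sketch, assuming the same checkpoint, that truncates only the context so the question is never cut off:

```python
from transformers import BartTokenizer

tokenizer = BartTokenizer.from_pretrained('valhalla/bart-large-finetuned-squadv1')

question = "Who was Jim Henson?"
long_context = "Jim Henson was a nice puppet. " * 500  # far longer than the model can take

# Truncate only the second segment (the context), keeping the question intact
enc = tokenizer(question, long_context,
                truncation="only_second",
                max_length=1024,
                return_tensors='pt')

print(enc['input_ids'].shape[1])  # <= 1024
```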
Created with ❤️ by Suraj Patil
