BigBird base trivia-itc
This model is a fine-tuned checkpoint of bigbird-roberta-base. It is fine-tuned on the trivia_qa dataset with a BigBirdForQuestionAnsweringHead on top, and it provides an effective solution for question-answering tasks. Check out this model to see how well google/bigbird-base-trivia-itc performs on question answering.
Quick Start
This section shows how to set up and use the model effectively.
Features
- A fine-tuned version of the bigbird-roberta-base model.
- Fine-tuned on the trivia_qa dataset for question-answering tasks.
- Supports different attention types, block sizes, and numbers of random blocks (see Advanced Usage below).
Installation
No specific installation steps are provided in the original document; the checkpoint can be loaded directly with the Hugging Face transformers library (PyTorch backend).
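As a quick sanity check (a minimal sketch, assuming transformers and PyTorch are already installed, e.g. via pip), the following snippet verifies that the required classes can be imported:

# Minimal environment check: confirms torch and transformers are available
# and that the BigBird question-answering classes can be imported.
import torch
import transformers
from transformers import BigBirdForQuestionAnswering, BigBirdTokenizer

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)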
Usage Examples
Basic Usage
from transformers import BigBirdTokenizer, BigBirdForQuestionAnswering

# Load the fine-tuned checkpoint and its tokenizer
tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-base-trivia-itc")
model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-base-trivia-itc")

question = "Replace me by any text you'd like."
context = "Put some context for answering"
encoded_input = tokenizer(question, context, return_tensors='pt')
output = model(**encoded_input)
Advanced Usage
model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-base-trivia-itc", attention_type="original_full")
model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-base-trivia-itc", block_size=16, num_random_blocks=2)
question = "Replace me by any text you'd like."
context = "Put some context for answering"
encoded_input = tokenizer(question, context, return_tensors='pt')
output = model(**encoded_input)
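The output holds start and end logits over the input tokens. A minimal decoding sketch (assuming the tokenizer and encoded_input from the examples above, and taking a simple argmax of each logit vector) could look like this:

import torch

# Pick the highest-scoring start and end positions and decode the span between them.
# (A more careful decoder would also enforce end_index >= start_index.)
start_index = int(torch.argmax(output.start_logits, dim=-1))
end_index = int(torch.argmax(output.end_logits, dim=-1))
answer_tokens = encoded_input["input_ids"][0][start_index : end_index + 1]
answer = tokenizer.decode(answer_tokens, skip_special_tokens=True)
print(answer)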
Documentation
Fine-tuning config & hyper-parameters
| Property | Details |
|----------|---------|
| No. of global tokens | 128 |
| Window length | 192 |
| No. of random tokens | 192 |
| Max. sequence length | 4096 |
| No. of heads | 12 |
| No. of hidden layers | 12 |
| Hidden layer size | 768 |
| Batch size | 32 |
| Loss | cross-entropy (noisy spans) |
Technical Details
The model is a fine-tuned version of bigbird-roberta-base on the trivia_qa dataset, using a BigBirdForQuestionAnsweringHead on top for question-answering tasks. Different attention types, block sizes, and numbers of random blocks can be configured to optimize performance; these settings are exposed on the loaded model's config, as sketched below.
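A short sketch, assuming the standard transformers BigBirdConfig attribute names, for inspecting these settings on the loaded model:

from transformers import BigBirdForQuestionAnswering

model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-base-trivia-itc")

# Sparse-attention settings exposed by the config (and overridable in from_pretrained)
print(model.config.attention_type)            # e.g. "block_sparse"
print(model.config.block_size)                # block size of the sparse attention
print(model.config.num_random_blocks)         # random blocks attended to per row
print(model.config.max_position_embeddings)   # maximum sequence length (4096)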
License
This project is licensed under the Apache-2.0 license.
BibTeX entry and citation info
@misc{zaheer2021big,
title={Big Bird: Transformers for Longer Sequences},
author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
year={2021},
eprint={2007.14062},
archivePrefix={arXiv},
primaryClass={cs.LG}
}