# Closed Book Trivia-QA T5 base
A T5-base model fine-tuned on the No Context TriviaQA dataset, designed to answer trivia questions from its memory.
## 🚀 Quick Start

You can test the model on trivia questions from trivia websites.
## ✨ Features
- Trained on a Specific Dataset: This is a T5-base model fine-tuned on the No Context TriviaQA dataset.
- Closed-Book Answering: The model answers trivia-style questions from its internal memory alone, without being given any context passage.
- Pretrained on C4: The underlying pretrained model was trained on the Common Crawl (C4) dataset.
- Defined Training Parameters: Trained for 135 epochs with a batch size of 32 and a learning rate of 1e-3.
- Set Input and Output Lengths: `max_input_length` is set to 25 and `max_output_length` to 10 (see the preprocessing sketch after this list).
- Performance Metrics: Attained an Exact Match (EM) score of 17 and a Subset Match score of 24.5.
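As a concrete illustration of those length settings, here is a minimal preprocessing sketch. The padding and truncation strategy shown is an assumption; the original README does not include preprocessing code:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deep-learning-analytics/triviaqa-t5-base")

question = "Who directed the movie Jaws?"
answer = "Steven Spielberg"  # example target; answers are capped at 10 tokens

# Questions are truncated/padded to 25 tokens, answers to 10 tokens,
# matching max_input_length and max_output_length above.
inputs = tokenizer(question, max_length=25, padding="max_length",
                   truncation=True, return_tensors="pt")
labels = tokenizer(answer, max_length=10, padding="max_length",
                   truncation=True, return_tensors="pt")
print(inputs.input_ids.shape, labels.input_ids.shape)  # torch.Size([1, 25]) torch.Size([1, 10])
```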
## 📦 Installation

No specific installation steps are provided in the original README.
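That said, the usage example below relies on PyTorch and the 🤗 Transformers library (T5 tokenizers additionally require `sentencepiece`), so a typical setup, offered here as an assumption rather than an official instruction, would be:

```bash
pip install torch transformers sentencepiece
```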
## 💻 Usage Examples

### Basic Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelWithLMHead
# Note: AutoModelWithLMHead is deprecated in recent transformers releases;
# AutoModelForSeq2SeqLM also works for this checkpoint.

tokenizer = AutoTokenizer.from_pretrained("deep-learning-analytics/triviaqa-t5-base")
model = AutoModelWithLMHead.from_pretrained("deep-learning-analytics/triviaqa-t5-base")

# Run on GPU if one is available.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)

text = "Who directed the movie Jaws?"
preprocess_text = text.strip().replace("\n", "")
tokenized_text = tokenizer.encode(preprocess_text, return_tensors="pt").to(device)

# Beam search over at most 10 output tokens.
outs = model.generate(
    tokenized_text,
    max_length=10,
    num_beams=2,
    early_stopping=True,
)
dec = [tokenizer.decode(ids, skip_special_tokens=True) for ids in outs]
print("Predicted Answer: ", dec)
```
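With `skip_special_tokens=True` added to the decode step, the printed list contains the bare answer string rather than `<pad>`/`</s>` tokens; for the example question above, a correctly working model should produce something like `['Steven Spielberg']`.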
## 📚 Documentation

We have written a blog post that covers the training procedure. Please find it here.
## 🔧 Technical Details

This is a T5-base model fine-tuned on the No Context TriviaQA dataset. The input to the model is a trivia-style question, and the model is tuned to retrieve the answer from its memory. The underlying pretrained model was trained on the Common Crawl (C4) dataset. Fine-tuning ran for 135 epochs with a batch size of 32 and a learning rate of 1e-3; `max_input_length` is set to 25 and `max_output_length` to 10. The model attained an Exact Match (EM) score of 17 and a Subset Match score of 24.5.
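For readers who want to reproduce a comparable setup, below is a minimal fine-tuning sketch under stated assumptions: the optimizer (Adafactor is a common choice for T5, but the original choice is not documented), the use of the `trivia_qa` dataset's `rc.nocontext` configuration from the 🤗 `datasets` library, and the preprocessing details are all assumptions, not the authors' confirmed recipe.

```python
import torch
from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import Adafactor, T5ForConditionalGeneration, T5Tokenizer

# Assumed data source: the no-context configuration of TriviaQA.
dataset = load_dataset("trivia_qa", "rc.nocontext", split="train")

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)

def collate(batch):
    # Lengths from the model card: 25 input tokens, 10 output tokens.
    questions = [ex["question"] for ex in batch]
    answers = [ex["answer"]["value"] for ex in batch]
    inputs = tokenizer(questions, max_length=25, padding="max_length",
                       truncation=True, return_tensors="pt")
    targets = tokenizer(answers, max_length=10, padding="max_length",
                        truncation=True, return_tensors="pt")
    labels = targets.input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
    return inputs.input_ids, inputs.attention_mask, labels

# Batch size 32 per the model card.
loader = DataLoader(dataset, batch_size=32, shuffle=True, collate_fn=collate)

# Learning rate 1e-3 as stated in the card; Adafactor itself is an assumption.
optimizer = Adafactor(model.parameters(), lr=1e-3,
                      scale_parameter=False, relative_step=False)

model.train()
for epoch in range(135):  # 135 epochs per the model card
    for input_ids, attention_mask, labels in loader:
        optimizer.zero_grad()
        loss = model(input_ids=input_ids.to(device),
                     attention_mask=attention_mask.to(device),
                     labels=labels.to(device)).loss
        loss.backward()
        optimizer.step()
```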
## 📄 License

No license information is provided in the original README.