🚀 TAPAS base model fine-tuned on Sequential Question Answering (SQA)
This model is a fine-tuned version of the TAPAS base model on the Sequential Question Answering (SQA) task. It has 4 usable versions; the latest one is the default and corresponds to the tapas_sqa_inter_masklm_base_reset checkpoint in the original GitHub repository. It was pre-trained on MLM and an additional intermediate pre-training step, then fine-tuned on SQA. By default, it uses relative position embeddings.
✨ Features
- Multiple Versions: Besides the default version, there are 3 other non-default versions (a loading sketch follows this feature list):
  - revision="v3", corresponding to tapas_sqa_inter_masklm_base (intermediate pre-training, absolute position embeddings).
  - revision="v2", corresponding to tapas_sqa_masklm_base_reset (no intermediate pre-training, relative position embeddings).
  - revision="v1", corresponding to tapas_sqa_masklm_base (no intermediate pre-training, absolute position embeddings).
- Self-Supervised Pretraining: TAPAS is a BERT-like transformers model pretrained on a large corpus of English Wikipedia data in a self-supervised way, using an automatic process to generate inputs and labels from raw tables and associated texts.
- Two Pretraining Objectives:
  - Masked language modeling (MLM): Randomly masks 15% of the words in the input, allowing the model to learn a bidirectional representation of a table and associated text.
  - Intermediate pre-training: Encourages numerical reasoning on tables by creating a balanced dataset of millions of syntactically created training examples.
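As a minimal sketch of how one of the non-default versions could be loaded, assuming this model is published on the Hugging Face Hub under the repository id google/tapas-base-finetuned-sqa (an assumption, not stated above), the revision argument of from_pretrained selects the version:

```python
# Minimal sketch: load a non-default version via the `revision` argument.
# The repository id "google/tapas-base-finetuned-sqa" is an assumption.
from transformers import TapasForQuestionAnswering, TapasTokenizer

model_id = "google/tapas-base-finetuned-sqa"
tokenizer = TapasTokenizer.from_pretrained(model_id, revision="v3")
model = TapasForQuestionAnswering.from_pretrained(model_id, revision="v3")  # tapas_sqa_inter_masklm_base
```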
📚 Documentation
Model description
TAPAS is a BERT-like transformers model pretrained on a large corpus of English data from Wikipedia in a self-supervised fashion. It was pretrained on raw tables and associated texts without human labeling, using an automatic process to generate inputs and labels. Specifically, it was pretrained with two objectives:
- Masked language modeling (MLM): Given a (flattened) table and associated context, 15% of the words in the input are randomly masked. The model then processes the entire (partially masked) sequence and predicts the masked words. This enables it to learn a bidirectional representation of a table and associated text, in contrast to traditional RNNs or autoregressive models like GPT. A small illustration follows this list.
- Intermediate pre-training: To promote numerical reasoning on tables, the authors additionally pre-trained the model on a balanced dataset of millions of syntactically created training examples. In this step, the model has to predict whether a sentence is supported or refuted by the contents of a table.
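As an illustration of the MLM objective only, the sketch below masks a word in the text paired with a small table and lets an MLM-pretrained TAPAS checkpoint predict it. The checkpoint name google/tapas-base is an assumption, and this masked-language head is not part of the SQA fine-tuned model described here.

```python
# Sketch of the MLM objective: mask one word next to a table and predict it.
# The checkpoint name "google/tapas-base" (MLM-pretrained TAPAS) is an assumption.
import pandas as pd
import torch
from transformers import TapasForMaskedLM, TapasTokenizer

tokenizer = TapasTokenizer.from_pretrained("google/tapas-base")
model = TapasForMaskedLM.from_pretrained("google/tapas-base")

table = pd.DataFrame({"City": ["Paris", "Berlin"], "Country": ["France", "Germany"]})
text = "paris is the capital of " + tokenizer.mask_token

inputs = tokenizer(table=table, queries=text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Find the masked position and take the highest-scoring vocabulary token.
mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))  # ideally "france"
```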
Intended uses & limitations
You can use this model for answering questions related to a table in a conversational set-up. For code examples, refer to the documentation of TAPAS on the HuggingFace website.
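For orientation, here is a minimal sketch of a conversational question sequence using the table-question-answering pipeline; the repository id google/tapas-base-finetuned-sqa and the toy table are assumptions, and the official TAPAS documentation remains the reference for complete examples.

```python
# Minimal sketch of conversational table QA. The repository id is an assumption;
# note that the tokenizer expects all table cells to be strings.
import pandas as pd
from transformers import pipeline

tqa = pipeline("table-question-answering", model="google/tapas-base-finetuned-sqa")

table = pd.DataFrame(
    {
        "Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"],
        "Number of movies": ["87", "53", "69"],
    }
)
queries = [
    "What is the name of the first actor?",
    "How many movies has he played in?",
]

# sequential=True runs the questions one by one so each answer can condition
# the next question, matching the conversational SQA set-up.
for result in tqa(table=table, query=queries, sequential=True):
    print(result["answer"])
```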
Training procedure
Preprocessing
The texts are lowercased and tokenized using WordPiece with a vocabulary size of 30,000. The model inputs are in the form:
[CLS] Question [SEP] Flattened table [SEP]
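A small sketch of what this flattening looks like in practice (the tokenizer repository id is an assumption):

```python
# Sketch: TapasTokenizer lowercases, applies WordPiece, and flattens the table
# after the question. The repository id is an assumption.
import pandas as pd
from transformers import TapasTokenizer

tokenizer = TapasTokenizer.from_pretrained("google/tapas-base-finetuned-sqa")

table = pd.DataFrame({"Rank": ["1", "2"], "City": ["Tokyo", "Delhi"]})
inputs = tokenizer(table=table, queries="which city is ranked first?", return_tensors="pt")

# Decoding the ids shows the layout described above, roughly:
# [CLS] which city is ranked first? [SEP] rank city 1 tokyo 2 delhi ...
print(tokenizer.decode(inputs["input_ids"][0]))
```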
Fine-tuning
The model was fine-tuned on 32 Cloud TPU v3 cores for 200,000 steps with a maximum sequence length of 512 and a batch size of 128. Fine-tuning takes around 20 hours in this setup. The optimizer used is Adam with a learning rate of 1.25e-5 and a warmup ratio of 0.2. An inductive bias is added so that the model only selects cells of the same column, reflected by the select_one_column parameter of TapasConfig. See also table 12 of the original paper.
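A short sketch of where this inductive bias is exposed in the configuration (the repository id is an assumption):

```python
# Sketch: the column-selection inductive bias is exposed as TapasConfig.select_one_column.
# The repository id is an assumption.
from transformers import TapasConfig, TapasForQuestionAnswering

config = TapasConfig.from_pretrained("google/tapas-base-finetuned-sqa")
print(config.select_one_column)  # True for the SQA set-up described above

# The same flag can be set explicitly when configuring a model for fine-tuning.
custom_config = TapasConfig(select_one_column=True)
model = TapasForQuestionAnswering(custom_config)  # randomly initialized, for illustration
```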
BibTeX entry and citation info
@misc{herzig2020tapas,
title={TAPAS: Weakly Supervised Table Parsing via Pre-training},
author={Jonathan Herzig and Paweł Krzysztof Nowak and Thomas Müller and Francesco Piccinno and Julian Martin Eisenschlos},
year={2020},
eprint={2004.02349},
archivePrefix={arXiv},
primaryClass={cs.IR}
}
@misc{eisenschlos2020understanding,
title={Understanding tables with intermediate pre-training},
author={Julian Martin Eisenschlos and Syrine Krichene and Thomas Müller},
year={2020},
eprint={2010.00571},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@InProceedings{iyyer2017search-based,
author = {Iyyer, Mohit and Yih, Scott Wen-tau and Chang, Ming-Wei},
title = {Search-based Neural Structured Learning for Sequential Question Answering},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics},
year = {2017},
month = {July},
abstract = {Recent work in semantic parsing for question answering has focused on long and complicated questions, many of which would seem unnatural if asked in a normal conversation between two humans. In an effort to explore a conversational QA setting, we present a more realistic task: answering sequences of simple but inter-related questions. We collect a dataset of 6,066 question sequences that inquire about semi-structured tables from Wikipedia, with 17,553 question-answer pairs in total. To solve this sequential question answering task, we propose a novel dynamic neural semantic parsing framework trained using a weakly supervised reward-guided search. Our model effectively leverages the sequential context to outperform state-of-the-art QA systems that are designed to answer highly complex questions.},
publisher = {Association for Computational Linguistics},
url = {https://www.microsoft.com/en-us/research/publication/search-based-neural-structured-learning-sequential-question-answering/},
}
📄 License
This model is released under the Apache 2.0 license.