🚀 TAPAS large model fine-tuned on WikiSQL (in a supervised fashion)
This model offers two usable versions. The default version corresponds to the tapas_wikisql_sqa_inter_masklm_large_reset checkpoint in the original GitHub repository. It was pre-trained on MLM and an additional step the authors call intermediate pre-training, and subsequently fine-tuned sequentially on SQA and WikiSQL. It uses relative position embeddings (i.e., it resets the position index at each cell of the table).
The other (non-default) version available is no_reset, which corresponds to tapas_wikisql_sqa_inter_masklm_large (intermediate pre-training, absolute position embeddings).
Disclaimer: The team releasing TAPAS did not create a model card for this model. This model card was written by the Hugging Face team and contributors.
✨ Features
- Two versions available for different use cases.
- Pre-trained on MLM and an intermediate pre-training step for better performance.
- Fine-tuned on SQA and WikiSQL.
📦 Installation
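The checkpoint is intended to be used with the Hugging Face Transformers library; assuming a recent release, `pip install transformers pandas` is sufficient (pandas is used to pass tables as DataFrames). Note that some older Transformers releases additionally required the torch-scatter package for TAPAS models.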
💻 Usage Examples
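Below is a minimal sketch of querying a table with this checkpoint using the Transformers TapasTokenizer and TapasForQuestionAnswering classes; the Hub id, the toy table and the question are assumptions made for illustration.

```python
# A minimal sketch of table question answering with this checkpoint.
# The Hub id below is assumed to correspond to this model card; adjust if needed.
from transformers import TapasTokenizer, TapasForQuestionAnswering
import pandas as pd

model_name = "google/tapas-large-finetuned-wikisql-supervised"  # assumed Hub id
tokenizer = TapasTokenizer.from_pretrained(model_name)
model = TapasForQuestionAnswering.from_pretrained(model_name)

# TAPAS expects the table as a pandas DataFrame with string cells.
table = pd.DataFrame(
    {
        "Actor": ["Brad Pitt", "Leonardo DiCaprio", "George Clooney"],
        "Number of movies": ["87", "53", "69"],
    }
)
queries = ["How many movies does Leonardo DiCaprio have?"]

inputs = tokenizer(table=table, queries=queries, padding="max_length", return_tensors="pt")
outputs = model(**inputs)

# Turn the cell-selection and aggregation logits into answer coordinates
# and an aggregation operator index.
predicted_coordinates, predicted_aggregation = tokenizer.convert_logits_to_predictions(
    inputs, outputs.logits.detach(), outputs.logits_aggregation.detach()
)
print(predicted_coordinates, predicted_aggregation)
```

The predicted coordinates index the selected cells of the DataFrame, and the aggregation index selects among the NONE, SUM, AVERAGE and COUNT operators.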
📚 Documentation
Model description
TAPAS is a BERT-like transformers model pretrained on a large corpus of English data from Wikipedia in a self-supervised fashion. This means it was pretrained on raw tables and associated texts without any human labeling. An automatic process was used to generate inputs and labels from those texts, allowing it to utilize a large amount of publicly available data. More precisely, it was pretrained with two objectives:
- Masked language modeling (MLM): Given a (flattened) table and associated context, the model randomly masks 15% of the words in the input, runs the entire (partially masked) sequence through the model, and has to predict the masked words. This differs from traditional recurrent neural networks (RNNs), which usually process words one by one, and from autoregressive models like GPT, which internally mask future tokens. It enables the model to learn a bidirectional representation of a table and associated text (a simplified illustration of the masking step is sketched after this list).
- Intermediate pre-training: To encourage numerical reasoning on tables, the authors additionally pre-trained the model by creating a balanced dataset of millions of syntactically created training examples. Here, the model must predict (classify) whether a sentence is supported or refuted by the contents of a table. The training examples are created based on synthetic as well as counterfactual statements.
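As a rough, simplified illustration of the MLM objective above (not the authors' pre-training code), the sketch below masks 15% of the token ids of an already tokenized sequence and builds the matching labels; the actual procedure also leaves some selected tokens unchanged or replaces them with random words.

```python
import torch

def mask_for_mlm(input_ids: torch.Tensor, mask_token_id: int, mlm_probability: float = 0.15):
    """Simplified MLM masking: replace ~15% of the tokens with [MASK].

    Labels keep the original ids at masked positions and -100 elsewhere,
    so the loss is only computed on the masked tokens.
    """
    labels = input_ids.clone()
    masked = torch.bernoulli(torch.full(labels.shape, mlm_probability)).bool()
    labels[~masked] = -100                    # ignore unmasked positions in the loss
    corrupted = input_ids.clone()
    corrupted[masked] = mask_token_id         # replace the selected tokens with [MASK]
    return corrupted, labels
```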
This way, the model learns an inner representation of the English language used in tables and associated texts, which can then be used to extract features useful for downstream tasks such as answering questions about a table or determining whether a sentence is entailed or refuted by the contents of a table. Fine-tuning is done by adding a cell selection head and aggregation head on top of the pre-trained model, and then jointly training these randomly initialized classification heads with the base model on SQA and WikiSQL.
Intended uses & limitations
You can use this model for answering questions related to a table. For code examples, we refer to the documentation of TAPAS on the HuggingFace website.
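A shorter route is the table-question-answering pipeline; the sketch below again assumes the same Hub id as in the usage example above, with a toy table for illustration.

```python
# A short sketch using the Transformers "table-question-answering" pipeline;
# the model id is assumed to match this card's checkpoint.
from transformers import pipeline
import pandas as pd

tqa = pipeline(
    "table-question-answering",
    model="google/tapas-large-finetuned-wikisql-supervised",
)
table = pd.DataFrame(
    {"City": ["Paris", "Berlin"], "Population": ["2,161,000", "3,645,000"]}
)
print(tqa(table=table, query="Which city has the larger population?"))
```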
Training procedure
Preprocessing
The texts are lowercased and tokenized using WordPiece with a vocabulary size of 30,000. The inputs of the model are then of the form:
[CLS] Question [SEP] Flattened table [SEP]
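As an illustration (assuming the same Hub id as above), decoding a tokenized question/table pair shows this flattened format; the exact decoded string depends on the tokenizer version.

```python
# Decode a tokenized (question, table) pair to see the flattened input format.
from transformers import TapasTokenizer
import pandas as pd

tokenizer = TapasTokenizer.from_pretrained("google/tapas-large-finetuned-wikisql-supervised")
table = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": ["30", "25"]})
encoding = tokenizer(table=table, queries=["How old is Bob?"], return_tensors="pt")
print(tokenizer.decode(encoding["input_ids"][0]))
# Roughly: "[CLS] how old is bob? [SEP] name age alice 30 bob 25 [SEP]"
```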
The authors first converted the WikiSQL dataset into the format of SQA using automatic conversion scripts.
Fine-tuning
The model was fine-tuned on 32 Cloud TPU v3 cores for 50,000 steps with a maximum sequence length of 512 and a batch size of 512. In this setup, fine-tuning takes around 10 hours. The optimizer used is Adam with a learning rate of 6.17164e-5 and a warm-up ratio of 0.1424. See the paper for more details (tables 11 and 12).
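As a sketch only, these hyperparameters could be written as Hugging Face TrainingArguments; the actual fine-tuning was done with the authors' TensorFlow implementation on Cloud TPUs, so this is not the original training setup.

```python
# Hypothetical translation of the reported hyperparameters into TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tapas-large-wikisql",   # hypothetical output directory
    max_steps=50_000,                   # 50,000 fine-tuning steps
    per_device_train_batch_size=512,    # effective batch size of 512 (single device assumed)
    learning_rate=6.17164e-5,           # Adam learning rate from the paper
    warmup_ratio=0.1424,                # warm-up ratio from the paper
)
```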
BibTeX entry and citation info
@misc{herzig2020tapas,
title={TAPAS: Weakly Supervised Table Parsing via Pre-training},
author={Jonathan Herzig and Paweł Krzysztof Nowak and Thomas Müller and Francesco Piccinno and Julian Martin Eisenschlos},
year={2020},
eprint={2004.02349},
archivePrefix={arXiv},
primaryClass={cs.IR}
}
@misc{eisenschlos2020understanding,
title={Understanding tables with intermediate pre-training},
author={Julian Martin Eisenschlos and Syrine Krichene and Thomas Müller},
year={2020},
eprint={2010.00571},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@article{DBLP:journals/corr/abs-1709-00103,
author = {Victor Zhong and
Caiming Xiong and
Richard Socher},
title = {Seq2SQL: Generating Structured Queries from Natural Language using
Reinforcement Learning},
journal = {CoRR},
volume = {abs/1709.00103},
year = {2017},
url = {http://arxiv.org/abs/1709.00103},
archivePrefix = {arXiv},
eprint = {1709.00103},
timestamp = {Mon, 13 Aug 2018 16:48:41 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-1709-00103.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
📄 License
The model is released under the Apache 2.0 license.
| Property | Details |
|----------|---------|
| Model Type | TAPAS large model fine-tuned on WikiSQL |
| Training Data | wikisql |
| License | apache-2.0 |