🚀 TAPAS medium model
This model has two versions that can be used. The default version corresponds to the tapas_inter_masklm_medium_reset checkpoint of the original GitHub repository. It was pre-trained on MLM and an additional step which the authors call intermediate pre-training, and it uses relative position embeddings by default (i.e. resetting the position index at every cell of the table).
The other (non-default) version uses absolute position embeddings and can be loaded with revision="no_reset"; it corresponds to tapas_inter_masklm_medium.
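As a minimal sketch, either version can be selected via the revision argument of from_pretrained; the Hugging Face Hub identifier google/tapas-medium used below is an assumption for illustration, not confirmed by this card:

```python
from transformers import TapasModel

# Default version: relative position embeddings (tapas_inter_masklm_medium_reset).
# The hub identifier "google/tapas-medium" is assumed here.
model = TapasModel.from_pretrained("google/tapas-medium")

# Non-default version: absolute position embeddings (tapas_inter_masklm_medium)
model_no_reset = TapasModel.from_pretrained("google/tapas-medium", revision="no_reset")
```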
Disclaimer: The team releasing TAPAS didn't write a model card for this model. This model card was written by the Hugging Face team and contributors.
✨ Features
- Two-version availability: Offers both relative and absolute position embedding versions.
- Self-supervised pre-training: Pretrained on a large English Wikipedia corpus in a self-supervised way.
- Dual pre-training objectives: Trained on Masked Language Modeling (MLM) and intermediate pre-training for numerical reasoning.
🚀 Quick Start
This model can be used in its raw form to obtain hidden representations of table-question pairs. However, it is mainly designed to be fine-tuned on downstream tasks such as question answering or sequence classification. You can search the model hub for fine-tuned versions.
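A minimal sketch of extracting hidden states with 🤗 Transformers, assuming the hub identifier google/tapas-medium and that pandas is installed:

```python
import pandas as pd
from transformers import TapasTokenizer, TapasModel

model_name = "google/tapas-medium"  # assumed hub identifier
tokenizer = TapasTokenizer.from_pretrained(model_name)
model = TapasModel.from_pretrained(model_name)

# A toy table-question pair; TAPAS expects table cells as strings
table = pd.DataFrame({"Actors": ["Brad Pitt", "Leonardo Di Caprio"],
                      "Number of movies": ["87", "53"]})
queries = ["How many movies does Leonardo Di Caprio have?"]

inputs = tokenizer(table=table, queries=queries, padding="max_length", return_tensors="pt")
outputs = model(**inputs)
hidden_states = outputs.last_hidden_state  # shape: (batch_size, sequence_length, hidden_size)
```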
📚 Documentation
Model description
TAPAS is a BERT-like transformer model. It was pretrained on a large corpus of English Wikipedia data in a self-supervised manner, using only raw tables and associated texts without human labeling. It has two pre-training objectives:
- Masked language modeling (MLM): Given a (flattened) table and associated context, the model randomly masks 15% of the input words and then predicts them. This allows the model to learn a bidirectional representation of tables and associated texts (a minimal sketch of this objective follows the list).
- Intermediate pre-training: To promote numerical reasoning on tables, the model was further pre-trained on a balanced dataset of millions of syntactically created training examples. The model must predict whether a sentence is supported or refuted by the table contents.
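As a sketch of the MLM objective with this checkpoint (the hub identifier google/tapas-medium and the masked sentence below are illustrative assumptions):

```python
import pandas as pd
import torch
from transformers import TapasTokenizer, TapasForMaskedLM

tokenizer = TapasTokenizer.from_pretrained("google/tapas-medium")  # assumed identifier
model = TapasForMaskedLM.from_pretrained("google/tapas-medium")

table = pd.DataFrame({"Actors": ["Brad Pitt", "Leonardo Di Caprio"],
                      "Number of movies": ["87", "53"]})
inputs = tokenizer(table=table,
                   queries=["Brad Pitt appeared in 87 [MASK]."],
                   return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Predict the token behind [MASK]
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)
predicted_ids = logits[mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```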
Intended uses & limitations
The raw model can be used to obtain hidden representations of table-question pairs, but it is mostly intended to be fine-tuned on a downstream task. Check the model hub for fine-tuned versions on the task that interests you.
Training procedure
Preprocessing
The texts are lowercased and tokenized using WordPiece with a vocabulary size of 30,000. The model inputs are in the form:
[CLS] Sentence [SEP] Flattened table [SEP]
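To see this layout concretely, one can decode a tokenized example; the hub identifier google/tapas-medium below is an assumption:

```python
import pandas as pd
from transformers import TapasTokenizer

tokenizer = TapasTokenizer.from_pretrained("google/tapas-medium")  # assumed identifier

table = pd.DataFrame({"City": ["Paris", "Berlin"], "Population": ["2,100,000", "3,600,000"]})
inputs = tokenizer(table=table, queries=["Which city has more inhabitants?"], return_tensors="pt")

# The decoded ids show the lowercased question followed by the flattened table,
# with [CLS]/[SEP] inserted as described above.
print(tokenizer.decode(inputs["input_ids"][0]))
```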
Pre-training
The model was pre-trained on 32 Cloud TPU v3 cores for 1,000,000 steps, with a maximum sequence length of 512 and a batch size of 512. MLM pre-training alone takes about 3 days. It was also pre-trained on a second task (table entailment). For more details, refer to the original TAPAS paper and the follow-up paper. The optimizer used is Adam with a learning rate of 5e-5 and a warmup ratio of 0.01.
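For reference, a sketch of an optimizer and schedule matching these hyperparameters (Adam, learning rate 5e-5, 1% warmup over 1,000,000 steps); the linear decay, plain PyTorch Adam, and the google/tapas-medium identifier are assumptions, not the authors' original training code:

```python
import torch
from transformers import TapasModel, get_linear_schedule_with_warmup

model = TapasModel.from_pretrained("google/tapas-medium")  # assumed identifier

num_training_steps = 1_000_000
num_warmup_steps = int(0.01 * num_training_steps)  # warmup ratio of 0.01

optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_training_steps,
)
```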
BibTeX entry and citation info
@misc{herzig2020tapas,
  title={TAPAS: Weakly Supervised Table Parsing via Pre-training},
  author={Jonathan Herzig and Paweł Krzysztof Nowak and Thomas Müller and Francesco Piccinno and Julian Martin Eisenschlos},
  year={2020},
  eprint={2004.02349},
  archivePrefix={arXiv},
  primaryClass={cs.IR}
}
@misc{eisenschlos2020understanding,
  title={Understanding tables with intermediate pre-training},
  author={Julian Martin Eisenschlos and Syrine Krichene and Thomas Müller},
  year={2020},
  eprint={2010.00571},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
📄 License
This model is licensed under the Apache 2.0 (apache-2.0) license.
| Property | Details |
|----------|---------|
| Model Type | TAPAS medium model with two versions (relative and absolute position embeddings) |
| Training Data | A large corpus of English data from Wikipedia |