🚀 TAPAS small model
The TAPAS small model is available in two usable versions. It is a BERT-like transformers model, pre-trained on a large English corpus from Wikipedia, and after fine-tuning it can be applied to downstream tasks such as question answering about tables.
✨ Features
- Two Versions: The default version corresponds to the tapas_inter_masklm_small_reset checkpoint of the original GitHub repository; the non-default version, available with revision="no_reset", corresponds to tapas_inter_masklm_small (see the loading sketch after this list).
- Pre-training Objectives: Pretrained with Masked Language Modeling (MLM) and an additional intermediate pre-training step to encourage numerical reasoning on tables.
- Self-supervised Learning: Trained on raw tables and associated texts without human labeling, using an automatic process to generate inputs and labels.
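Both versions can be loaded with the 🤗 Transformers library. The sketch below is illustrative: the repository id google/tapas-small is an assumption not stated in this card, and only the revision argument differs between the two checkpoints.

```python
# Illustrative sketch: loading the two checkpoint versions.
# The repository id "google/tapas-small" is assumed, not stated in this card.
from transformers import TapasModel

# Default version: corresponds to tapas_inter_masklm_small_reset.
model = TapasModel.from_pretrained("google/tapas-small")

# Non-default version: corresponds to tapas_inter_masklm_small.
model_no_reset = TapasModel.from_pretrained("google/tapas-small", revision="no_reset")
```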
📚 Documentation
Model description
TAPAS is a BERT-like transformers model pretrained on a large corpus of English data from Wikipedia in a self-supervised fashion. It was pretrained with two main objectives:
- Masked language modeling (MLM): Given a (flattened) table and associated context, the model randomly masks 15% of the words in the input. Then it runs the entire (partially masked) sequence through the model to predict the masked words. This allows the model to learn a bidirectional representation of a table and associated text.
- Intermediate pre-training: To encourage numerical reasoning on tables, the model was further pre-trained on a balanced dataset of millions of syntactically created training examples. The model must predict whether a sentence is supported or refuted by the contents of a table.
Intended uses & limitations
You can use the raw model to extract hidden representations of table-question pairs, but it is mostly intended to be fine-tuned on a downstream task such as question answering or sequence classification. Check the model hub for fine-tuned versions.
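As a minimal sketch of extracting hidden representations for a table-question pair (again assuming the hypothetical google/tapas-small repository id), the raw model can be called through TapasTokenizer and TapasModel:

```python
# Minimal sketch: hidden representations for a table-question pair.
# The repository id "google/tapas-small" is an assumption.
import pandas as pd
from transformers import TapasTokenizer, TapasModel

tokenizer = TapasTokenizer.from_pretrained("google/tapas-small")
model = TapasModel.from_pretrained("google/tapas-small")

# TapasTokenizer expects the table as a pandas DataFrame of strings.
table = pd.DataFrame({"Actor": ["Brad Pitt", "Leonardo DiCaprio"], "Age": ["59", "48"]})
inputs = tokenizer(table=table, queries="How old is Brad Pitt?", return_tensors="pt")

outputs = model(**inputs)
hidden_states = outputs.last_hidden_state  # shape: (batch, sequence_length, hidden_size)
```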
Training procedure
Preprocessing
The texts are lowercased and tokenized using WordPiece with a vocabulary size of 30,000. The model inputs are in the form:
[CLS] Sentence [SEP] Flattened table [SEP]
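The sketch below (repository id assumed, as above) shows how TapasTokenizer produces this layout: the lowercased question comes first, followed by the table flattened row by row.

```python
# Sketch of the input layout produced by the tokenizer.
# "google/tapas-small" is an assumed repository id.
import pandas as pd
from transformers import TapasTokenizer

tokenizer = TapasTokenizer.from_pretrained("google/tapas-small")
table = pd.DataFrame({"City": ["Paris", "Berlin"], "Country": ["France", "Germany"]})
encoding = tokenizer(table=table, queries="Which city is in France?")

# Decoding should show the lowercased question followed by the flattened table,
# matching the [CLS] Sentence [SEP] Flattened table [SEP] layout above.
print(tokenizer.decode(encoding["input_ids"]))
```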
Pre-training
The model was pre-trained on 32 Cloud TPU v3 cores for 1,000,000 steps with a maximum sequence length of 512 and a batch size of 512. The optimizer used is Adam with a learning rate of 5e-5 and a warmup ratio of 0.01.
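The original pre-training was done with the TensorFlow code of the official repository; as a loose PyTorch approximation of that optimizer setup (hyperparameters taken from the paragraph above; the linear warmup/decay schedule and repository id are assumptions), one could write:

```python
# Loose approximation of the pre-training optimizer settings in PyTorch.
# The linear warmup/decay schedule and "google/tapas-small" id are assumptions.
import torch
from transformers import TapasModel, get_linear_schedule_with_warmup

model = TapasModel.from_pretrained("google/tapas-small")

total_steps = 1_000_000                        # 1,000,000 pre-training steps
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.01 * total_steps),  # warmup ratio of 0.01
    num_training_steps=total_steps,
)
```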
BibTeX entry and citation info
@misc{herzig2020tapas,
title={TAPAS: Weakly Supervised Table Parsing via Pre-training},
author={Jonathan Herzig and Paweł Krzysztof Nowak and Thomas Müller and Francesco Piccinno and Julian Martin Eisenschlos},
year={2020},
eprint={2004.02349},
archivePrefix={arXiv},
primaryClass={cs.IR}
}
@misc{eisenschlos2020understanding,
title={Understanding tables with intermediate pre-training},
author={Julian Martin Eisenschlos and Syrine Krichene and Thomas Müller},
year={2020},
eprint={2010.00571},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
📄 License
This model is released under the apache-2.0 license.