🚀 TAPAS small model
The TAPAS small model is available in two usable versions. It is a BERT-like transformers model, pre-trained on a large English corpus from Wikipedia, and after fine-tuning it can be applied to downstream tasks such as question answering about tables.
✨ Features
- Two Versions: The default version corresponds to the tapas_inter_masklm_small_reset checkpoint of the original GitHub repository; the non-default version, available with revision="no_reset", corresponds to tapas_inter_masklm_small (see the loading sketch after this list).
- Pre-training Objectives: Pretrained with Masked Language Modeling (MLM) and an additional intermediate pre-training step to encourage numerical reasoning on tables.
- Self-supervised Learning: Trained on raw tables and associated texts without human labeling, using an automatic process to generate inputs and labels.
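Both versions can be loaded with the 🤗 Transformers library. The sketch below is illustrative: the repository id google/tapas-small is an assumption not stated in this card, and only the revision argument differs between the two checkpoints.

```python
# Illustrative sketch: loading the two checkpoint versions.
# The repository id "google/tapas-small" is assumed, not stated in this card.
from transformers import TapasModel

# Default version: corresponds to tapas_inter_masklm_small_reset.
model = TapasModel.from_pretrained("google/tapas-small")

# Non-default version: corresponds to tapas_inter_masklm_small.
model_no_reset = TapasModel.from_pretrained("google/tapas-small", revision="no_reset")
```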
📚 Documentation
Model description
TAPAS is a BERT-like transformers model pretrained on a large corpus of English data from Wikipedia in a self-supervised fashion. It was pretrained with two main objectives:
- Masked language modeling (MLM): Given a (flattened) table and associated context, the model randomly masks 15% of the words in the input. Then it runs the entire (partially masked) sequence through the model to predict the masked words. This allows the model to learn a bidirectional representation of a table and associated text.
- Intermediate pre-training: To encourage numerical reasoning on tables, the model was further pre-trained on a balanced dataset of millions of syntactically created training examples. The model must predict whether a sentence is supported or refuted by the contents of a table.
Intended uses & limitations
You can use the raw model to extract hidden representations of table-question pairs, but it is mostly intended to be fine-tuned on a downstream task such as question answering or sequence classification. Check the model hub for fine-tuned versions.
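As a minimal sketch of extracting hidden representations for a table-question pair (again assuming the hypothetical google/tapas-small repository id), the raw model can be called through TapasTokenizer and TapasModel:

```python
# Minimal sketch: hidden representations for a table-question pair.
# The repository id "google/tapas-small" is an assumption.
import pandas as pd
from transformers import TapasTokenizer, TapasModel

tokenizer = TapasTokenizer.from_pretrained("google/tapas-small")
model = TapasModel.from_pretrained("google/tapas-small")

# TapasTokenizer expects the table as a pandas DataFrame of strings.
table = pd.DataFrame({"Actor": ["Brad Pitt", "Leonardo DiCaprio"], "Age": ["59", "48"]})
inputs = tokenizer(table=table, queries="How old is Brad Pitt?", return_tensors="pt")

outputs = model(**inputs)
hidden_states = outputs.last_hidden_state  # shape: (batch, sequence_length, hidden_size)
```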
Training procedure
Preprocessing
The texts are lowercased and tokenized using WordPiece with a vocabulary size of 30,000. The model inputs are in the form:
[CLS] Sentence [SEP] Flattened table [SEP]
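The sketch below (repository id assumed, as above) shows how TapasTokenizer produces this layout: the lowercased question comes first, followed by the table flattened row by row.

```python
# Sketch of the input layout produced by the tokenizer.
# "google/tapas-small" is an assumed repository id.
import pandas as pd
from transformers import TapasTokenizer

tokenizer = TapasTokenizer.from_pretrained("google/tapas-small")
table = pd.DataFrame({"City": ["Paris", "Berlin"], "Country": ["France", "Germany"]})
encoding = tokenizer(table=table, queries="Which city is in France?")

# Decoding should show the lowercased question followed by the flattened table,
# matching the [CLS] Sentence [SEP] Flattened table [SEP] layout above.
print(tokenizer.decode(encoding["input_ids"]))
```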
Pre-training
The model was pre-trained on 32 Cloud TPU v3 cores for 1,000,000 steps with a maximum sequence length of 512 and a batch size of 512. The optimizer used is Adam with a learning rate of 5e-5 and a warmup ratio of 0.01.
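The original pre-training was done with the TensorFlow code of the official repository; as a loose PyTorch approximation of that optimizer setup (hyperparameters taken from the paragraph above; the linear warmup/decay schedule and repository id are assumptions), one could write:

```python
# Loose approximation of the pre-training optimizer settings in PyTorch.
# The linear warmup/decay schedule and "google/tapas-small" id are assumptions.
import torch
from transformers import TapasModel, get_linear_schedule_with_warmup

model = TapasModel.from_pretrained("google/tapas-small")

total_steps = 1_000_000                        # 1,000,000 pre-training steps
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.01 * total_steps),  # warmup ratio of 0.01
    num_training_steps=total_steps,
)
```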
BibTeX entry and citation info
@misc{herzig2020tapas,
title={TAPAS: Weakly Supervised Table Parsing via Pre-training},
author={Jonathan Herzig and Paweł Krzysztof Nowak and Thomas Müller and Francesco Piccinno and Julian Martin Eisenschlos},
year={2020},
eprint={2004.02349},
archivePrefix={arXiv},
primaryClass={cs.IR}
}
@misc{eisenschlos2020understanding,
title={Understanding tables with intermediate pre-training},
author={Julian Martin Eisenschlos and Syrine Krichene and Thomas Müller},
year={2020},
eprint={2010.00571},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
📄 License
This model is released under the apache-2.0 license.