🚀 TAPAS base model fine-tuned on WikiSQL (in a supervised fashion)
This model is a fine-tuned version of the TAPAS base model, trained on WikiSQL in a supervised manner. It is available in two versions and can be used for answering questions about a table.
✨ Features
- Two Model Versions: The default version corresponds to the tapas_wikisql_sqa_inter_masklm_base_reset checkpoint of the original GitHub repository. The non-default no_reset version corresponds to tapas_wikisql_sqa_inter_masklm_base, which uses absolute position embeddings (see the loading sketch after this list).
- Pretraining and Fine-tuning: Pretrained with MLM and an intermediate pre-training objective, then fine-tuned in a chain on SQA and WikiSQL.
- Relative Position Embeddings: The default version uses relative position embeddings, resetting the position index at every cell of the table.
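As a rough illustration of how the two versions can be loaded with the Transformers library, see the sketch below. The Hub identifier google/tapas-base-finetuned-wikisql-supervised and the no_reset revision name are assumptions based on the naming conventions of the TAPAS model family; check this repository's model page for the exact values.

```python
from transformers import TapasForQuestionAnswering, TapasTokenizer

# Assumed Hub identifier for this checkpoint; replace with this repository's name.
model_name = "google/tapas-base-finetuned-wikisql-supervised"

# Default version: relative position embeddings (position index reset at every cell).
tokenizer = TapasTokenizer.from_pretrained(model_name)
model = TapasForQuestionAnswering.from_pretrained(model_name)

# Non-default version with absolute position embeddings, assuming it is published
# as a "no_reset" revision of the same repository.
model_no_reset = TapasForQuestionAnswering.from_pretrained(model_name, revision="no_reset")
```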
📚 Documentation
Model description
TAPAS is a BERT-like transformers model pretrained on a large corpus of English data from Wikipedia in a self-supervised fashion. It was pretrained with two objectives:
- Masked language modeling (MLM): Given a (flattened) table and associated context, the model randomly masks 15% of the words in the input and then predicts the masked words. This allows the model to learn a bidirectional representation of a table and associated text.
- Intermediate pre-training: To encourage numerical reasoning on tables, the model was additionally pre-trained on a balanced dataset of millions of syntactically created training examples. It must predict whether a sentence is supported or refuted by the contents of a table.
Fine-tuning is done by adding a cell selection head and an aggregation head on top of the pre-trained model and jointly training these heads with the base model on SQA and WikiSQL.
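For intuition about these two heads, the sketch below runs a question and a small table through TapasForQuestionAnswering and converts the cell-selection and aggregation logits into predictions. The model identifier is an assumption; any TAPAS checkpoint that includes an aggregation head exposes the same outputs.

```python
import pandas as pd
import torch
from transformers import TapasForQuestionAnswering, TapasTokenizer

model_name = "google/tapas-base-finetuned-wikisql-supervised"  # assumed identifier
tokenizer = TapasTokenizer.from_pretrained(model_name)
model = TapasForQuestionAnswering.from_pretrained(model_name)

# TAPAS expects every table cell to be a string.
table = pd.DataFrame({"City": ["Paris", "Berlin"], "Population": ["2100000", "3600000"]})
inputs = tokenizer(table=table, queries=["What is the population of Berlin?"],
                   padding="max_length", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Cell selection head -> outputs.logits; aggregation head -> outputs.logits_aggregation.
coordinates, aggregation_indices = tokenizer.convert_logits_to_predictions(
    inputs, outputs.logits.detach(), outputs.logits_aggregation.detach()
)
print(coordinates, aggregation_indices)
```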
Intended uses & limitations
You can use this model for answering questions about a table. For full code examples, refer to the TAPAS documentation on the Hugging Face website; a minimal sketch follows.
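The sketch below uses the table-question-answering pipeline, assuming the checkpoint is published under the Hub identifier google/tapas-base-finetuned-wikisql-supervised; substitute the actual identifier of this repository.

```python
import pandas as pd
from transformers import pipeline

# Assumed Hub identifier; replace with this repository's name if it differs.
qa = pipeline("table-question-answering",
              model="google/tapas-base-finetuned-wikisql-supervised")

# All cell values must be passed as strings.
table = pd.DataFrame({
    "Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"],
    "Number of movies": ["87", "53", "69"],
})

result = qa(table=table, query="How many movies does Leonardo Di Caprio have?")
print(result)  # dict with the answer, selected cells/coordinates, and the aggregation operator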
Training procedure
Preprocessing
The texts are lowercased and tokenized using WordPiece with a vocabulary size of 30,000. The model inputs are in the form:
[CLS] Question [SEP] Flattened table [SEP]
The authors converted the WikiSQL dataset into the format of SQA using automatic conversion scripts.
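As an illustration of this input layout, the TapasTokenizer shipped with Transformers performs the lowercasing, WordPiece tokenization, and table flattening. The checkpoint name below is an assumption; any TAPAS tokenizer applies the same preprocessing.

```python
import pandas as pd
from transformers import TapasTokenizer

# Assumed checkpoint name; the preprocessing is identical across TAPAS tokenizers.
tokenizer = TapasTokenizer.from_pretrained("google/tapas-base-finetuned-wikisql-supervised")

table = pd.DataFrame({"Rank": ["1", "2"], "Country": ["France", "Germany"]})
encoding = tokenizer(table=table, queries=["Which country is ranked first?"],
                     return_tensors="pt")

# The decoded ids follow the [CLS] question [SEP] flattened-table layout described above.
print(tokenizer.decode(encoding["input_ids"][0][:20]))
```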
Fine-tuning
The model was fine-tuned on 32 Cloud TPU v3 cores for 50,000 steps with a maximum sequence length of 512 and a batch size of 512. Fine-tuning takes around 10 hours. The optimizer used is Adam with a learning rate of 6.17164e-5 and a warmup ratio of 0.1424. See the paper (tables 11 and 12) for more details.
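For reference, these hyperparameters map roughly onto a Transformers TrainingArguments configuration as sketched below. This is only an illustrative translation; the original fine-tuning was run with the authors' TensorFlow code on Cloud TPUs, not with this Trainer setup.

```python
from transformers import TrainingArguments

# Illustrative mapping of the reported hyperparameters (not the authors' original script).
training_args = TrainingArguments(
    output_dir="tapas-base-wikisql-supervised",  # hypothetical output directory
    max_steps=50_000,
    per_device_train_batch_size=512,   # the paper reports a global batch size of 512
    learning_rate=6.17164e-5,
    warmup_ratio=0.1424,
    optim="adamw_torch",               # Adam-style optimizer
)
# The maximum sequence length of 512 is enforced at tokenization time, not here.
```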
BibTeX entry and citation info
@misc{herzig2020tapas,
  title={TAPAS: Weakly Supervised Table Parsing via Pre-training},
  author={Jonathan Herzig and Paweł Krzysztof Nowak and Thomas Müller and Francesco Piccinno and Julian Martin Eisenschlos},
  year={2020},
  eprint={2004.02349},
  archivePrefix={arXiv},
  primaryClass={cs.IR}
}

@misc{eisenschlos2020understanding,
  title={Understanding tables with intermediate pre-training},
  author={Julian Martin Eisenschlos and Syrine Krichene and Thomas Müller},
  year={2020},
  eprint={2010.00571},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@article{DBLP:journals/corr/abs-1709-00103,
  author={Victor Zhong and Caiming Xiong and Richard Socher},
  title={Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning},
  journal={CoRR},
  volume={abs/1709.00103},
  year={2017},
  url={http://arxiv.org/abs/1709.00103},
  archivePrefix={arXiv},
  eprint={1709.00103},
  timestamp={Mon, 13 Aug 2018 16:48:41 +0200},
  biburl={https://dblp.org/rec/journals/corr/abs-1709-00103.bib},
  bibsource={dblp computer science bibliography, https://dblp.org}
}
📄 License
This model is licensed under the Apache 2.0 license.
| Property | Details |
|----------|---------|
| Model Type | TAPAS base model fine-tuned on WikiSQL |
| Training Data | WikiSQL, SQA |
| Tags | tapas |