🚀 TAPAS mini model fine-tuned on WikiTable Questions (WTQ)
This TAPAS mini model, fine-tuned on WikiTable Questions (WTQ), is available in two versions. It is designed for table question answering, leveraging pre-training and fine-tuning on multiple datasets.
✨ Features
- Two Model Versions: The default version corresponds to the `tapas_wtq_wikisql_sqa_inter_masklm_mini_reset` checkpoint of the original GitHub repository. The non-default `no_reset` version corresponds to `tapas_wtq_wikisql_sqa_inter_masklm_mini` (intermediate pre-training, absolute position embeddings). A loading sketch is shown after this list.
- Pre-training and Fine-tuning: Pre-trained with an MLM objective and an intermediate pre-training task, then fine-tuned in a chain on SQA, WikiSQL and WTQ.
- Relative Position Embeddings: The default version uses relative position embeddings, resetting the position index at every cell of the table.
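For illustration, the sketch below shows one way to load each version with the 🤗 Transformers library. The Hub model ID `google/tapas-mini-finetuned-wtq` and the `no_reset` revision name are assumptions based on the description above, not details confirmed by this card.

```python
# Sketch: loading the two versions described above (assumed Hub model ID and
# an assumed "no_reset" revision holding the non-default checkpoint).
from transformers import TapasForQuestionAnswering

# Default version: relative position embeddings, position index reset per cell.
model = TapasForQuestionAnswering.from_pretrained("google/tapas-mini-finetuned-wtq")

# Non-default version: absolute position embeddings.
model_no_reset = TapasForQuestionAnswering.from_pretrained(
    "google/tapas-mini-finetuned-wtq",
    revision="no_reset",
)
```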
📚 Documentation
Model description
TAPAS is a BERT-like transformers model pretrained on a large corpus of English data from Wikipedia in a self-supervised fashion. It was pretrained on raw tables and associated texts with an automatic process to generate inputs and labels. The pre-training objectives are:
- Masked language modeling (MLM): The model randomly masks 15% of the words in the input (a flattened table and associated context), then predicts the masked words. This allows it to learn a bidirectional representation of a table and associated text.
- Intermediate pre-training: To encourage numerical reasoning on tables, the model was additionally pre-trained on a balanced dataset of millions of syntactically created training examples. It must predict whether a sentence is supported or refuted by the contents of a table.
Fine-tuning is done by adding a cell selection head and an aggregation head on top of the pre-trained model, and jointly training these heads with the base model on SQA, WikiSQL and finally WTQ.
Intended uses & limitations
You can use this model for answering questions about a table. For code examples, refer to the TAPAS documentation on the Hugging Face website.
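As a quick start, here is a minimal sketch using the `table-question-answering` pipeline from 🤗 Transformers; the model ID `google/tapas-mini-finetuned-wtq` and the example table are illustrative assumptions.

```python
# Sketch: asking a question about a small table with the pipeline API.
import pandas as pd
from transformers import pipeline

tqa = pipeline(
    task="table-question-answering",
    model="google/tapas-mini-finetuned-wtq",  # assumed Hub model ID
)

# TAPAS expects every table cell to be a string.
table = pd.DataFrame(
    {
        "City": ["Paris", "London", "Berlin"],
        "Population (millions)": ["2.1", "8.9", "3.6"],
    }
)

result = tqa(table=table, query="Which city has the largest population?")
print(result["answer"])      # selected cell(s), e.g. "London"
print(result["aggregator"])  # predicted aggregation operator (NONE/SUM/AVERAGE/COUNT)
```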
Training procedure
Preprocessing
The texts are lowercased and tokenized using WordPiece with a vocabulary size of 30,000. The model inputs are of the form:
`[CLS] Question [SEP] Flattened table [SEP]`
The authors first converted the WTQ dataset into the format of SQA using automatic conversion scripts.
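To make the input format above concrete, here is a small tokenization sketch with `TapasTokenizer`; the table contents and the Hub model ID are illustrative assumptions.

```python
# Sketch: how a question and a table are flattened into
# [CLS] Question [SEP] Flattened table [SEP] by the WordPiece tokenizer.
import pandas as pd
from transformers import TapasTokenizer

tokenizer = TapasTokenizer.from_pretrained("google/tapas-mini-finetuned-wtq")  # assumed ID

table = pd.DataFrame({"Year": ["2019", "2020"], "Sales": ["100", "150"]})
inputs = tokenizer(
    table=table,
    queries=["What were the sales in 2020?"],
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)

print(inputs["input_ids"].shape)                      # (1, 512) with max_length padding
print(tokenizer.decode(inputs["input_ids"][0][:15]))  # lowercased question, then table tokens
```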
Fine-tuning
The model was fine-tuned on 32 Cloud TPU v3 cores for 50,000 steps with a maximum sequence length of 512 and a batch size of 512. Fine-tuning takes around 10 hours. The optimizer used is Adam with a learning rate of 1.93581e-5 and a warmup ratio of 0.128960. An inductive bias is added such that the model only selects cells of the same column, reflected by the `select_one_column` parameter of `TapasConfig`. See the paper (tables 11 and 12) for more details.
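The single-column inductive bias can be inspected through the configuration, as in the sketch below; the Hub model ID is an assumption, and the printed values reflect the WTQ-style setup described above rather than values confirmed by this card.

```python
# Sketch: inspecting the WTQ-style fine-tuning configuration of this checkpoint.
from transformers import TapasConfig

config = TapasConfig.from_pretrained("google/tapas-mini-finetuned-wtq")  # assumed Hub model ID

# Cell selection restricted to a single column, as described above.
print(config.select_one_column)
# Weak supervision for aggregation (NONE/SUM/AVERAGE/COUNT) should give 4 labels.
print(config.num_aggregation_labels)
```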
BibTeX entry and citation info
@misc{herzig2020tapas,
title={TAPAS: Weakly Supervised Table Parsing via Pre-training},
author={Jonathan Herzig and Paweł Krzysztof Nowak and Thomas Müller and Francesco Piccinno and Julian Martin Eisenschlos},
year={2020},
eprint={2004.02349},
archivePrefix={arXiv},
primaryClass={cs.IR}
}
@misc{eisenschlos2020understanding,
title={Understanding tables with intermediate pre-training},
author={Julian Martin Eisenschlos and Syrine Krichene and Thomas Müller},
year={2020},
eprint={2010.00571},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@article{DBLP:journals/corr/PasupatL15,
author = {Panupong Pasupat and
Percy Liang},
title = {Compositional Semantic Parsing on Semi-Structured Tables},
journal = {CoRR},
volume = {abs/1508.00305},
year = {2015},
url = {http://arxiv.org/abs/1508.00305},
archivePrefix = {arXiv},
eprint = {1508.00305},
timestamp = {Mon, 13 Aug 2018 16:47:37 +0200},
biburl = {https://dblp.org/rec/journals/corr/PasupatL15.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
📄 License
This model is released under the Apache 2.0 license.