🚀 ReasTAP
ReasTAP is a table reasoning model introduced in the EMNLP 2022 paper ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples. The original GitHub repository is https://github.com/Yale-LILY/ReasTAP. The model targets reasoning over tabular data: by injecting table reasoning skills during pre-training, it improves performance on downstream table-related tasks.
🚀 Quick Start
The Yale-LILY/reastap-large-finetuned-wikisql model is initialized from Yale-LILY/reastap-large and fine-tuned on the WikiSQL dataset.
💻 Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import pandas as pd

# Load the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained("Yale-LILY/reastap-large-finetuned-wikisql")
model = AutoModelForSeq2SeqLM.from_pretrained("Yale-LILY/reastap-large-finetuned-wikisql")

# Build the table and the question
data = {
    "year": [1896, 1900, 1904, 2004, 2008, 2012],
    "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"],
}
table = pd.DataFrame.from_dict(data)
query = "In which year did beijing host the Olympic Games?"

# Encode the (table, query) pair, then generate and decode the answer
encoding = tokenizer(table=table, query=query, return_tensors="pt")
outputs = model.generate(**encoding)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```
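Running the snippet should print the answer for the Beijing row (2008; the exact decoded string may include extra formatting). The sketch below shows one way to answer several questions over the same table in a single batch. It assumes the checkpoint's tokenizer follows the TAPEX-style interface, which accepts lists of tables and queries together with `padding=True`; the second query string is only an illustration.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import pandas as pd

tokenizer = AutoTokenizer.from_pretrained("Yale-LILY/reastap-large-finetuned-wikisql")
model = AutoModelForSeq2SeqLM.from_pretrained("Yale-LILY/reastap-large-finetuned-wikisql")

table = pd.DataFrame({
    "year": [1896, 1900, 1904, 2004, 2008, 2012],
    "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"],
})

# Several questions over the same table; one copy of the table per query.
# Assumption: the tokenizer accepts lists of tables/queries, as TapexTokenizer does.
queries = [
    "In which year did beijing host the Olympic Games?",
    "Which city hosted the Olympic Games in 1900?",
]
encoding = tokenizer(table=[table] * len(queries), query=queries,
                     padding=True, return_tensors="pt")
outputs = model.generate(**encoding)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```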
📚 Documentation
Citation
```bibtex
@inproceedings{zhao-etal-2022-reastap,
title = "{R}eas{TAP}: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples",
author = "Zhao, Yilun and
Nan, Linyong and
Qi, Zhenting and
Zhang, Rui and
Radev, Dragomir",
booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.emnlp-main.615",
pages = "9006--9018",
abstract = "Reasoning over tabular data requires both table structure understanding and a broad set of table reasoning skills. Current models with table-specific architectures and pre-training methods perform well on understanding table structures, but they still struggle with tasks that require various table reasoning skills. In this work, we develop ReasTAP to show that high-level table reasoning skills can be injected into models during pre-training without a complex table-specific architecture design. We define 7 table reasoning skills, such as numerical operation, temporal comparison, and conjunction. Each reasoning skill is associated with one example generator, which synthesizes questions over semi-structured tables according to the sampled templates. We model the table pre-training task as a sequence generation task and pre-train ReasTAP to generate precise answers of the synthetic examples. ReasTAP is evaluated on four benchmarks covering three downstream tasks including 1) WikiSQL-Weak and WikiTQ for Table Question Answering, 2) TabFact for Table Fact Verification, and 3) LogicNLG for Faithful Table-to-Text Generation. Experimental results demonstrate that ReasTAP achieves new state-of-the-art results on all of them and delivers a significant improvement under low-resource setting. Our code is publicly available at https://github.com/Yale-LILY/ReasTAP.",
}
```
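As a rough illustration of the pre-training recipe described in the abstract (template-based example generators synthesize question-answer pairs over a table, and the model is pre-trained to produce the answer as a sequence), the sketch below shows what a generator for the "temporal comparison" skill might look like. It is not the authors' implementation (see the GitHub repository for that); the function name, the question template, and the column arguments are all hypothetical.

```python
import pandas as pd


def temporal_comparison_example(table: pd.DataFrame, year_col: str, entity_col: str):
    """Hypothetical generator for a 'temporal comparison' synthetic example.

    Samples two rows and instantiates a fixed question template; the resulting
    (question, answer) pair could serve as a sequence-generation pre-training
    example in the spirit of the paper's synthetic reasoning examples.
    """
    row_a, row_b = table.sample(2).to_dict("records")
    question = (
        f"Which {entity_col} comes earlier, "
        f"{row_a[entity_col]} or {row_b[entity_col]}?"
    )
    answer = row_a[entity_col] if row_a[year_col] < row_b[year_col] else row_b[entity_col]
    return question, answer


table = pd.DataFrame({
    "year": [1896, 1900, 1904, 2004, 2008, 2012],
    "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"],
})
print(temporal_comparison_example(table, year_col="year", entity_col="city"))
```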
Model Information

| Property | Details |
|----------|---------|
| Model type | Table reasoning model |
| Training data | WikiSQL |
| Tags | Table question answering |