🚀 ReasTAP
ReasTAP is a table reasoning model introduced in the EMNLP 2022 paper ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples. The original GitHub repository is https://github.com/Yale-LILY/ReasTAP. The model targets reasoning over tabular data: by injecting table reasoning skills during pre-training, it improves performance on downstream table-related tasks.
🚀 Quick Start
The Yale-LILY/reastap-large-finetuned-wikisql model is initialized from Yale-LILY/reastap-large and fine-tuned on the WikiSQL dataset.
💻 Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import pandas as pd

# Load the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained("Yale-LILY/reastap-large-finetuned-wikisql")
model = AutoModelForSeq2SeqLM.from_pretrained("Yale-LILY/reastap-large-finetuned-wikisql")

# Build the table and the question
data = {
    "year": [1896, 1900, 1904, 2004, 2008, 2012],
    "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"],
}
table = pd.DataFrame.from_dict(data)
query = "In which year did beijing host the Olympic Games?"

# Encode the (table, query) pair, then generate and decode the answer
encoding = tokenizer(table=table, query=query, return_tensors="pt")
outputs = model.generate(**encoding)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```
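Running the snippet should print the answer for the Beijing row (2008; the exact decoded string may include extra formatting). The sketch below shows one way to answer several questions over the same table in a single batch. It assumes the checkpoint's tokenizer follows the TAPEX-style interface, which accepts lists of tables and queries together with `padding=True`; the second query string is only an illustration.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import pandas as pd

tokenizer = AutoTokenizer.from_pretrained("Yale-LILY/reastap-large-finetuned-wikisql")
model = AutoModelForSeq2SeqLM.from_pretrained("Yale-LILY/reastap-large-finetuned-wikisql")

table = pd.DataFrame({
    "year": [1896, 1900, 1904, 2004, 2008, 2012],
    "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"],
})

# Several questions over the same table; one copy of the table per query.
# Assumption: the tokenizer accepts lists of tables/queries, as TapexTokenizer does.
queries = [
    "In which year did beijing host the Olympic Games?",
    "Which city hosted the Olympic Games in 1900?",
]
encoding = tokenizer(table=[table] * len(queries), query=queries,
                     padding=True, return_tensors="pt")
outputs = model.generate(**encoding)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```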
📚 Documentation
Citation
```bibtex
@inproceedings{zhao-etal-2022-reastap,
title = "{R}eas{TAP}: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples",
author = "Zhao, Yilun and
Nan, Linyong and
Qi, Zhenting and
Zhang, Rui and
Radev, Dragomir",
booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.emnlp-main.615",
pages = "9006--9018",
abstract = "Reasoning over tabular data requires both table structure understanding and a broad set of table reasoning skills. Current models with table-specific architectures and pre-training methods perform well on understanding table structures, but they still struggle with tasks that require various table reasoning skills. In this work, we develop ReasTAP to show that high-level table reasoning skills can be injected into models during pre-training without a complex table-specific architecture design. We define 7 table reasoning skills, such as numerical operation, temporal comparison, and conjunction. Each reasoning skill is associated with one example generator, which synthesizes questions over semi-structured tables according to the sampled templates. We model the table pre-training task as a sequence generation task and pre-train ReasTAP to generate precise answers of the synthetic examples. ReasTAP is evaluated on four benchmarks covering three downstream tasks including 1) WikiSQL-Weak and WikiTQ for Table Question Answering, 2) TabFact for Table Fact Verification, and 3) LogicNLG for Faithful Table-to-Text Generation. Experimental results demonstrate that ReasTAP achieves new state-of-the-art results on all of them and delivers a significant improvement under low-resource setting. Our code is publicly available at https://github.com/Yale-LILY/ReasTAP.",
}
```
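As a rough illustration of the pre-training recipe described in the abstract (template-based example generators synthesize question-answer pairs over a table, and the model is pre-trained to produce the answer as a sequence), the sketch below shows what a generator for the "temporal comparison" skill might look like. It is not the authors' implementation (see the GitHub repository for that); the function name, the question template, and the column arguments are all hypothetical.

```python
import pandas as pd


def temporal_comparison_example(table: pd.DataFrame, year_col: str, entity_col: str):
    """Hypothetical generator for a 'temporal comparison' synthetic example.

    Samples two rows and instantiates a fixed question template; the resulting
    (question, answer) pair could serve as a sequence-generation pre-training
    example in the spirit of the paper's synthetic reasoning examples.
    """
    row_a, row_b = table.sample(2).to_dict("records")
    question = (
        f"Which {entity_col} comes earlier, "
        f"{row_a[entity_col]} or {row_b[entity_col]}?"
    )
    answer = row_a[entity_col] if row_a[year_col] < row_b[year_col] else row_b[entity_col]
    return question, answer


table = pd.DataFrame({
    "year": [1896, 1900, 1904, 2004, 2008, 2012],
    "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"],
})
print(temporal_comparison_example(table, year_col="year", entity_col="city"))
```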
Model Information

| Property | Details |
|----------|---------|
| Model type | Table reasoning model |
| Training data | WikiSQL |
| Tags | Table question answering |