T5-for-Adversarial-Paraphrasing: open-source paraphrase generator - produces sentence pairs with the same meaning but very different wording


T5 For Adversarial Paraphrasing

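Developed by AMHR

This model is a paraphrase generator built for the Adversarial Paraphrasing Task. It produces paraphrase pairs that are semantically equivalent but differ markedly in vocabulary and syntax.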

Text Generation

Transformers

#adversarial-paraphrase-generation #semantic-equivalence-augmentation #T5-fine-tuning

Downloads: 26

Release date: 3/2/2022

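Model Overview

A T5-based paraphrase generator that produces paraphrase pairs which are equivalent in their inferential properties but differ markedly in vocabulary and syntax. It is intended to improve the performance of paraphrase detection models.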

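Model Features

Adversarial paraphrase generation

Generates paraphrase pairs that are semantically equivalent but lexically and syntactically disparate, which challenges existing paraphrase detection models.

Inferential equivalence

Judges a pair by the inferential properties of its sentences (each sentence should entail the other) rather than by word overlap, so that the pair is genuinely equivalent in meaning.

Automated dataset generation

Uses T5 to generate adversarial paraphrase datasets automatically, which speeds up dataset construction.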
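The two criteria above (mutual entailment plus low surface similarity) can be sketched as a simple filter. This is only an illustration under stated assumptions: `entails` is a hypothetical caller-supplied predicate (in practice it would wrap a natural language inference model), and the Jaccard word overlap and the `max_overlap` threshold are stand-ins chosen for this sketch, not the similarity metric or cutoff used by the authors.

```python
import re


def token_overlap(a, b):
    """Jaccard overlap of the lowercase word sets of two sentences, in [0, 1]."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    if not ta and not tb:
        return 1.0  # two empty sentences are treated as identical
    return len(ta & tb) / len(ta | tb)


def is_adversarial_paraphrase(a, b, entails, max_overlap=0.5):
    """True if a and b entail each other AND share few surface words.

    `entails(premise, hypothesis)` is a hypothetical predicate supplied by
    the caller; `max_overlap` is an illustrative threshold, not a value
    taken from the paper.
    """
    mutually_implicative = entails(a, b) and entails(b, a)
    return mutually_implicative and token_overlap(a, b) <= max_overlap
```

With a real entailment model plugged in as `entails`, a filter of this shape would keep only the generated pairs that preserve meaning while changing the wording substantially.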

Model Capabilities

Text generation

Paraphrase generation

Adversarial example generation

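Use Cases

Natural language processing

Testing paraphrase detection models

Generate adversarial paraphrase pairs to test how well existing paraphrase detection models perform.

Existing models reach barely chance-level accuracy on these pairs.

Improving paraphrase detection models

Train paraphrase detection models on the generated adversarial paraphrase dataset to improve their performance.

Accuracy at recognizing semantic equivalence improves markedly.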

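🚀 Paraphrase Generation Model

This model is a paraphrase generator designed for the Adversarial Paraphrasing Task described and used in the paper Improving Paraphrase Detection with the Adversarial Paraphrasing Task. It generates paraphrases that are semantically equivalent to the input but differ from it lexically and syntactically.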

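🚀 Quick Start

To get more out of this model, see nap_generation.py in the GitHub repository, which shows how to apply top-k and top-p sampling. Note that the demo on Hugging Face outputs only one sentence, and that sentence will most likely be identical to the input, because the model is meant to be decoded with beam search and sampling.

GitHub repository: https://github.com/Advancing-Machine-Human-Reasoning-Lab/apt.git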
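As a rough illustration of the sampling advice above, the following sketch draws several candidates with top-k and top-p sampling through the Transformers `generate` API and discards those that merely repeat the input. It is not a copy of nap_generation.py: the model id `AMHR/T5-for-Adversarial-Paraphrasing`, the `"paraphrase: "` input prefix, and the values of `top_k`, `top_p`, and `max_length` are all assumptions made for this example, so check the repository for the settings the authors actually use.

```python
def sampling_kwargs(num_candidates=5, top_k=120, top_p=0.95, max_length=64):
    """Keyword arguments for `model.generate` that enable top-k / top-p sampling.

    The default values are illustrative assumptions, not the authors' settings.
    """
    return {
        "do_sample": True,              # sample instead of greedy decoding
        "top_k": top_k,                 # keep only the k most likely tokens
        "top_p": top_p,                 # then keep the smallest set with mass >= p
        "max_length": max_length,
        "num_return_sequences": num_candidates,
    }


def distinct_paraphrases(source, candidates):
    """Drop candidates that repeat the source or each other (case-insensitive)."""
    seen = {source.strip().lower()}
    kept = []
    for cand in candidates:
        key = cand.strip().lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(cand.strip())
    return kept


def generate_paraphrases(sentence, model_id="AMHR/T5-for-Adversarial-Paraphrasing"):
    """Sample paraphrase candidates for one sentence (model id is an assumption)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    # Assumption: a "paraphrase: " task prefix; verify against nap_generation.py.
    inputs = tokenizer("paraphrase: " + sentence, return_tensors="pt")
    outputs = model.generate(**inputs, **sampling_kwargs())
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return distinct_paraphrases(sentence, decoded)


if __name__ == "__main__":
    for paraphrase in generate_paraphrases(
        "The weather was terrible, so the match was postponed."
    ):
        print(paraphrase)
```

Requesting several sampled sequences and filtering out copies of the input is one way to work around the single, often unchanged sentence that the hosted demo returns.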

📄 Citation

If you use this model, please cite the following paper:

@inproceedings{nighojkar-licato-2021-improving,
    title = "Improving Paraphrase Detection with the Adversarial Paraphrasing Task",
    author = "Nighojkar, Animesh  and
      Licato, John",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.552",
    pages = "7106--7116",
    abstract = "If two sentences have the same meaning, it should follow that they are equivalent in their inferential properties, i.e., each sentence should textually entail the other. However, many paraphrase datasets currently in widespread use rely on a sense of paraphrase based on word overlap and syntax. Can we teach them instead to identify paraphrases in a way that draws on the inferential properties of the sentences, and is not over-reliant on lexical and syntactic similarities of a sentence pair? We apply the adversarial paradigm to this question, and introduce a new adversarial method of dataset creation for paraphrase identification: the Adversarial Paraphrasing Task (APT), which asks participants to generate semantically equivalent (in the sense of mutually implicative) but lexically and syntactically disparate paraphrases. These sentence pairs can then be used both to test paraphrase identification models (which get barely random accuracy) and then improve their performance. To accelerate dataset generation, we explore automation of APT using T5, and show that the resulting dataset also improves accuracy. We discuss implications for paraphrase detection and release our dataset in the hope of making paraphrase detection models better able to detect sentence-level meaning equivalence.",
}