T5-for-Adversarial-Paraphrasing: an open-source paraphrase generator that produces sentence pairs with the same meaning but very different wording

T5 For Adversarial Paraphrasing

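Developed by AMHR

A paraphrase generator built for the Adversarial Paraphrasing Task. It focuses on producing paraphrase pairs that are semantically equivalent but differ markedly in vocabulary and syntax.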

Text generation

Transformers

#Adversarial paraphrase generation #Semantic equivalence augmentation #T5 model optimization

Downloads: 26

Released: 3/2/2022

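Model overview

A T5-based paraphrase generator that produces paraphrase pairs which are equivalent in their inferential properties but differ markedly in vocabulary and syntax. Its purpose is to improve the performance of paraphrase detection models.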

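Model features

Adversarial paraphrase generation

Generates paraphrase pairs that are semantically equivalent but lexically and syntactically disparate, which challenges existing paraphrase detection models.

Inferential equivalence

Focuses on the inferential properties of the sentences rather than on word overlap, so that the two sentences in a pair are semantically equivalent.

Automated dataset generation

Uses a T5 model to automate the generation of adversarial paraphrase datasets, which speeds up dataset construction.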
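The two criteria described above (equivalence in inferential properties, and low lexical similarity) can be illustrated with a small filter over candidate pairs. This is only a sketch under stated assumptions: the `entails` callback stands in for whatever entailment model is available, the Jaccard word overlap is a simple proxy chosen for illustration rather than the similarity measure used by the authors, and the `max_overlap` threshold is hypothetical.

```python
from __future__ import annotations

import re
from typing import Callable


def tokens(sentence: str) -> set[str]:
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def jaccard_overlap(a: str, b: str) -> float:
    """Fraction of word types shared by the two sentences (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_adversarial_pair(
    a: str,
    b: str,
    entails: Callable[[str, str], bool],  # hypothetical entailment checker
    max_overlap: float = 0.5,  # hypothetical threshold, not from the source
) -> bool:
    """Accept a pair only if each sentence entails the other and they share few words."""
    mutually_implicative = entails(a, b) and entails(b, a)
    return mutually_implicative and jaccard_overlap(a, b) <= max_overlap
```

With a stub such as `entails = lambda premise, hypothesis: True`, the pair "The cat sat on the mat." / "A feline rested upon the rug." is accepted because the sentences share only one word type, while a sentence paired with an exact copy of itself is rejected. In practice the stub would be replaced by a real textual entailment model.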

Model capabilities

Text generation

Paraphrase generation

Adversarial example generation

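Use cases

Natural language processing

Testing paraphrase detection models

Generate adversarial paraphrase pairs to test how well existing paraphrase detection models perform.

Existing models reach barely random accuracy on these pairs.

Improving paraphrase detection models

Train paraphrase detection models on the generated adversarial paraphrase dataset to improve their performance.

Accuracy at recognizing semantic equivalence improves after this training.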

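🚀 Paraphrase generation model

This model is a paraphrase generator designed for the Adversarial Paraphrasing Task described and used in the paper Improving Paraphrase Detection with the Adversarial Paraphrasing Task. It generates paraphrases that are semantically equivalent but lexically and syntactically different.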

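🚀 Quick start

To get more out of this model, see the nap_generation.py file in the GitHub repository, which relies on top-k sampling and top-p sampling. Note that the demo on Hugging Face outputs only a single sentence, and that sentence will most likely be identical to the input, because the model was originally meant to produce its output with beam search and sampling.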
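For readers unfamiliar with the two sampling strategies named above, the following toy sketch shows what each one does to a next-token distribution. The vocabulary and probabilities are made up for illustration, and the helper names are hypothetical; this is not code from nap_generation.py.

```python
from __future__ import annotations

import random


def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep the k most probable tokens and renormalize their probabilities."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest set of most probable tokens whose total mass reaches p."""
    kept, mass = [], 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        mass += prob
        if mass >= p:
            break
    return {tok: prob / mass for tok, prob in kept}


def sample_token(probs: dict[str, float], rng: random.Random) -> str:
    """Draw one token according to the (already filtered) distribution."""
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


# Made-up next-token distribution, for illustration only.
toy = {"quick": 0.5, "fast": 0.3, "rapid": 0.15, "zebra": 0.05}
after_top_k = top_k_filter(toy, k=2)    # only "quick" and "fast" remain
after_top_p = top_p_filter(toy, p=0.9)  # "quick", "fast" and "rapid" remain
```

Both filters cut off the unlikely tail ("zebra" here) before sampling, which keeps the output varied without letting implausible tokens through. Top-k always keeps a fixed number of tokens, whereas top-p keeps more tokens when the distribution is flat and fewer when it is peaked.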

GitHub repository: https://github.com/Advancing-Machine-Human-Reasoning-Lab/apt.git
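Outside the repository script, several candidates can be sampled with the Hugging Face `transformers` library instead of relying on the single-sentence demo. The sketch below makes several assumptions that should be checked against nap_generation.py: the Hub identifier `AMHR/T5-for-Adversarial-Paraphrasing`, the `paraphrase: ` input prefix, and all sampling values are illustrative guesses, not settings confirmed by the authors. It also requires `transformers`, `torch`, and `sentencepiece` to be installed.

```python
from __future__ import annotations

MODEL_ID = "AMHR/T5-for-Adversarial-Paraphrasing"  # assumed Hub identifier
PROMPT_PREFIX = "paraphrase: "  # assumed input format; verify in nap_generation.py

# Illustrative sampling settings, not the values used by the authors.
GENERATION_KWARGS = {
    "do_sample": True,
    "top_k": 50,
    "top_p": 0.95,
    "max_length": 64,
    "num_return_sequences": 5,
}


def drop_copies(sentence: str, candidates: list[str]) -> list[str]:
    """Remove candidates that repeat the input or each other, keeping order."""
    seen = {sentence.strip().lower()}
    unique = []
    for candidate in candidates:
        key = candidate.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(candidate.strip())
    return unique


def generate_paraphrases(sentence: str, **overrides) -> list[str]:
    """Sample several paraphrase candidates for one sentence."""
    # Imported lazily so the helpers above work without the dependency.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(PROMPT_PREFIX + sentence, return_tensors="pt")
    outputs = model.generate(**inputs, **{**GENERATION_KWARGS, **overrides})
    candidates = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return drop_copies(sentence, candidates)
```

Calling `generate_paraphrases("The committee postponed the vote until next week.")` downloads the checkpoint on first use and returns up to five distinct candidates. Because the output is sampled, results vary between runs, and candidates identical to the input are filtered out by `drop_copies`.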

📄 Citation

If you use this model, please cite the following paper:

@inproceedings{nighojkar-licato-2021-improving,
    title = "Improving Paraphrase Detection with the Adversarial Paraphrasing Task",
    author = "Nighojkar, Animesh  and
      Licato, John",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.552",
    pages = "7106--7116",
    abstract = "If two sentences have the same meaning, it should follow that they are equivalent in their inferential properties, i.e., each sentence should textually entail the other. However, many paraphrase datasets currently in widespread use rely on a sense of paraphrase based on word overlap and syntax. Can we teach them instead to identify paraphrases in a way that draws on the inferential properties of the sentences, and is not over-reliant on lexical and syntactic similarities of a sentence pair? We apply the adversarial paradigm to this question, and introduce a new adversarial method of dataset creation for paraphrase identification: the Adversarial Paraphrasing Task (APT), which asks participants to generate semantically equivalent (in the sense of mutually implicative) but lexically and syntactically disparate paraphrases. These sentence pairs can then be used both to test paraphrase identification models (which get barely random accuracy) and then improve their performance. To accelerate dataset generation, we explore automation of APT using T5, and show that the resulting dataset also improves accuracy. We discuss implications for paraphrase detection and release our dataset in the hope of making paraphrase detection models better able to detect sentence-level meaning equivalence.",
}