T5-for-Adversarial-Paraphrasing: an open-source paraphrase generator that produces sentence pairs with the same meaning but very different wording

T5 For Adversarial Paraphrasing

Developed by AMHR

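This model is a paraphrase generator designed for the adversarial paraphrasing task. It specializes in producing paraphrase pairs that are semantically equivalent but differ substantially in vocabulary and syntax.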

Text generation

Transformers

#Adversarial paraphrase generation #Semantic equivalence enhancement #T5 model optimization

Downloads: 26

Released: 3/2/2022

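Model Overview

A T5-based paraphrase generator that produces paraphrase pairs whose inferential properties are equivalent but whose vocabulary and syntax differ substantially. Its goal is to improve the performance of paraphrase detection models.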

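Model Features

Adversarial paraphrase generation

Generates paraphrase pairs that are semantically equivalent but differ substantially in vocabulary and syntax, which makes them challenging for existing paraphrase detection models.

Inferential-property equivalence

Focuses on the inferential properties of sentences rather than on lexical overlap, so that each pair is equivalent in meaning and not merely similar in wording.

Automated dataset generation

Uses a T5 model to generate adversarial paraphrase datasets automatically, which speeds up dataset construction.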
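The distinction between lexical overlap and inferential equivalence can be made concrete with a small sketch. The code below is only an illustration and is not the scoring method used by the authors: `jaccard_overlap` is a simple word-level overlap measure chosen for brevity, `entails` is a hypothetical callable that the reader would back with a textual entailment model, and the `max_overlap` threshold of 0.3 is an arbitrary assumption.

```python
import re
from typing import Callable, Set


def tokenize(sentence: str) -> Set[str]:
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def jaccard_overlap(a: str, b: str) -> float:
    """Word-level Jaccard similarity: shared tokens divided by all tokens."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return len(tokens_a & tokens_b) / len(union)


def is_adversarial_paraphrase(
    a: str,
    b: str,
    entails: Callable[[str, str], bool],  # hypothetical entailment predicate
    max_overlap: float = 0.3,  # illustrative threshold, not from the source
) -> bool:
    """Return True if the pair is mutually implicative yet lexically disparate."""
    # Meaning equivalence is modeled as entailment in both directions.
    mutually_implicative = entails(a, b) and entails(b, a)
    return mutually_implicative and jaccard_overlap(a, b) <= max_overlap
```

With a stand-in predicate such as `lambda premise, hypothesis: True`, the pair "The cat sat on the mat" / "A feline rested upon the rug" passes the check, while "The cat sat on the mat" / "The cat sat on a mat" is rejected because nearly all of its words are shared.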

Model Capabilities

Text generation

Paraphrase generation

Adversarial example generation

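Use Cases

Natural language processing

Testing paraphrase detection models

Generate adversarial paraphrase pairs to test how well existing paraphrase detection models perform.

Existing models score at roughly chance-level accuracy on these pairs.

Improving paraphrase detection models

Train paraphrase detection models on the generated adversarial paraphrase dataset to improve their performance.

The models' accuracy at identifying semantic equivalence improves substantially.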

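🚀 Paraphrase Model

This model is a paraphraser designed for the adversarial paraphrasing task described and used in the paper https://aclanthology.org/2021.acl-long.552/.

See nap_generation.py in the GitHub repository for ways to make better use of this model with top-k and top-p sampling. The Hugging Face demo returns only a single sentence, which will most likely be nearly identical to the input, because the model is intended to generate with beam search and sampling.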
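As a rough illustration of sampling-based generation, here is a minimal sketch using the Hugging Face `transformers` library. It is not taken from nap_generation.py. The Hub ID in `MODEL_ID`, the `"paraphrase: "` input prefix, and the default values for `top_k`, `top_p`, and `max_new_tokens` are all assumptions made for this example, and the helper names are hypothetical.

```python
from typing import Dict, List

# Assumed Hub ID, inferred from the developer and model names on this page.
MODEL_ID = "AMHR/T5-for-Adversarial-Paraphrasing"


def sampling_kwargs(
    num_candidates: int = 5, top_k: int = 120, top_p: float = 0.95
) -> Dict[str, object]:
    """Build `generate` keyword arguments that enable top-k / top-p sampling."""
    if num_candidates < 1 or top_k < 1:
        raise ValueError("num_candidates and top_k must be positive")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    return {
        "do_sample": True,  # sample instead of decoding greedily
        "top_k": top_k,  # keep only the k most likely next tokens
        "top_p": top_p,  # then keep the smallest set with mass >= top_p
        "num_return_sequences": num_candidates,
        "max_new_tokens": 64,  # illustrative length cap
    }


def unique_candidates(sentence: str, candidates: List[str]) -> List[str]:
    """Drop duplicates and verbatim copies of the input, preserving order."""
    seen = {sentence.strip().lower()}
    kept = []
    for candidate in candidates:
        key = candidate.strip().lower()
        if key not in seen:
            seen.add(key)
            kept.append(candidate.strip())
    return kept


def paraphrase(sentence: str, **overrides: object) -> List[str]:
    """Sample several paraphrase candidates for one sentence."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    # The task prefix is an assumption about the expected input format.
    inputs = tokenizer("paraphrase: " + sentence, return_tensors="pt")
    outputs = model.generate(**inputs, **sampling_kwargs(**overrides))
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return unique_candidates(sentence, decoded)
```

Calling `paraphrase("The cat sat on the mat.", num_candidates=10)` would download the model and return whichever sampled candidates differ from the input. Because sampling is random, the results vary from run to run.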

📦 Links

GitHub repository: https://github.com/Advancing-Machine-Human-Reasoning-Lab/apt.git

📄 Citation

If you use this model, please cite the following:

@inproceedings{nighojkar-licato-2021-improving,
    title = "Improving Paraphrase Detection with the Adversarial Paraphrasing Task",
    author = "Nighojkar, Animesh  and
      Licato, John",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.552",
    pages = "7106--7116",
    abstract = "If two sentences have the same meaning, it should follow that they are equivalent in their inferential properties, i.e., each sentence should textually entail the other. However, many paraphrase datasets currently in widespread use rely on a sense of paraphrase based on word overlap and syntax. Can we teach them instead to identify paraphrases in a way that draws on the inferential properties of the sentences, and is not over-reliant on lexical and syntactic similarities of a sentence pair? We apply the adversarial paradigm to this question, and introduce a new adversarial method of dataset creation for paraphrase identification: the Adversarial Paraphrasing Task (APT), which asks participants to generate semantically equivalent (in the sense of mutually implicative) but lexically and syntactically disparate paraphrases. These sentence pairs can then be used both to test paraphrase identification models (which get barely random accuracy) and then improve their performance. To accelerate dataset generation, we explore automation of APT using T5, and show that the resulting dataset also improves accuracy. We discuss implications for paraphrase detection and release our dataset in the hope of making paraphrase detection models better able to detect sentence-level meaning equivalence.",
}