roberta-large-wanli open-source model - natural language inference that outperforms roberta-large-mnli

Roberta Large Wanli

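Developed by alisawuffles

A roberta-large model fine-tuned on the WANLI dataset for natural language inference. It outperforms roberta-large-mnli on several out-of-domain test sets.

Text classification

Transformers

English #natural language inference #out-of-domain generalization #human-AI collaborative dataset

Downloads: 1,195

Released: 3/30/2022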

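Model overview

This model is roberta-large fine-tuned on WANLI (Worker-AI Collaboration for Natural Language Inference) for the natural language inference (NLI) task. It outperforms roberta-large-mnli on eight out-of-domain test sets, including by 11% on HANS and 9% on Adversarial NLI.

Model features

Strong out-of-domain performance

Outperforms roberta-large-mnli on eight out-of-domain test sets, with gains of 11% on HANS and 9% on Adversarial NLI.

Trained on a human-AI collaborative dataset

Trained on WANLI, a dataset that combines the generative strength of language models with the evaluative strength of humans, which gives it greater linguistic diversity.

Complex reasoning patterns

Handles NLI examples that exhibit complex reasoning patterns.

Model capabilities

Natural language inference

Text classification

Sentence-pair relation classification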

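Use cases

Natural language processing

Contradiction detection

Determine whether two sentences contradict each other.

The model predicts the contradiction label for such pairs.

Entailment detection

Determine whether one sentence entails the meaning of another.

The model predicts the entailment label for such pairs.

Neutral relation detection

Determine whether two sentences are neutral with respect to each other, that is, neither entailing nor contradicting.

The model predicts the neutral label for such pairs.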
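All three use cases above read the same three-way prediction in different ways. The sketch below shows one possible way to turn a dictionary of label probabilities into a decision. The helper names `decide_relation` and `is_contradiction`, the thresholds, and the label strings are assumptions made for illustration; they are not part of the model or of the Transformers library, and the actual label strings should be checked against `model.config.id2label`.

```python
from typing import Dict

# Assumed label string; verify it against model.config.id2label before use.
CONTRADICTION = "contradiction"


def decide_relation(label_probs: Dict[str, float], min_confidence: float = 0.5) -> str:
    """Return the most probable label, or "uncertain" if it falls below min_confidence."""
    label, prob = max(label_probs.items(), key=lambda item: item[1])
    return label if prob >= min_confidence else "uncertain"


def is_contradiction(label_probs: Dict[str, float], threshold: float = 0.5) -> bool:
    """Return True when the contradiction probability reaches the threshold."""
    return label_probs.get(CONTRADICTION, 0.0) >= threshold


# Made-up probabilities for illustration only; this is not real model output.
example = {"contradiction": 0.90, "entailment": 0.04, "neutral": 0.06}
print(decide_relation(example))   # contradiction
print(is_contradiction(example))  # True
```

The 0.5 thresholds are arbitrary defaults; a real application would likely tune them on held-out data.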

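🚀 Pretrained model: roberta-large-wanli

This is an off-the-shelf roberta-large model fine-tuned on WANLI, the Worker-AI Collaborative NLI dataset (Liu et al., 2022). It outperforms the roberta-large-mnli model on eight out-of-domain test sets, including by 11% on HANS and 9% on Adversarial NLI.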

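🚀 Quick start

Model information

Property	Details
Model type	Text classification
Training data	alisawuffles/WANLI

Example inputs

Each example is a sentence pair, with the premise first and the hypothesis second.

"I almost forgot to eat lunch." "I didn't forget to eat lunch."
"I almost forgot to eat lunch." "I forgot to eat lunch."
"I ate lunch." "I almost forgot to eat lunch."

💻 Usage examples

Basic usage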

import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
model = RobertaForSequenceClassification.from_pretrained('alisawuffles/roberta-large-wanli')
tokenizer = RobertaTokenizer.from_pretrained('alisawuffles/roberta-large-wanli')

# Encode the premise and hypothesis together as one sentence pair
x = tokenizer("I almost forgot to eat lunch.", "I didn't forget to eat lunch.", return_tensors='pt', max_length=128, truncation=True)
logits = model(**x).logits
# Convert logits to label probabilities and pick the most likely label
probs = logits.softmax(dim=1).squeeze(0)
label_id = torch.argmax(probs).item()
prediction = model.config.id2label[label_id]
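
The basic usage above scores a single pair. As a rough sketch, the three example inputs listed earlier could be scored in one batch as follows. The function name `classify_pairs` and the `EXAMPLE_PAIRS` constant are hypothetical names introduced here, and using the `Auto*` classes with padding is one reasonable choice rather than the author's documented method. The heavy imports sit inside the function, so nothing is downloaded until it is called.

```python
from typing import Dict, List, Tuple

MODEL_NAME = "alisawuffles/roberta-large-wanli"

# (premise, hypothesis) pairs taken from the example inputs on this page.
EXAMPLE_PAIRS: List[Tuple[str, str]] = [
    ("I almost forgot to eat lunch.", "I didn't forget to eat lunch."),
    ("I almost forgot to eat lunch.", "I forgot to eat lunch."),
    ("I ate lunch.", "I almost forgot to eat lunch."),
]


def classify_pairs(
    pairs: List[Tuple[str, str]], model_name: str = MODEL_NAME
) -> List[Dict[str, float]]:
    """Return one {label: probability} dict per (premise, hypothesis) pair."""
    # Imported lazily so that defining this function has no side effects.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()  # disable dropout for inference

    premises = [premise for premise, _ in pairs]
    hypotheses = [hypothesis for _, hypothesis in pairs]
    # Pad to the longest pair in the batch so the inputs form one tensor.
    batch = tokenizer(
        premises, hypotheses,
        return_tensors="pt", padding=True, truncation=True, max_length=128,
    )
    with torch.no_grad():  # no gradients are needed for prediction
        probs = model(**batch).logits.softmax(dim=1)

    id2label = model.config.id2label
    return [
        {id2label[i]: float(p) for i, p in enumerate(row)}
        for row in probs
    ]
```

Calling `classify_pairs(EXAMPLE_PAIRS)` downloads the checkpoint on first use and returns the full probability distribution for each pair, which is useful when the runner-up label matters as well as the top prediction.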

📚 Documentation

Citation

@inproceedings{liu-etal-2022-wanli,
    title = "{WANLI}: Worker and {AI} Collaboration for Natural Language Inference Dataset Creation",
    author = "Liu, Alisa  and
      Swayamdipta, Swabha  and
      Smith, Noah A.  and
      Choi, Yejin",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-emnlp.508",
    pages = "6826--6847",
    abstract = "A recurring challenge of crowdsourcing NLP datasets at scale is that human writers often rely on repetitive patterns when crafting examples, leading to a lack of linguistic diversity. We introduce a novel approach for dataset creation based on worker and AI collaboration, which brings together the generative strength of language models and the evaluative strength of humans. Starting with an existing dataset, MultiNLI for natural language inference (NLI), our approach uses dataset cartography to automatically identify examples that demonstrate challenging reasoning patterns, and instructs GPT-3 to compose new examples with similar patterns. Machine generated examples are then automatically filtered, and finally revised and labeled by human crowdworkers. The resulting dataset, WANLI, consists of 107,885 NLI examples and presents unique empirical strengths over existing NLI datasets. Remarkably, training a model on WANLI improves performance on eight out-of-domain test sets we consider, including by 11{\%} on HANS and 9{\%} on Adversarial NLI, compared to training on the 4x larger MultiNLI. Moreover, it continues to be more effective than MultiNLI augmented with other NLI datasets. Our results demonstrate the promise of leveraging natural language generation techniques and re-imagining the role of humans in the dataset creation process.",
}