DeBERTa-v3-xsmall-mnli: an open-source model for zero-shot classification tasks

Deberta V3 Xsmall Mnli Fever Anli Ling Binary

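Developed by MoritzLaurer

A binary natural language inference (NLI) model based on DeBERTa-v3-xsmall, optimized for zero-shot classification.

Text classification

Transformers

Language: English | License: MIT | Tags: #zero-shot-classification #binary-NLI #multi-dataset-training

Downloads: 10.77k

Released: 3/2/2022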

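Model overview

The model was trained on four NLI datasets to predict one of two classes, "entailment" or "not entailment", which makes it well suited to zero-shot classification.

Model features

Optimized for binary classification

Designed specifically to separate "entailment" from "not entailment", a simplification of the usual three-class NLI task.

Multi-dataset training

Trained on 782,357 hypothesis-premise pairs drawn from four datasets: MultiNLI, Fever-NLI, LingNLI and ANLI.

Efficient inference

The xsmall variant offers faster inference while retaining good accuracy.

Model capabilities

Zero-shot text classification

Natural language inference

Binary text classification

Use cases

Text analysis

Sentiment analysis

Decide whether a text entails a given sentiment.

For reference, the model reaches 0.925 accuracy on the mnli-m-2c test set (an NLI benchmark, not a sentiment benchmark).

Fact checking

Verify whether a claim is entailed by an evidence text.

The model reaches 0.892 accuracy on fever-nli-2c.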

🚀 DeBERTa-v3-xsmall-mnli-fever-anli-ling-binary

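This model is intended for text classification and zero-shot classification. It was trained on several natural language inference (NLI) datasets to predict whether a hypothesis is entailed ("entailment") or not ("not entailment") by a premise.

🚀 Quick start

The following example code uses the model to make a prediction: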

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

model_name = "MoritzLaurer/DeBERTa-v3-xsmall-mnli-fever-anli-ling-binary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# The model must live on the same device as its inputs
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)
model.eval()

premise = "I first thought that I liked the movie, but upon second thought it was actually disappointing."
hypothesis = "The movie was good."

# Tokenize the pair and move every tensor (input_ids, attention_mask, ...) to the device
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt").to(device)
with torch.no_grad():  # inference only, no gradients needed
    output = model(**inputs)
prediction = torch.softmax(output["logits"][0], -1).tolist()
label_names = ["entailment", "not_entailment"]
# Report each class probability as a percentage
prediction = {name: round(float(pred) * 100, 1) for pred, name in zip(prediction, label_names)}
print(prediction)
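
The entailment score from this example is also what drives zero-shot classification: each candidate label is turned into a hypothesis, and the label whose hypothesis is most strongly entailed by the text wins. The following is a minimal sketch of that idea, not code from the model card. The hypothesis template and the helper names (`build_hypotheses`, `entailment_probability`, `rank_labels`, `score_with_model`) are illustrative assumptions, and the logits in the usage lines are made-up numbers rather than real model output.

```python
import math

MODEL_NAME = "MoritzLaurer/DeBERTa-v3-xsmall-mnli-fever-anli-ling-binary"


def build_hypotheses(labels, template="This example is about {}."):
    """Turn each candidate label into an NLI hypothesis (the template is an assumption)."""
    return {label: template.format(label) for label in labels}


def entailment_probability(logits):
    """Softmax over the two logits; index 0 is 'entailment', as in the example above."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[0] / sum(exps)


def rank_labels(logits_by_label):
    """Sort candidate labels by entailment probability, highest first."""
    scored = {label: entailment_probability(l) for label, l in logits_by_label.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


def score_with_model(premise, labels, model_name=MODEL_NAME):
    """Score real text with the model (downloads weights on first use).

    torch and transformers are imported here so the helpers above
    can be used and tested without them.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()
    logits_by_label = {}
    for label, hypothesis in build_hypotheses(labels).items():
        inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")
        with torch.no_grad():
            logits_by_label[label] = model(**inputs).logits[0].tolist()
    return rank_labels(logits_by_label)


# Usage with made-up logits (not real model output)
ranking = rank_labels({"politics": [2.0, -1.0], "sports": [-3.0, 3.0]})
print(ranking[0][0])  # politics
```

Scoring each label independently, as here, suits multi-label settings; for single-label tasks the entailment scores can instead be normalized across the candidates. The Transformers library also ships a `zero-shot-classification` pipeline that wraps this general pattern.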

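✨ Key features

Trained on 782,357 hypothesis-premise pairs from 4 natural language inference (NLI) datasets.
Designed specifically for zero-shot classification: it predicts "entailment" or "not entailment" in a binary NLI task.
Based on Microsoft's DeBERTa-v3-xsmall; the v3 variant of DeBERTa substantially outperforms earlier versions thanks to a different pre-training objective.

📦 Installation

The original model card gives no installation steps. The model loads through the Hugging Face Transformers library, and the example code also needs PyTorch. Depending on the Transformers version, the DeBERTa-v3 tokenizer may additionally require the sentencepiece package. Install the core dependencies with:

pip install transformers torch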

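📚 Detailed documentation

Model description

The model was trained on 782,357 hypothesis-premise pairs from 4 NLI datasets: MultiNLI, Fever-NLI, LingNLI and ANLI.

Note that the model was trained on a binary NLI task to predict either "entailment" or "not entailment". This is by design for zero-shot classification, where the distinction between "neutral" and "contradiction" does not matter.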
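
To make the binary setup concrete, the sketch below collapses standard three-class NLI labels into the two classes described here and computes accuracy against the collapsed labels. It assumes that "neutral" and "contradiction" are both mapped to "not_entailment", which follows from the description but is not spelled out as code in the model card; the function names are illustrative.

```python
THREE_CLASS_LABELS = ("entailment", "neutral", "contradiction")


def to_binary(label):
    """Map a three-class NLI label to 'entailment' / 'not_entailment'."""
    if label not in THREE_CLASS_LABELS:
        raise ValueError(f"unknown NLI label: {label!r}")
    return "entailment" if label == "entailment" else "not_entailment"


def binary_accuracy(predictions, gold_three_class):
    """Accuracy of binary predictions against three-class gold labels."""
    if len(predictions) != len(gold_three_class):
        raise ValueError("predictions and gold labels differ in length")
    gold = [to_binary(label) for label in gold_three_class]
    correct = sum(pred == ref for pred, ref in zip(predictions, gold))
    return correct / len(gold)


# Usage: two of the three toy predictions match the collapsed gold labels
toy_accuracy = binary_accuracy(
    ["entailment", "not_entailment", "entailment"],
    ["entailment", "contradiction", "neutral"],
)
```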

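The base model is Microsoft's DeBERTa-v3-xsmall. The v3 variant of DeBERTa substantially outperforms earlier versions of the model by using a different pre-training objective; see the DeBERTa-V3 paper for details.

For higher accuracy at the cost of speed, consider https://huggingface.co/MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli.

Intended uses and limitations

How to use the model

See the Quick start section above.

Training data

The model was trained on 782,357 hypothesis-premise pairs from 4 NLI datasets: MultiNLI, Fever-NLI, LingNLI and ANLI.

Training procedure

DeBERTa-v3-xsmall-mnli-fever-anli-ling-binary was trained with the Hugging Face Trainer using the following hyperparameters: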

from transformers import TrainingArguments

training_args = TrainingArguments(
    num_train_epochs=5,              # total number of training epochs
    learning_rate=2e-05,             # peak learning rate
    per_device_train_batch_size=32,   # batch size per device during training
    per_device_eval_batch_size=32,    # batch size for evaluation
    warmup_ratio=0.1,                # fraction of training steps used for learning-rate warmup
    weight_decay=0.06,               # strength of weight decay
    fp16=True                        # mixed precision training (requires a GPU)
)
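
Because `warmup_ratio` is a fraction rather than a step count, the number of warmup steps depends on the length of the run. The sketch below works out the schedule implied by these hyperparameters. It assumes that all 782,357 pairs are used for training, on a single device and without gradient accumulation; none of this is stated in the model card, and `schedule_summary` is a hypothetical helper.

```python
import math


def schedule_summary(num_examples, batch_size, epochs, warmup_ratio):
    """Estimate optimizer steps for a single-device run without gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)  # last partial batch is kept
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)    # warmup is a share of all steps
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Assumption: the full set of 782,357 pairs is the training set
summary = schedule_summary(num_examples=782_357, batch_size=32, epochs=5, warmup_ratio=0.1)
print(summary)
```

Under these assumptions the run has 24,449 steps per epoch and 122,245 steps in total, of which 12,225 are warmup steps.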

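Evaluation results

The model was evaluated on the binary (two-class rather than three-class) test sets of MultiNLI, ANLI and LingNLI and on the binary dev set of Fever-NLI. The metric is accuracy.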

Dataset	mnli-m-2c	mnli-mm-2c	fever-nli-2c	anli-all-2c	anli-r3-2c	lingnli-2c
Accuracy	0.925	0.922	0.892	0.676	0.665	0.888
Speed (texts/sec, CPU, batch size 128)	6.0	6.3	3.0	5.8	5.0	7.6
Speed (texts/sec, GPU Tesla P100, batch size 128)	473	487	230	390	340	586
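
The throughput rows can be turned into rough planning numbers. The sketch below copies the reported speeds into a dictionary and derives the GPU-over-CPU speedup and the time needed for a given number of texts. Treat the results as back-of-the-envelope estimates: real throughput depends on hardware and text length, and extrapolating these benchmark figures to other workloads is an assumption. The helper names are illustrative.

```python
# Reported speeds in texts per second: (CPU, GPU Tesla P100), batch size 128
SPEED = {
    "mnli-m-2c": (6.0, 473),
    "mnli-mm-2c": (6.3, 487),
    "fever-nli-2c": (3.0, 230),
    "anli-all-2c": (5.8, 390),
    "anli-r3-2c": (5.0, 340),
    "lingnli-2c": (7.6, 586),
}


def gpu_speedup(dataset):
    """How many times faster the GPU run was than the CPU run on one dataset."""
    cpu, gpu = SPEED[dataset]
    return gpu / cpu


def seconds_needed(num_texts, texts_per_second):
    """Time to process num_texts at a constant throughput."""
    return num_texts / texts_per_second


# Usage: estimated minutes for 10,000 MNLI-like texts on each device
cpu_minutes = seconds_needed(10_000, SPEED["mnli-m-2c"][0]) / 60
gpu_minutes = seconds_needed(10_000, SPEED["mnli-m-2c"][1]) / 60
```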

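🔧 Technical details

The model is built on the DeBERTa-v3-xsmall architecture, whose different pre-training objective improves on earlier DeBERTa versions.
It was trained on a binary NLI task and predicts "entailment" or "not entailment".
Training used the Hugging Face Trainer with the hyperparameters listed under Training procedure (epochs, learning rate, batch size and so on).

📄 License

This model is released under the MIT license.

Limitations and bias

Please consult the original DeBERTa paper and the literature on the individual NLI datasets for potential biases.

Citation

If you use this model, please cite: Laurer, Moritz, Wouter van Atteveldt, Andreu Salleras Casas, and Kasper Welbers. 2022. 'Less Annotating, More Classifying – Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and BERT-NLI'. Preprint, June. Open Science Framework. https://osf.io/74b8k.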