DeBERTa: Decoding-enhanced BERT with Disentangled Attention
DeBERTa improves on the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. With 80GB of training data, it outperforms BERT and RoBERTa on the majority of NLU tasks.
Please check the official repository for more details and updates. This is the DeBERTa large model fine-tuned on the MNLI task.
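As a quick check that this checkpoint loads and classifies premise/hypothesis pairs, here is a minimal inference sketch using the Hugging Face transformers API; the example sentences are illustrative, and the label names are read from the model config rather than hardcoded.

```python
# Minimal NLI inference sketch for microsoft/deberta-large-mnli
# (assumes the transformers and torch packages are installed).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "microsoft/deberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

premise = "A soccer game with multiple males playing."   # illustrative example pair
hypothesis = "Some men are playing a sport."

# Encode the premise/hypothesis pair and run a forward pass.
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit back to its label via the model config.
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
```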
Quick Start
This is a quick overview of the DeBERTa model and its fine-tuning results. For more in-depth information, refer to the official repository and the cited paper.
Features
- Enhanced Architecture: DeBERTa improves upon BERT and RoBERTa using disentangled attention and an enhanced mask decoder (see the sketch after this list).
- High Performance: With 80GB of training data, it outperforms BERT and RoBERTa on the majority of NLU tasks.
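To make the disentangled-attention idea concrete, here is a toy sketch of how a raw attention score can be assembled from content-to-content, content-to-position, and position-to-content terms over relative positions, as described in the paper. This is not the repository's implementation; all tensor names and sizes are illustrative.

```python
# Toy sketch of disentangled attention scores (illustrative only, not the official code).
import math
import torch

seq_len, d = 6, 16
q_c = torch.randn(seq_len, d)        # content queries
k_c = torch.randn(seq_len, d)        # content keys
q_r = torch.randn(2 * seq_len, d)    # relative-position queries, indexed by clamped distance
k_r = torch.randn(2 * seq_len, d)    # relative-position keys, indexed by clamped distance

# Relative distance i - j, clamped and shifted into [0, 2 * seq_len).
idx = torch.arange(seq_len)
rel = (idx[:, None] - idx[None, :]).clamp(-seq_len, seq_len - 1) + seq_len

c2c = q_c @ k_c.t()                       # content-to-content term
c2p = (q_c @ k_r.t()).gather(1, rel)      # content-to-position: q_c[i] . k_r[delta(i, j)]
p2c = (k_c @ q_r.t()).gather(1, rel).t()  # position-to-content: k_c[j] . q_r[delta(j, i)]

# Combine the three terms and normalize; the paper scales by sqrt(3d).
scores = (c2c + c2p + p2c) / math.sqrt(3 * d)
attn = torch.softmax(scores, dim=-1)
print(attn.shape)  # (seq_len, seq_len)
```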
Documentation
Fine-tuning on NLU tasks
We present the dev results on SQuAD 1.1/2.0 and several GLUE benchmark tasks.
| Property | Details |
|----------|---------|
| Model Type | DeBERTa (Decoding-enhanced BERT with Disentangled Attention) |
| Training Data | 80GB |
| Model | SQuAD 1.1 (F1/EM) | SQuAD 2.0 (F1/EM) | MNLI-m/mm (Acc) | SST-2 (Acc) | QNLI (Acc) | CoLA (MCC) | RTE (Acc) | MRPC (Acc/F1) | QQP (Acc/F1) | STS-B (P/S) |
|-------|-------------------|-------------------|-----------------|-------------|------------|------------|-----------|---------------|--------------|-------------|
| BERT-Large | 90.9/84.1 | 81.8/79.0 | 86.6/- | 93.2 | 92.3 | 60.6 | 70.4 | 88.0/- | 91.3/- | 90.0/- |
| RoBERTa-Large | 94.6/88.9 | 89.4/86.5 | 90.2/- | 96.4 | 93.9 | 68.0 | 86.6 | 90.9/- | 92.2/- | 92.4/- |
| XLNet-Large | 95.1/89.7 | 90.6/87.9 | 90.8/- | 97.0 | 94.9 | 69.0 | 85.9 | 90.8/- | 92.3/- | 92.5/- |
| [DeBERTa-Large](https://huggingface.co/microsoft/deberta-large)<sup>1</sup> | 95.5/90.1 | 90.7/88.0 | 91.3/91.1 | 96.5 | 95.3 | 69.5 | 91.0 | 92.6/94.6 | 92.3/- | 92.8/92.5 |
| [DeBERTa-XLarge](https://huggingface.co/microsoft/deberta-xlarge)<sup>1</sup> | -/- | -/- | 91.5/91.2 | 97.0 | - | - | 93.1 | 92.1/94.3 | - | 92.9/92.7 |
| [DeBERTa-V2-XLarge](https://huggingface.co/microsoft/deberta-v2-xlarge)<sup>1</sup> | 95.8/90.8 | 91.4/88.9 | 91.7/91.6 | 97.5 | 95.8 | 71.1 | 93.9 | 92.0/94.2 | 92.3/89.8 | 92.9/92.9 |
| [DeBERTa-V2-XXLarge](https://huggingface.co/microsoft/deberta-v2-xxlarge)<sup>1,2</sup> | 96.1/91.4 | 92.2/89.7 | 91.7/91.9 | 97.2 | 96.0 | 72.0 | 93.5 | 93.1/94.9 | 92.7/90.3 | 93.2/93.1 |
Notes.
- <sup>1</sup> Following RoBERTa, for RTE, MRPC, and STS-B we fine-tune starting from the MNLI checkpoints [DeBERTa-Large-MNLI](https://huggingface.co/microsoft/deberta-large-mnli), [DeBERTa-XLarge-MNLI](https://huggingface.co/microsoft/deberta-xlarge-mnli), [DeBERTa-V2-XLarge-MNLI](https://huggingface.co/microsoft/deberta-v2-xlarge-mnli), and [DeBERTa-V2-XXLarge-MNLI](https://huggingface.co/microsoft/deberta-v2-xxlarge-mnli); a minimal loading sketch follows these notes. The results on SST-2/QQP/QNLI/SQuAD 2.0 would also improve slightly when starting from MNLI fine-tuned models; however, for those four tasks we only report the numbers fine-tuned from the pretrained base models.
- <sup>2</sup> To try the XXLarge model with HF transformers, you need to specify --sharded_ddp.
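As a rough illustration of note 1, the sketch below loads an MNLI fine-tuned checkpoint as the starting point for a two-label task such as RTE. The `num_labels` and `ignore_mismatched_sizes` arguments are standard transformers options in recent versions; whether the official fine-tuning scripts wire this up in exactly this way is an assumption here.

```python
# Sketch: start a two-label fine-tuning run (e.g., RTE) from the MNLI checkpoint.
# Argument names are standard transformers options; the official scripts may differ.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-large-mnli",
    num_labels=2,                  # replace the 3-way MNLI head with a 2-way head
    ignore_mismatched_sizes=True,  # allow the new classification head to be freshly initialized
)
# model and tokenizer can now be passed to a Trainer (or run_glue.py) for the target task.
```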
Usage Examples
Basic Usage
```bash
cd transformers/examples/text-classification/
export TASK_NAME=mrpc
python -m torch.distributed.launch --nproc_per_node=8 run_glue.py --model_name_or_path microsoft/deberta-v2-xxlarge \
  --task_name $TASK_NAME --do_train --do_eval --max_seq_length 128 --per_device_train_batch_size 4 \
  --learning_rate 3e-6 --num_train_epochs 3 --output_dir /tmp/$TASK_NAME/ --overwrite_output_dir --sharded_ddp --fp16
```
License
This project is licensed under the MIT license.
Citation
If you find DeBERTa useful for your work, please cite the following paper:
```bibtex
@inproceedings{
he2021deberta,
title={DEBERTA: DECODING-ENHANCED BERT WITH DISENTANGLED ATTENTION},
author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=XPZIaotutsD}
}
```