🚀 DeBERTa: Decoding-enhanced BERT with Disentangled Attention
DeBERTa improves on BERT and RoBERTa by using disentangled attention and an enhanced mask decoder. Trained with 80GB of data, it outperforms BERT and RoBERTa on a majority of NLU tasks.
This is the DeBERTa large model fine-tuned on the MNLI task. For more details and updates, please visit the official repository.
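As a quick usage sketch (assuming `torch` and the `transformers` library are installed; the premise/hypothesis sentences below are only illustrative), the fine-tuned checkpoint can be loaded for NLI inference roughly like this:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "microsoft/deberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# NLI input is a premise/hypothesis pair (example sentences are illustrative).
premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Read the label names from the checkpoint's config instead of hard-coding them.
probs = logits.softmax(dim=-1).squeeze()
for idx, label in model.config.id2label.items():
    print(f"{label}: {probs[idx].item():.3f}")
```

The index-to-label mapping is taken from `model.config.id2label`, since that mapping is defined by the checkpoint's configuration.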
✨ Features
- Advanced Architecture: Combines disentangled attention with an enhanced mask decoder (a simplified sketch of the attention scoring follows this list).
- Superior Performance: Outperforms BERT, RoBERTa, and XLNet on a majority of NLU tasks.
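The sketch below illustrates the disentangled attention score from the DeBERTa paper, combining content-to-content, content-to-position, and position-to-content terms. It is a minimal single-head toy version with random tensors; all names (`H`, `P`, `W_qc`, `disentangled_scores`, ...) are illustrative and this is not the `transformers` implementation.

```python
# Minimal single-head sketch of disentangled attention scores (no masking,
# dropout, or multi-head splitting). Names are illustrative, not library API.
import math
import torch

def disentangled_scores(H, P, W_qc, W_kc, W_qr, W_kr, k):
    """H: (L, d) content vectors; P: (2k, d) relative-position embeddings; k: max relative distance."""
    L, d = H.shape
    Qc, Kc = H @ W_qc, H @ W_kc   # content queries/keys
    Qr, Kr = P @ W_qr, P @ W_kr   # relative-position queries/keys

    # delta[i, j] maps the relative distance i - j (clipped to [-k, k-1]) into [0, 2k-1].
    pos = torch.arange(L)
    delta = (pos[:, None] - pos[None, :]).clamp(-k, k - 1) + k

    c2c = Qc @ Kc.T                              # content-to-content
    c2p = torch.gather(Qc @ Kr.T, 1, delta)      # content query vs. relative-position key
    p2c = torch.gather(Kc @ Qr.T, 1, delta).T    # relative-position query vs. content key
    return (c2c + c2p + p2c) / math.sqrt(3 * d)  # scale by sqrt(3d), as in the paper

# Toy usage with random tensors.
L, d, k = 6, 16, 4
weights = [torch.randn(d, d) / math.sqrt(d) for _ in range(4)]
scores = disentangled_scores(torch.randn(L, d), torch.randn(2 * k, d), *weights, k=k)
attn = scores.softmax(dim=-1)  # (L, L) attention weights
```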
📚 Documentation
Fine-tuning on NLU tasks
We present the development results on SQuAD 1.1/2.0 and several GLUE benchmark tasks.
| Model | SQuAD 1.1 (F1/EM) | SQuAD 2.0 (F1/EM) | MNLI-m/mm (Acc) | SST-2 (Acc) | QNLI (Acc) | CoLA (MCC) | RTE (Acc) | MRPC (Acc/F1) | QQP (Acc/F1) | STS-B (P/S) |
|---|---|---|---|---|---|---|---|---|---|---|
| BERT-Large | 90.9/84.1 | 81.8/79.0 | 86.6/- | 93.2 | 92.3 | 60.6 | 70.4 | 88.0/- | 91.3/- | 90.0/- |
| RoBERTa-Large | 94.6/88.9 | 89.4/86.5 | 90.2/- | 96.4 | 93.9 | 68.0 | 86.6 | 90.9/- | 92.2/- | 92.4/- |
| XLNet-Large | 95.1/89.7 | 90.6/87.9 | 90.8/- | 97.0 | 94.9 | 69.0 | 85.9 | 90.8/- | 92.3/- | 92.5/- |
| [DeBERTa-Large](https://huggingface.co/microsoft/deberta-large)<sup>1</sup> | 95.5/90.1 | 90.7/88.0 | 91.3/91.1 | 96.5 | 95.3 | 69.5 | 91.0 | 92.6/94.6 | 92.3/- | 92.8/92.5 |
| [DeBERTa-XLarge](https://huggingface.co/microsoft/deberta-xlarge)<sup>1</sup> | -/- | -/- | 91.5/91.2 | 97.0 | - | - | 93.1 | 92.1/94.3 | - | 92.9/92.7 |
| [DeBERTa-V2-XLarge](https://huggingface.co/microsoft/deberta-v2-xlarge)<sup>1</sup> | 95.8/90.8 | 91.4/88.9 | 91.7/91.6 | 97.5 | 95.8 | 71.1 | 93.9 | 92.0/94.2 | 92.3/89.8 | 92.9/92.9 |
| [DeBERTa-V2-XXLarge](https://huggingface.co/microsoft/deberta-v2-xxlarge)<sup>1,2</sup> | 96.1/91.4 | 92.2/89.7 | 91.7/91.9 | 97.2 | 96.0 | 72.0 | 93.5 | 93.1/94.9 | 92.7/90.3 | 93.2/93.1 |
Notes
- <sup>1</sup> Following RoBERTa, for RTE, MRPC, and STS-B we fine-tune starting from the MNLI fine-tuned models: [DeBERTa-Large-MNLI](https://huggingface.co/microsoft/deberta-large-mnli), [DeBERTa-XLarge-MNLI](https://huggingface.co/microsoft/deberta-xlarge-mnli), [DeBERTa-V2-XLarge-MNLI](https://huggingface.co/microsoft/deberta-v2-xlarge-mnli), and [DeBERTa-V2-XXLarge-MNLI](https://huggingface.co/microsoft/deberta-v2-xxlarge-mnli). The SST-2/QQP/QNLI/SQuAD 2.0 results would also improve slightly when starting from MNLI fine-tuned models; however, for those four tasks we only report the numbers obtained by fine-tuning from the pretrained base models.
- <sup>2</sup> To try the XXLarge model with HF transformers, you need to specify `--sharded_ddp`, for example:
```bash
cd transformers/examples/text-classification/
export TASK_NAME=mrpc
python -m torch.distributed.launch --nproc_per_node=8 run_glue.py --model_name_or_path microsoft/deberta-v2-xxlarge \
  --task_name $TASK_NAME --do_train --do_eval --max_seq_length 128 --per_device_train_batch_size 4 \
  --learning_rate 3e-6 --num_train_epochs 3 --output_dir /tmp/$TASK_NAME/ --overwrite_output_dir --sharded_ddp --fp16
```
Citation
If you find DeBERTa useful for your work, please cite the following paper:
```bibtex
@inproceedings{he2021deberta,
  title={DEBERTA: DECODING-ENHANCED BERT WITH DISENTANGLED ATTENTION},
  author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=XPZIaotutsD}
}
```
📄 License
This project is licensed under the MIT license.