DeBERTa V2 XLarge MNLI
DeBERTa V2 XLarge is an enhanced natural language understanding model developed by Microsoft. It improves on the BERT architecture with a disentangled attention mechanism and an enhanced mask decoder, and outperforms BERT and RoBERTa on multiple NLU tasks. This checkpoint is additionally fine-tuned on the MNLI task.
Downloads: 51.59k
Release date: 3/2/2022
Model Overview
An improved BERT-style architecture built on a disentangled attention mechanism, focused on natural language understanding tasks, with strong performance on benchmarks such as GLUE and SQuAD.
Model Features
Disentangled Attention Mechanism
Separates content and position attention calculations, enhancing the model's ability to understand textual positional relationships.
Enhanced Masked Decoder
Improved masked language modeling objective to better capture contextual dependencies.
Large-scale Pretraining
Pretrained on 80GB of training data to learn richer language representations.
Model Capabilities
Text Classification
Question Answering
Semantic Similarity Calculation
Natural Language Inference
Sentence Pair Classification
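Since the checkpoint is fine-tuned on MNLI, sentence-pair classification can be sketched with the Hugging Face `transformers` library. This is a minimal sketch, assuming the hub model id `microsoft/deberta-v2-xlarge-mnli`; running it downloads the full set of weights, and the premise/hypothesis pair below is only an illustration:

```python
# Sketch: sentence-pair NLI with the MNLI-fine-tuned checkpoint.
# Assumes the hub id "microsoft/deberta-v2-xlarge-mnli"; the weights
# (several GB) are downloaded on first use.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "microsoft/deberta-v2-xlarge-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

premise = "A man is playing a guitar on stage."
hypothesis = "A person is performing music."

# Encode the pair together so the model sees both segments.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Three-way probabilities over the MNLI labels.
probs = logits.softmax(dim=-1)[0]
for i in range(probs.shape[-1]):
    print(model.config.id2label[i], round(probs[i].item(), 3))
```

The same loaded model serves all the sentence-pair capabilities listed above; only the premise/hypothesis framing of the inputs changes.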
Use Cases
Text Understanding
Sentiment Analysis
Analyze text sentiment (positive/negative).
Achieved 97.5% accuracy on the SST-2 dataset.
Question Answering
Answer questions based on a given passage.
Achieved F1 score of 91.4 and EM score of 88.9 on SQuAD 2.0.
Semantic Analysis
Semantic Similarity Judgment
Determine the semantic similarity between two sentences.
Pearson correlation coefficient of 92.9 on the STS-B dataset.
Natural Language Inference
Determine logical relationships between texts (entailment/contradiction/neutral).
Accuracy of 91.7% (matched)/91.6% (mismatched) on the MNLI dataset.
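The three-way decision above comes from a softmax over the model's three output logits, one per relation. The following self-contained sketch shows that step; the logit values are illustrative (not real model output), and the contradiction/neutral/entailment label order is an assumption about the checkpoint's label mapping:

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for (contradiction, neutral, entailment);
# a real run would produce these from the classification head.
labels = ["contradiction", "neutral", "entailment"]
logits = [-2.1, 0.3, 3.5]

probs = softmax(logits)
pred = labels[max(range(len(labels)), key=lambda i: probs[i])]
print(pred)  # → entailment
```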