DeBERTa-XLarge-MNLI
DeBERTa-XLarge-MNLI is a Decoding-enhanced BERT with disentangled Attention (DeBERTa) model fine-tuned on the MNLI task. With 750M parameters, it excels at natural language understanding tasks.
Downloads: 833.58k
Release date: 3/2/2022
Model Overview
DeBERTa improves on BERT and RoBERTa through its disentangled attention mechanism and an enhanced mask decoder. Pre-trained on 80GB of data, it outperforms both models on the majority of natural language understanding tasks.
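For quick experimentation, here is a minimal sketch using the Hugging Face transformers pipeline API, assuming the Hub checkpoint microsoft/deberta-xlarge-mnli and a transformers version that accepts text/text_pair dictionaries:

```python
from transformers import pipeline

# Load the MNLI-fine-tuned checkpoint from the Hugging Face Hub.
classifier = pipeline("text-classification", model="microsoft/deberta-xlarge-mnli")

# MNLI is a sentence-pair task: the premise goes in "text",
# the hypothesis in "text_pair".
result = classifier({"text": "A man is playing a guitar.",
                     "text_pair": "A man is making music."})
print(result)  # e.g. [{'label': 'ENTAILMENT', 'score': ...}]
```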
Model Features
Disentangled Attention Mechanism
Represents each token with separate vectors for content and position and computes attention weights from disentangled matrices over content and relative position, improving on the standard self-attention used in BERT and RoBERTa.
Enhanced Masked Decoder
Incorporates absolute position information in the decoding layer when predicting masked tokens during pre-training, further improving the model's accuracy.
Large-scale Training Data
Pre-trained on 80GB of data, the model performs strongly across a wide range of natural language understanding tasks.
Model Capabilities
Natural Language Understanding
Text Classification
Semantic Similarity Calculation
Use Cases
Natural Language Processing
Textual Entailment Recognition
Identifies the logical relationship between two sentences (entailment, contradiction, or neutral).
Achieves 91.5/91.2 accuracy (matched/mismatched) on the MNLI task.
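A lower-level sketch of this use case, reading the label names from the checkpoint's config rather than hardcoding an index order (the example sentences are illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "microsoft/deberta-xlarge-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

premise = "The weather forecast predicts heavy rain tomorrow."
hypothesis = "It will rain tomorrow."

# Encode the sentence pair the way MNLI expects: premise first, hypothesis second.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map class probabilities back to label names via the checkpoint's config.
probs = logits.softmax(dim=-1).squeeze()
for idx in range(probs.shape[0]):
    print(f"{model.config.id2label[idx]}: {probs[idx].item():.3f}")
```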
Semantic Similarity Calculation
Computes the semantic similarity between two sentences.
Achieves Pearson/Spearman correlation coefficients of 92.9/92.7 on the STS-B task.
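Note that this checkpoint exposes a 3-way classification head, so the STS-B correlations above come from DeBERTa-XLarge fine-tuned separately on STS-B. As an illustrative sketch of similarity scoring with this checkpoint's encoder (an assumption for demonstration, not the setup behind the reported scores), one can mean-pool the last hidden states and compare sentence embeddings by cosine similarity:

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "microsoft/deberta-xlarge-mnli"  # only the encoder weights are used here
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)
encoder.eval()

def embed(sentence: str) -> torch.Tensor:
    # Mean-pool the last hidden states over non-padding tokens.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, hidden)
    mask = inputs["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

a = embed("A woman is slicing vegetables.")
b = embed("Someone is cutting up carrots.")
print(f"cosine similarity: {torch.cosine_similarity(a, b).item():.3f}")
```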