deberta-v2-xxlarge-mnli Open-Source Model - Empowering Efficient Completion of Natural Language Understanding Tasks

DeBERTa V2 XXLarge MNLI

Developed by Microsoft

DeBERTa V2 XXLarge is an enhanced BERT variant built on a disentangled attention mechanism. With 1.5 billion parameters, it surpasses RoBERTa and XLNet on natural language understanding tasks.

Large Language Model

Transformers

Language: English; Open Source License: MIT; Tags: #Disentangled Attention Mechanism #Natural Language Understanding #1.5 Billion Parameters

Release date: 3/2/2022

Model Overview

A pre-trained language model that improves on BERT with a disentangled attention mechanism and an enhanced mask decoder. This checkpoint is fine-tuned on MNLI and can also serve as a starting point for other natural language understanding tasks.

Model Features

Disentangled Attention Mechanism

Represents each token with separate content and relative-position vectors and computes attention scores from both, which makes the model more sensitive to positional information

Enhanced Masked Decoder

Adds absolute position information in the decoding layer when predicting masked tokens during masked language model pre-training

Large-scale Pre-training

Pre-trained on 80GB of data; reported state-of-the-art results on multiple NLU tasks
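To make the disentangled attention feature above more concrete, here is a minimal single-head sketch in plain Python. It follows the three-term attention score described in the DeBERTa paper (content-to-content, content-to-position, position-to-content), with relative distances clipped to a window of size k. All function and parameter names (`rel_bucket`, `disentangled_attention`, `Wq_c`, and so on) are illustrative assumptions; the released model is batched, multi-headed, and tensor-based, so treat this as a sketch of the idea rather than the actual implementation.

```python
import math


def rel_bucket(i, j, k):
    """Map the relative distance i - j to an index in [0, 2k), clipped at +/-k."""
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def disentangled_attention(H, P, Wq_c, Wk_c, Wv_c, Wq_r, Wk_r, k):
    """Single-head sketch (hypothetical names).

    H: n x d_in content vectors; P: 2k x d_in relative-position embeddings.
    Each W* is a d_in x d projection matrix.
    """
    n, d = len(H), len(Wq_c[0])
    Qc, Kc, Vc = matmul(H, Wq_c), matmul(H, Wk_c), matmul(H, Wv_c)
    Qr, Kr = matmul(P, Wq_r), matmul(P, Wk_r)
    scale = 1.0 / math.sqrt(3 * d)  # three score terms, hence 3d
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            c2c = dot(Qc[i], Kc[j])                    # content-to-content
            c2p = dot(Qc[i], Kr[rel_bucket(i, j, k)])  # content-to-position
            p2c = dot(Kc[j], Qr[rel_bucket(j, i, k)])  # position-to-content
            scores.append((c2c + c2p + p2c) * scale)
        weights = softmax(scores)
        out.append([sum(weights[j] * Vc[j][t] for j in range(n)) for t in range(d)])
    return out
```

Note that the position-to-content term indexes the relative-position queries with the distance from j to i, not from i to j, because it asks how the key token's content attends to the query token's position.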

Model Capabilities

Natural Language Inference

Text Classification

Question Answering System

Semantic Similarity Calculation

Use Cases

Text Understanding

Sentiment Analysis

Analyze text sentiment orientation

97.2% accuracy on SST-2 (dev) after task-specific fine-tuning

Question Answering

Extractive question answering over a given passage

F1 92.2 / EM 89.7 on SQuAD 2.0 (dev)

Semantic Analysis

Textual Entailment Recognition

Determine logical relationships between texts

93.5% accuracy on RTE (dev), fine-tuned from the MNLI checkpoint

Semantic Similarity Calculation

Calculate semantic similarity between sentences

Pearson correlation of 93.2 on STS-B (dev)

🚀 DeBERTa: Decoding-enhanced BERT with Disentangled Attention

DeBERTa improves on the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. It outperforms BERT and RoBERTa on the majority of NLU tasks with 80GB of training data.

Please check the official repository for more details and updates. This is the DeBERTa V2 XXLarge model fine-tuned on the MNLI task. It has 48 layers and a hidden size of 1536, for a total of 1.5B parameters.

🚀 Quick Start

This section provides an overview of the model and its performance, along with citation information. For more detailed usage, continue reading the following sections.
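For a quick check of the MNLI classification head, a minimal inference sketch with Hugging Face Transformers might look like the following. `AutoTokenizer`, `AutoModelForSequenceClassification`, and `model.config.id2label` are standard Transformers APIs; the helper names (`softmax`, `top_label`, `classify_pair`) are hypothetical and introduced only for illustration. The label names are read from the checkpoint's config rather than hard-coded, since their order is not stated in this document.

```python
import math

MODEL_ID = "microsoft/deberta-v2-xxlarge-mnli"


def softmax(logits):
    """Convert raw logits (a list of floats) to probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label):
    """Return the most probable label name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify_pair(premise, hypothesis, model_id=MODEL_ID):
    """Score one premise/hypothesis pair; downloads the checkpoint on first use."""
    # Imported lazily so the helpers above work without torch/transformers installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return top_label(logits, model.config.id2label)
```

Calling `classify_pair("A man is playing a guitar.", "A person is making music.")` would return a label and probability pair. With 1.5B parameters, the weights alone take roughly 6 GB in 32-bit floats, so a GPU or a machine with ample RAM is advisable, and the tokenizer may require the `sentencepiece` package.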

✨ Features

Improved Architecture: DeBERTa enhances BERT and RoBERTa using disentangled attention and an enhanced mask decoder.
High Performance: Outperforms BERT and RoBERTa on most NLU tasks with 80GB of training data.
Fine-Tuned Model: The DeBERTa V2 XXLarge model is fine-tuned for the MNLI task.

📚 Documentation

Fine-tuning on NLU tasks

We present the dev results on SQuAD 1.1/2.0 and several GLUE benchmark tasks.

| Property | Details |
|---|---|
| Model Type | DeBERTa V2 XXLarge fine-tuned for MNLI |
| Layers | 48 |
| Hidden Size | 1536 |
| Total Parameters | 1.5B |

| Model | SQuAD 1.1 (F1/EM) | SQuAD 2.0 (F1/EM) | MNLI-m/mm (Acc) | SST-2 (Acc) | QNLI (Acc) | CoLA (MCC) | RTE (Acc) | MRPC (Acc/F1) | QQP (Acc/F1) | STS-B (P/S) |
|---|---|---|---|---|---|---|---|---|---|---|
| BERT-Large | 90.9/84.1 | 81.8/79.0 | 86.6/- | 93.2 | 92.3 | 60.6 | 70.4 | 88.0/- | 91.3/- | 90.0/- |
| RoBERTa-Large | 94.6/88.9 | 89.4/86.5 | 90.2/- | 96.4 | 93.9 | 68.0 | 86.6 | 90.9/- | 92.2/- | 92.4/- |
| XLNet-Large | 95.1/89.7 | 90.6/87.9 | 90.8/- | 97.0 | 94.9 | 69.0 | 85.9 | 90.8/- | 92.3/- | 92.5/- |
| [DeBERTa-Large](https://huggingface.co/microsoft/deberta-large)¹ | 95.5/90.1 | 90.7/88.0 | 91.3/91.1 | 96.5 | 95.3 | 69.5 | 91.0 | 92.6/94.6 | 92.3/- | 92.8/92.5 |
| [DeBERTa-XLarge](https://huggingface.co/microsoft/deberta-xlarge)¹ | -/- | -/- | 91.5/91.2 | 97.0 | - | - | 93.1 | 92.1/94.3 | - | 92.9/92.7 |
| [DeBERTa-V2-XLarge](https://huggingface.co/microsoft/deberta-v2-xlarge)¹ | 95.8/90.8 | 91.4/88.9 | 91.7/91.6 | 97.5 | 95.8 | 71.1 | 93.9 | 92.0/94.2 | 92.3/89.8 | 92.9/92.9 |
| [DeBERTa-V2-XXLarge](https://huggingface.co/microsoft/deberta-v2-xxlarge)¹,² | 96.1/91.4 | 92.2/89.7 | 91.7/91.9 | 97.2 | 96.0 | 72.0 | 93.5 | 93.1/94.9 | 92.7/90.3 | 93.2/93.1 |

Notes.

¹ Following RoBERTa, for RTE, MRPC, and STS-B, we fine-tune the tasks based on [DeBERTa-Large-MNLI](https://huggingface.co/microsoft/deberta-large-mnli), [DeBERTa-XLarge-MNLI](https://huggingface.co/microsoft/deberta-xlarge-mnli), [DeBERTa-V2-XLarge-MNLI](https://huggingface.co/microsoft/deberta-v2-xlarge-mnli), and [DeBERTa-V2-XXLarge-MNLI](https://huggingface.co/microsoft/deberta-v2-xxlarge-mnli). The results of SST-2/QQP/QNLI/SQuADv2 will also be slightly improved when starting from MNLI fine-tuned models. However, we only report the numbers fine-tuned from pretrained base models for those 4 tasks.
² To try the XXLarge model with HF transformers, we recommend using DeepSpeed, as it is faster and saves memory.
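The metric abbreviations in the results table are the standard ones for these benchmarks: Acc is accuracy, MCC is the Matthews correlation coefficient, P/S is Pearson/Spearman correlation, and F1/EM is token-level F1 and exact match. As a rough illustration of the two less familiar ones, the sketch below computes MCC for binary labels and the Pearson correlation in plain Python. The function names are hypothetical, and the table reports these values scaled by 100.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels in {0, 1}; returns 0.0 when the denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Spearman correlation is the same Pearson formula applied to the ranks of the values rather than the values themselves.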

💻 Usage Examples

Basic Usage

Run with Deepspeed:

pip install datasets
pip install deepspeed

# Download the deepspeed config file
wget https://huggingface.co/microsoft/deberta-v2-xxlarge-mnli/resolve/main/ds_config.json -O ds_config.json

export TASK_NAME=rte
output_dir="ds_results"
num_gpus=8
batch_size=4
python -m torch.distributed.launch --nproc_per_node=${num_gpus} \
  run_glue.py \
  --model_name_or_path microsoft/deberta-v2-xxlarge-mnli \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 256 \
  --per_device_train_batch_size ${batch_size} \
  --learning_rate 3e-6 \
  --num_train_epochs 3 \
  --output_dir $output_dir \
  --overwrite_output_dir \
  --logging_steps 10 \
  --logging_dir $output_dir \
  --deepspeed ds_config.json
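As a sanity check on the launch settings above, the sketch below works out the effective batch size and the number of optimizer steps. It assumes a gradient accumulation of 1 (the downloaded `ds_config.json` may set a different value) and takes the training-set size as a parameter. The function names are hypothetical, and the RTE size of 2,490 examples used in the usage note is an assumption about the GLUE data that is worth verifying with the `datasets` library.

```python
import math


def effective_batch_size(num_gpus, per_device_batch, grad_accum_steps=1):
    """Number of examples consumed per optimizer step across all GPUs."""
    return num_gpus * per_device_batch * grad_accum_steps


def optimizer_steps(num_examples, num_gpus, per_device_batch, epochs, grad_accum_steps=1):
    """Approximate total optimizer steps, rounding each epoch up to a full step."""
    batch = effective_batch_size(num_gpus, per_device_batch, grad_accum_steps)
    return math.ceil(num_examples / batch) * epochs
```

With `num_gpus=8` and `batch_size=4` as in the script, `effective_batch_size(8, 4)` gives 32, and `optimizer_steps(2490, 8, 4, 3)` gives 234 under the stated assumptions, so `--logging_steps 10` would log a little over twenty times in the whole run.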

Advanced Usage

You can also run with --sharded_ddp:

cd transformers/examples/text-classification/
export TASK_NAME=rte
python -m torch.distributed.launch --nproc_per_node=8 run_glue.py \
  --model_name_or_path microsoft/deberta-v2-xxlarge-mnli \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 256 \
  --per_device_train_batch_size 4 \
  --learning_rate 3e-6 \
  --num_train_epochs 3 \
  --output_dir /tmp/$TASK_NAME/ \
  --overwrite_output_dir \
  --sharded_ddp \
  --fp16

📄 License

This project is licensed under the MIT license.

🔧 Technical Details

Model Architecture: DeBERTa uses disentangled attention and an enhanced mask decoder to improve upon BERT and RoBERTa.
Fine-Tuning: Fine-tuned on the MNLI task, with specific configurations for different GLUE benchmark tasks.
Performance: Demonstrates high performance on SQuAD 1.1/2.0 and multiple GLUE tasks compared to other models such as BERT-Large, RoBERTa-Large, and XLNet-Large.

📚 Citation

If you find DeBERTa useful for your work, please cite the following paper:

@inproceedings{
he2021deberta,
title={DEBERTA: DECODING-ENHANCED BERT WITH DISENTANGLED ATTENTION},
author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=XPZIaotutsD}
}
