🚀 DeBERTa: Decoding-enhanced BERT with Disentangled Attention
DeBERTa improves on the BERT and RoBERTa models by using disentangled attention and an enhanced mask decoder. With 80GB of training data, it outperforms BERT and RoBERTa on a majority of NLU tasks.
For more details and updates, please visit the official repository. This checkpoint is the DeBERTa V2 xxlarge model, with 48 layers and a hidden size of 1536. It has 1.5B parameters in total and was trained on 160GB of raw data.
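As a quick sanity check, the depth and hidden size quoted above can be read from the published configuration. This is a minimal sketch using the Hugging Face transformers library (assumes `transformers` is installed); the expected values in the comments come from this card, not from running the code here.

```python
# Minimal sketch: read the DeBERTa-V2-XXLarge configuration from the Hub
# and confirm the architecture figures quoted above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("microsoft/deberta-v2-xxlarge")
print(config.num_hidden_layers)  # expected: 48
print(config.hidden_size)        # expected: 1536
```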
🚀 Quick Start
This section provides an overview of the DeBERTa model and its performance on various NLU tasks. You can find detailed installation and usage instructions below.
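For a minimal end-to-end check before fine-tuning, the sketch below loads the checkpoint for feature extraction with the Hugging Face transformers library. It assumes `transformers`, `torch`, and `sentencepiece` are installed; note that the 1.5B-parameter checkpoint needs a correspondingly large amount of memory.

```python
# Minimal feature-extraction sketch: encode one sentence and inspect the
# final hidden states. Fine-tuning recipes are given in the sections below.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xxlarge")
model = AutoModel.from_pretrained("microsoft/deberta-v2-xxlarge")

inputs = tokenizer("DeBERTa uses disentangled attention.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, 1536)
```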
⨠Features
- Disentangled Attention: each token is represented by two vectors encoding its content and its relative position, and attention weights are computed from both rather than from a single summed embedding (a toy sketch of the score computation follows this list).
- Enhanced Mask Decoder: absolute position information is incorporated in the decoding layer to improve the prediction of masked tokens.
- High Performance: Outperforms BERT and RoBERTa on most NLU tasks.
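To make the first feature concrete, here is a toy, single-head sketch of how the disentangled attention score decomposes into content-to-content, content-to-position, and position-to-content terms. It illustrates the idea from the paper, not the library's implementation; the function and parameter names (`Hc`, `rel_emb`, `Wq_c`, etc.) and the single-head simplification are assumptions made for brevity.

```python
# Toy sketch of DeBERTa's disentangled attention scores (single head).
# Hc: (n, d) content hidden states; rel_emb: (2k, d) relative-position
# embeddings; W*_c / W*_r: (d, d) content / position projection matrices.
import torch

def disentangled_scores(Hc, rel_emb, Wq_c, Wk_c, Wq_r, Wk_r, k):
    Qc, Kc = Hc @ Wq_c, Hc @ Wk_c            # content queries / keys
    Qr, Kr = rel_emb @ Wq_r, rel_emb @ Wk_r  # relative-position queries / keys
    n, d = Hc.shape
    # delta[i, j] = clip(i - j + k, 0, 2k - 1): bucketed relative distance
    idx = torch.arange(n)
    delta = torch.clamp(idx[:, None] - idx[None, :] + k, 0, 2 * k - 1)
    c2c = Qc @ Kc.T                            # content-to-content
    c2p = torch.gather(Qc @ Kr.T, 1, delta)    # content-to-position
    p2c = torch.gather(Kc @ Qr.T, 1, delta).T  # position-to-content
    return (c2c + c2p + p2c) / (3 * d) ** 0.5  # scaled attention scores
```

A row-wise softmax over these scores would give the attention weights, exactly as in standard self-attention.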
📦 Installation
To run the DeBERTa model, install the necessary dependencies. For training you can use either DeepSpeed or the Hugging Face Trainer's `--sharded_ddp` option.
Install with DeepSpeed
```bash
pip install datasets
pip install deepspeed

# Download the DeepSpeed config file
wget https://huggingface.co/microsoft/deberta-v2-xxlarge/resolve/main/ds_config.json -O ds_config.json
```
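If you would rather inspect or hand-edit the configuration than download it, the sketch below writes an illustrative minimal DeepSpeed config (fp16 plus ZeRO stage 2). The keys shown are standard DeepSpeed options, but the values and the file name `ds_config_example.json` are assumptions for illustration; the official `ds_config.json` fetched above is the recommended starting point.

```python
# Illustrative only: write a minimal DeepSpeed configuration to disk.
# Prefer the official ds_config.json downloaded above for real runs.
import json

ds_config = {
    "fp16": {"enabled": True},            # mixed-precision training
    "zero_optimization": {"stage": 2},    # ZeRO stage-2 sharding
    "train_micro_batch_size_per_gpu": 8,  # matches batch_size used below
    "gradient_accumulation_steps": 1,
}

with open("ds_config_example.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```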
Install with `--sharded_ddp`
```bash
cd transformers/examples/text-classification/
```
💻 Usage Examples
Run with DeepSpeed
```bash
export TASK_NAME=mnli
output_dir="ds_results"
num_gpus=8
batch_size=8
python -m torch.distributed.launch --nproc_per_node=${num_gpus} \
  run_glue.py \
  --model_name_or_path microsoft/deberta-v2-xxlarge \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 256 \
  --per_device_train_batch_size ${batch_size} \
  --learning_rate 3e-6 \
  --num_train_epochs 3 \
  --output_dir $output_dir \
  --overwrite_output_dir \
  --logging_steps 10 \
  --logging_dir $output_dir \
  --deepspeed ds_config.json
```
Run with `--sharded_ddp`
```bash
export TASK_NAME=mnli
python -m torch.distributed.launch --nproc_per_node=8 run_glue.py --model_name_or_path microsoft/deberta-v2-xxlarge \
  --task_name $TASK_NAME --do_train --do_eval --max_seq_length 256 --per_device_train_batch_size 8 \
  --learning_rate 3e-6 --num_train_epochs 3 --output_dir /tmp/$TASK_NAME/ --overwrite_output_dir --sharded_ddp --fp16
```
📚 Documentation
Fine-tuning on NLU tasks
The following table shows the dev set results on SQuAD 1.1/2.0 and several GLUE benchmark tasks (for STS-B, P/S denotes Pearson/Spearman correlation).
| Model | SQuAD 1.1 (F1/EM) | SQuAD 2.0 (F1/EM) | MNLI-m/mm (Acc) | SST-2 (Acc) | QNLI (Acc) | CoLA (MCC) | RTE (Acc) | MRPC (Acc/F1) | QQP (Acc/F1) | STS-B (P/S) |
|---|---|---|---|---|---|---|---|---|---|---|
| BERT-Large | 90.9/84.1 | 81.8/79.0 | 86.6/- | 93.2 | 92.3 | 60.6 | 70.4 | 88.0/- | 91.3/- | 90.0/- |
| RoBERTa-Large | 94.6/88.9 | 89.4/86.5 | 90.2/- | 96.4 | 93.9 | 68.0 | 86.6 | 90.9/- | 92.2/- | 92.4/- |
| XLNet-Large | 95.1/89.7 | 90.6/87.9 | 90.8/- | 97.0 | 94.9 | 69.0 | 85.9 | 90.8/- | 92.3/- | 92.5/- |
| DeBERTa-Large^1 | 95.5/90.1 | 90.7/88.0 | 91.3/91.1 | 96.5 | 95.3 | 69.5 | 91.0 | 92.6/94.6 | 92.3/- | 92.8/92.5 |
| DeBERTa-XLarge^1 | -/- | -/- | 91.5/91.2 | 97.0 | - | - | 93.1 | 92.1/94.3 | - | 92.9/92.7 |
| DeBERTa-V2-XLarge^1 | 95.8/90.8 | 91.4/88.9 | 91.7/91.6 | 97.5 | 95.8 | 71.1 | 93.9 | 92.0/94.2 | 92.3/89.8 | 92.9/92.9 |
| DeBERTa-V2-XXLarge^1,2 | 96.1/91.4 | 92.2/89.7 | 91.7/91.9 | 97.2 | 96.0 | 72.0 | 93.5 | 93.1/94.9 | 92.7/90.3 | 93.2/93.1 |
Notes
📄 License
No license information was provided in the original document.
🔧 Technical Details
| Property | Details |
|---|---|
| Model Type | Decoding-enhanced BERT with Disentangled Attention |
| Training Data | 80GB for the original DeBERTa models; 160GB of raw data for the DeBERTa V2 xxlarge model |
📚 Citation
If you find DeBERTa useful for your work, please cite the following paper:
```bibtex
@inproceedings{he2021deberta,
    title={DEBERTA: DECODING-ENHANCED BERT WITH DISENTANGLED ATTENTION},
    author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
    booktitle={International Conference on Learning Representations},
    year={2021},
    url={https://openreview.net/forum?id=XPZIaotutsD}
}
```
💡 Usage Tip
To try the XXLarge model with HF transformers, we recommend using DeepSpeed, as it is faster and saves memory.