DeBERTa XLarge
DeBERTa improves on BERT and RoBERTa with a disentangled attention mechanism and an enhanced mask decoder, achieving superior performance on most natural language understanding tasks.
Downloads 312
Release Time: 3/2/2022
Model Overview
DeBERTa is an enhanced BERT-style model that improves performance on natural language understanding tasks through a disentangled attention mechanism and an enhanced mask decoder.
Model Features
Disentangled Attention Mechanism
Represents each token with separate content and position vectors and computes attention from disentangled content-to-content, content-to-position, and position-to-content terms, improving the model's understanding of text.
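A minimal single-head sketch of how these disentangled scores can be computed, assuming a simple clipped relative-distance bucketing; `disentangled_scores`, `rel_pos_bucket`, and the `W*` projection names are illustrative and do not correspond to the library's internals:

```python
import math
import torch

def rel_pos_bucket(n, k):
    # delta(i, j): relative distance i - j, clipped and shifted into [0, 2k - 1]
    pos = torch.arange(n)
    dist = pos[:, None] - pos[None, :]          # (n, n)
    return dist.clamp(-k, k - 1) + k            # indices into 2k position buckets

def disentangled_scores(H, P, Wq, Wk, Wqr, Wkr, k):
    # H: (n, d) content states; P: (2k, d) relative-position embeddings.
    n, d = H.shape
    idx = rel_pos_bucket(n, k)                  # (n, n) relative-position indices
    Qc, Kc = H @ Wq, H @ Wk                     # content queries / keys
    Qr, Kr = P @ Wqr, P @ Wkr                   # position queries / keys
    c2c = Qc @ Kc.T                             # content-to-content term
    c2p = (Qc @ Kr.T).gather(1, idx)            # content-to-position term
    p2c = (Kc @ Qr.T).gather(1, idx).T          # position-to-content term
    return (c2c + c2p + p2c) / math.sqrt(3 * d) # scaled by sqrt(3d) as in the paper

n, d, k = 8, 16, 4
H, P = torch.randn(n, d), torch.randn(2 * k, d)
Wq, Wk, Wqr, Wkr = (torch.randn(d, d) / d ** 0.5 for _ in range(4))
attn = torch.softmax(disentangled_scores(H, P, Wq, Wk, Wqr, Wkr, k), dim=-1)
print(attn.shape)  # torch.Size([8, 8])
```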
Enhanced Mask Decoder
Incorporates absolute position information right before the softmax layer that predicts masked tokens, improving performance on masked language modeling.
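A toy illustration of that idea only: the actual enhanced mask decoder runs additional decoding layers in which absolute position embeddings participate, whereas this sketch merely adds them to the final hidden states before the vocabulary projection, to show that absolute positions enter at the decoding step rather than at the input:

```python
import torch
import torch.nn as nn

d, vocab, n = 64, 1000, 16
vocab_proj = nn.Linear(d, vocab)      # token-prediction head (illustrative)
abs_pos_emb = nn.Embedding(512, d)    # absolute position embeddings

hidden = torch.randn(n, d)            # encoder output built from relative positions only
positions = torch.arange(n)
logits = vocab_proj(hidden + abs_pos_emb(positions))  # (n, vocab) MLM logits
```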
Large-scale Pretraining
Pretrained on 80GB of text data, outperforming RoBERTa on a range of natural language understanding tasks.
Model Capabilities
Text understanding (see the usage sketch after this list)
Masked language modeling
Natural language inference
Question answering
Text classification
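A minimal sketch of loading the model to encode text, assuming the checkpoint is published on the Hugging Face Hub as "microsoft/deberta-xlarge" (verify the exact id before use):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/deberta-xlarge")
model = AutoModel.from_pretrained("microsoft/deberta-xlarge")

inputs = tok("DeBERTa disentangles content and position.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)
print(out.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```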
Use Cases
Natural Language Understanding
Question Answering
Excels on extractive QA benchmarks such as SQuAD 1.1 and 2.0.
Achieves F1/EM scores of 95.5/90.1 on SQuAD 1.1.
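A hypothetical usage sketch: it assumes a SQuAD-fine-tuned DeBERTa checkpoint, where "your-org/deberta-xlarge-squad" is a placeholder id, not a published model:

```python
from transformers import pipeline

# Placeholder checkpoint id; substitute a real SQuAD-fine-tuned model.
qa = pipeline("question-answering", model="your-org/deberta-xlarge-squad")
print(qa(question="What does DeBERTa improve?",
         context="DeBERTa improves BERT with disentangled attention."))
```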
Text Classification
Delivers strong results on text classification tasks from the GLUE benchmark.
Achieves 97.0% accuracy on SST-2 sentiment classification.
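A sketch of setting the model up for binary sentiment classification; the classification head is randomly initialized, so reaching the quoted SST-2 accuracy requires fine-tuning on labeled data:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/deberta-xlarge")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-xlarge", num_labels=2)  # new head, needs fine-tuning
# model is now ready for fine-tuning on a labeled dataset such as SST-2.
```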
Natural Language Inference
Excels at NLI tasks such as MNLI.
Achieves 91.5/91.2 accuracy on MNLI-m/mm.
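An inference sketch, assuming the MNLI-fine-tuned checkpoint is available on the Hub as "microsoft/deberta-xlarge-mnli" (verify the exact id):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "microsoft/deberta-xlarge-mnli"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

# Premise / hypothesis pair for entailment classification.
inputs = tok("A man is playing guitar.", "A person is making music.",
             return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```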