
DeBERTa V3 XSmall

Developed by Microsoft
DeBERTaV3 is an improved version of Microsoft's DeBERTa model. It improves pretraining efficiency with an ELECTRA-style objective combined with gradient-disentangled embedding sharing, and delivers strong performance on natural language understanding tasks.
Downloads 87.40k
Release Time: 3/2/2022

Model Overview

DeBERTaV3 employs a disentangled attention mechanism and an enhanced mask decoder, combined with ELECTRA-style pretraining, which significantly improves performance on downstream tasks.

Model Features

Gradient-Disentangled Embedding Sharing
Uses an ELECTRA-style pretraining method that improves the embedding-sharing scheme between generator and discriminator by disentangling their gradients
Disentangled Attention Mechanism
An improved attention mechanism that represents content and relative-position information separately, strengthening the model's comprehension
Efficient Parameter Design
The xsmall version has only 22 million backbone parameters, significantly reducing model size while maintaining performance
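The disentangled attention described above scores each token pair from both content and relative-position representations. The following is a minimal NumPy sketch of that idea, not the model's actual implementation: it simplifies away the relative-position bucketing that DeBERTa applies, and all array names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d = 4, 8

# Content query/key projections (from token embeddings) -- illustrative values
q_c = rng.normal(size=(seq_len, d))
k_c = rng.normal(size=(seq_len, d))
# Relative-position query/key projections (from position embeddings)
q_r = rng.normal(size=(seq_len, d))
k_r = rng.normal(size=(seq_len, d))

# Disentangled score: content-to-content + content-to-position + position-to-content
scores = q_c @ k_c.T + q_c @ k_r.T + k_c @ q_r.T

# DeBERTa scales by sqrt(3d) because three score terms are summed
attn = softmax(scores / np.sqrt(3 * d))
print(attn.shape)  # (4, 4)
```

Each attention row is a probability distribution over the sequence, so every row of `attn` sums to 1.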

Model Capabilities

Text Classification
Question Answering System
Natural Language Inference

Use Cases

Natural Language Processing
Question Answering System
Used to build high-performance question answering systems
Achieves F1 score of 84.8 and EM score of 82.0 on SQuAD 2.0
Text Classification
Used for natural language inference tasks
Achieves accuracy of 88.1/88.3 (matched/mismatched) on the MNLI task
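The F1 and EM (exact match) figures quoted above are the standard SQuAD-style answer metrics: EM checks for an exact string match after normalization, while F1 measures token-level overlap between the predicted and gold answers. A minimal sketch of both metrics (simplified normalization; the official script also strips articles and punctuation):

```python
from collections import Counter

def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalized strings match exactly, else 0.0."""
    return float(pred.strip().lower() == gold.strip().lower())

def f1_score(pred: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_toks = pred.lower().split()
    gold_toks = gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(f1_score("the cat sat", "the cat sat down"))  # 0.857... (P=1.0, R=0.75)
print(exact_match("Paris", "paris"))  # 1.0
```

Reported scores are these metrics averaged over all questions in the SQuAD 2.0 dev set.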