Erlangshen DeBERTa V2 710M Chinese
This is a 710M-parameter DeBERTa-v2 model for Chinese natural language understanding tasks. It is pre-trained with whole-word masking, providing strong support for Chinese NLP.
Release Time: 8/16/2022
Model Overview
The Erlangshen-DeBERTa-v2-710M-Chinese model is a Chinese pre-trained model based on the DeBERTa-v2 architecture. It excels at natural language understanding tasks and uses whole-word masking to improve pre-training effectiveness.
Model Features
Whole-word masking pre-training
Adopts whole-word masking (wwm) to improve pre-training effectiveness (see the sketch after this list)
Powerful language understanding ability
Built on the DeBERTa-v2-XLarge architecture with 710 million parameters, giving it strong language understanding ability
Chinese optimization
Specifically optimized for Chinese NLP tasks, with strong results across multiple Chinese NLU benchmarks
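To make the whole-word masking idea concrete, below is a minimal illustrative sketch, not the actual pre-training code for this model: it assumes word boundaries come from a generic Chinese segmenter (jieba is used here purely as an example) and masks every character of a selected word together instead of masking isolated characters or subword pieces.

```python
# Illustrative whole-word masking (wwm) sketch for Chinese text.
# Assumptions: jieba provides the word segmentation; [MASK] is the mask token.
import random
import jieba

def whole_word_mask(text: str, mask_prob: float = 0.15, mask_token: str = "[MASK]") -> str:
    """Mask entire segmented words rather than individual characters."""
    words = jieba.lcut(text)  # split the sentence into Chinese words
    masked = []
    for word in words:
        if random.random() < mask_prob:
            # replace every character of the chosen word, not just one piece
            masked.extend([mask_token] * len(word))
        else:
            masked.append(word)
    return "".join(masked)

print(whole_word_mask("生活的真谛是爱。"))
```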
Model Capabilities
Text understanding
Semantic analysis
Text completion
Use Cases
Natural language understanding
Sentiment analysis
Analyze the sentiment polarity of a text
Text classification
Categorize text content
Language model tasks
Masked language modeling
Predict the masked tokens in a sentence
Outperforms RoBERTa-base/large on multiple Chinese NLU tasks
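A minimal masked-language-modeling inference sketch with the transformers library is shown below. It assumes the checkpoint is available on the Hugging Face Hub under the id IDEA-CCNL/Erlangshen-DeBERTa-v2-710M-Chinese; adjust the id to wherever your copy of the weights lives.

```python
# Fill-mask inference sketch (model id is an assumption; see lead-in above).
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

model_id = "IDEA-CCNL/Erlangshen-DeBERTa-v2-710M-Chinese"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForMaskedLM.from_pretrained(model_id)

fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Predict the [MASK] token in a Chinese sentence and print the top candidates.
for candidate in fill_mask("生活的真谛是[MASK]。", top_k=5):
    print(candidate["token_str"], round(candidate["score"], 4))
```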