🚀 DeBERTa V3 Japanese Model
This is a DeBERTa V3 model pre-trained on Japanese resources. It addresses the challenges of Japanese language processing and offers high-performance solutions for related tasks.
🚀 Quick Start
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = 'globis-university/deberta-v3-japanese-xsmall'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
```
✨ Features
- Based on the well-known DeBERTa V3 model.
- Specialized for the Japanese language.
- Does not require a morphological analyzer during inference.
- Respects word boundaries to some extent (does not produce tokens spanning multiple words, such as `の都合上` or `の判定負けを喫し`); see the tokenization sketch after this list.
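As an illustrative check (not part of the original card; the example sentence is our own), you can inspect how the tokenizer splits an arbitrary sentence:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('globis-university/deberta-v3-japanese-xsmall')

# Print the subword tokens for an arbitrary Japanese sentence.
# With this tokenizer, subwords should not straddle word boundaries.
print(tokenizer.tokenize('深層学習を用いた自然言語処理の研究を行っています。'))
```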
📦 Installation
No model-specific installation steps are required; the model can be used directly with the Hugging Face `transformers` library.
💻 Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = 'globis-university/deberta-v3-japanese-xsmall'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
```
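The snippet above only loads the weights. A minimal forward pass could look like the sketch below; note that the token-classification head is freshly initialized, so its outputs are only meaningful after fine-tuning on a downstream task (the example sentence is our own):

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = 'globis-university/deberta-v3-japanese-xsmall'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Encode a sentence and run it through the model.
inputs = tokenizer('今日はいい天気ですね。', return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)

# Shape: (batch_size, sequence_length, num_labels); the head is untrained here.
print(outputs.logits.shape)
```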
📚 Documentation
Tokenizer
The tokenizer is trained using the method introduced by Kudo.
Key points include:
- No morphological analyzer needed during inference.
- Tokens do not cross word boundaries (dictionary: `unidic-cwj-202302`).
- Easy to use with Hugging Face.
- Smaller vocabulary size.
The original DeBERTa V3 is characterized by a large vocabulary size, which significantly inflates the number of parameters in the embedding layer (for the microsoft/deberta-v3-base model, the embedding layer accounts for 54% of all parameters). To address this, this model adopts a smaller vocabulary size.
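As a rough sanity check of that figure (our own calculation, not from the original card; the exact percentage depends on which parameters are counted), the embedding layer's share can be computed directly:

```python
from transformers import AutoModel

# Compare the fraction of parameters spent on token embeddings.
# microsoft/deberta-v3-base uses a ~128k vocabulary; this model uses 32k.
for name in ['microsoft/deberta-v3-base', 'globis-university/deberta-v3-japanese-base']:
    model = AutoModel.from_pretrained(name)
    total = sum(p.numel() for p in model.parameters())
    embedding = model.get_input_embeddings().weight.numel()
    print(f'{name}: embedding share = {embedding / total:.0%}')
```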
Note that, among the three models (xsmall, base, and large), the first two were trained with the unigram algorithm, while only the large model was trained with the BPE algorithm.
The reason is simple: the tokenizer for the large model was trained independently in order to increase its vocabulary size, but for some reason training with the unigram algorithm was not successful.
Thus, prioritizing the completion of the model over investigating the cause, we switched to the BPE algorithm.
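For reference, a SentencePiece tokenizer along these lines could be trained roughly as follows. This is only a sketch with placeholder paths and illustrative options, not the authors' actual training script:

```python
import sentencepiece as spm

# Placeholder corpus path and illustrative options only.
# xsmall and base used the unigram algorithm; large fell back to BPE.
spm.SentencePieceTrainer.train(
    input='corpus_ja.txt',       # hypothetical pre-processed Japanese corpus
    model_prefix='spm_japanese',
    vocab_size=32000,
    model_type='unigram',        # use 'bpe' to mirror the large model's tokenizer
    character_coverage=0.9995,
)
```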
Data
| Dataset Name | Notes | File Size (with metadata) | Factor |
|---|---|---|---|
| Wikipedia | 2023/07; WikiExtractor | 3.5GB | x2 |
| Wikipedia | 2023/07; [cl-tohoku's method](https://github.com/cl-tohoku/bert-japanese/blob/main/make_corpus_wiki.py) | 4.8GB | x2 |
| WikiBooks | 2023/07; [cl-tohoku's method](https://github.com/cl-tohoku/bert-japanese/blob/main/make_corpus_wiki.py) | 43MB | x2 |
| Aozora Bunko | 2023/07; [globis-university/aozorabunko-clean](https://huggingface.co/datasets/globis-university/aozorabunko-clean) | 496MB | x4 |
| CC-100 | ja | 90GB | x1 |
| mC4 | ja; extracted 10%, with Wikipedia-like focus via DSIR | 91GB | x1 |
| OSCAR 2023 | ja; extracted 10%, with Wikipedia-like focus via DSIR | 26GB | x1 |
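If the Factor column is read as an up-sampling factor (i.e. how many times a corpus is repeated in the training mix), applying it might look like the sketch below; the file paths are placeholders, not the actual pipeline:

```python
from datasets import concatenate_datasets, load_dataset

# Placeholder corpora: each text file stands in for one processed corpus.
corpora = [
    ('wikipedia_ja.txt', 2),    # x2
    ('aozorabunko_ja.txt', 4),  # x4
    ('cc100_ja.txt', 1),        # x1
]

parts = []
for path, factor in corpora:
    dataset = load_dataset('text', data_files=path, split='train')
    parts.extend([dataset] * factor)  # repeat the corpus `factor` times

mixed = concatenate_datasets(parts).shuffle(seed=42)
```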
Training parameters
- Number of devices: 8
- Batch size: 48 x 8
- Learning rate: 3.84e-4
- Maximum sequence length: 512
- Optimizer: AdamW
- Learning rate scheduler: Linear schedule with warmup
- Training steps: 1,000,000
- Warmup steps: 100,000
- Precision: Mixed (fp16)
- Vocabulary size: 32,000
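Mapped onto Hugging Face `TrainingArguments`, these settings would look roughly like the following; this is an illustrative reconstruction (the output directory is a placeholder), not the authors' actual training configuration:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='deberta-v3-japanese-pretraining',  # placeholder
    per_device_train_batch_size=48,  # x 8 devices = 384 sequences per step
    learning_rate=3.84e-4,
    max_steps=1_000_000,
    warmup_steps=100_000,
    lr_scheduler_type='linear',      # linear schedule with warmup
    optim='adamw_torch',
    fp16=True,                       # mixed precision
)
```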
Evaluation
JCQA refers to JCommonsenseQA. Cells with two values report two metrics (JSTS: Pearson/Spearman correlation; JSQuAD: exact match/F1); JNLI and JCQA report accuracy.

| Model | #params | JSTS | JNLI | JSQuAD | JCQA |
|---|---|---|---|---|---|
| ≤ small | | | | | |
| [izumi-lab/deberta-v2-small-japanese](https://huggingface.co/izumi-lab/deberta-v2-small-japanese) | 17.8M | 0.890/0.846 | 0.880 | - | 0.737 |
| [globis-university/deberta-v3-japanese-xsmall](https://huggingface.co/globis-university/deberta-v3-japanese-xsmall) | 33.7M | 0.916/0.880 | 0.913 | 0.869/0.938 | 0.821 |
| base | | | | | |
| [cl-tohoku/bert-base-japanese-v3](https://huggingface.co/cl-tohoku/bert-base-japanese-v3) | 111M | 0.919/0.881 | 0.907 | 0.880/0.946 | 0.848 |
| [nlp-waseda/roberta-base-japanese](https://huggingface.co/nlp-waseda/roberta-base-japanese) | 111M | 0.913/0.873 | 0.895 | 0.864/0.927 | 0.840 |
| [izumi-lab/deberta-v2-base-japanese](https://huggingface.co/izumi-lab/deberta-v2-base-japanese) | 110M | 0.919/0.882 | 0.912 | - | 0.859 |
| [ku-nlp/deberta-v2-base-japanese](https://huggingface.co/ku-nlp/deberta-v2-base-japanese) | 112M | 0.922/0.886 | 0.922 | 0.899/0.951 | - |
| [ku-nlp/deberta-v3-base-japanese](https://huggingface.co/ku-nlp/deberta-v3-base-japanese) | 160M | 0.927/0.891 | 0.927 | 0.896/- | - |
| [globis-university/deberta-v3-japanese-base](https://huggingface.co/globis-university/deberta-v3-japanese-base) | 110M | 0.925/0.895 | 0.921 | 0.890/0.950 | 0.886 |
| large | | | | | |
| [cl-tohoku/bert-large-japanese-v2](https://huggingface.co/cl-tohoku/bert-large-japanese-v2) | 337M | 0.926/0.893 | 0.929 | 0.893/0.956 | 0.893 |
| [nlp-waseda/roberta-large-japanese](https://huggingface.co/nlp-waseda/roberta-large-japanese) | 337M | 0.930/0.896 | 0.924 | 0.884/0.940 | 0.907 |
| [nlp-waseda/roberta-large-japanese-seq512](https://huggingface.co/nlp-waseda/roberta-large-japanese-seq512) | 337M | 0.926/0.892 | 0.926 | 0.918/0.963 | 0.891 |
| [ku-nlp/deberta-v2-large-japanese](https://huggingface.co/ku-nlp/deberta-v2-large-japanese) | 339M | 0.925/0.892 | 0.924 | 0.912/0.959 | - |
| [globis-university/deberta-v3-japanese-large](https://huggingface.co/globis-university/deberta-v3-japanese-large) | 352M | 0.928/0.896 | 0.924 | 0.896/0.956 | 0.900 |
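For reference, fine-tuning one of these checkpoints on an NLI-style classification task (such as JNLI) follows the standard `transformers` recipe. The sketch below uses a tiny in-memory stand-in for the dataset; sentences, labels, and hyperparameters are purely illustrative:

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

model_name = 'globis-university/deberta-v3-japanese-xsmall'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Tiny placeholder dataset with sentence pairs and labels.
raw = Dataset.from_dict({
    'sentence1': ['猫がソファで寝ている。', '彼は東京に住んでいる。'],
    'sentence2': ['動物が眠っている。', '彼は大阪に住んでいる。'],
    'label': [0, 2],
})
encoded = raw.map(lambda ex: tokenizer(ex['sentence1'], ex['sentence2'], truncation=True))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir='jnli-finetune-demo', num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=encoded,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```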
📄 License
CC BY-SA 4.0
Acknowledgement
We used ABCI for computing resources. Thank you.