TinyBERT Model for German Language
This project provides a TinyBERT model for the German language (de). The model was created by distilling the German BERT base cased model (https://huggingface.co/dbmdz/bert-base-german-cased) following the approach described in https://arxiv.org/abs/1909.10351 (TinyBERT: Distilling BERT for Natural Language Understanding).
🚀 Quick Start
Prerequisites
The following library versions are used:
| Property | Details |
|----------|---------|
| torch | 1.4.0 |
| transformers | 4.8.1 |
Loading the Model for the LM (Fill-Mask) Task
```python
import torch
import torch.nn as nn
import transformers

# model_dir is the directory that contains vocab.txt, config.json and pytorch_model.bin
tokenizer = transformers.BertTokenizer.from_pretrained(model_dir + '/vocab.txt', do_lower_case=False)
config = transformers.BertConfig.from_json_file(model_dir + '/config.json')
model = transformers.BertModel(config=config)

# Replace the default pooler with an LM head: hidden -> hidden -> vocabulary size
model.pooler = nn.Sequential(nn.Linear(in_features=model.config.hidden_size, out_features=model.config.hidden_size, bias=True),
                             nn.LayerNorm((model.config.hidden_size,), eps=1e-12, elementwise_affine=True),
                             nn.Linear(in_features=model.config.hidden_size, out_features=len(tokenizer), bias=True))
model.resize_token_embeddings(len(tokenizer))
checkpoint = torch.load(model_dir + '/pytorch_model.bin', map_location=torch.device('cuda'))
model.load_state_dict(checkpoint)
```
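For reference, here is a minimal sketch of running fill-mask inference with the model loaded above. Because the replaced pooler is applied to the full sequence output, `outputs.pooler_output` holds per-token vocabulary logits; the example sentence and this usage pattern are assumptions for illustration, not code from the original project:

```python
# Minimal fill-mask sketch (assumption: the custom pooler above acts as the LM head,
# so outputs.pooler_output contains logits of shape [batch, seq_len, vocab_size]).
model.eval()
text = "Die Hauptstadt von Deutschland ist [MASK]."  # example sentence, not from the original docs
inputs = tokenizer(text, return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.pooler_output
mask_index = inputs['input_ids'][0].tolist().index(tokenizer.mask_token_id)
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.convert_ids_to_tokens([int(predicted_id)]))
```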
Loading the Model for NER or Classification Task
For an NER or classification task, first load the model as for the LM task and then replace the pooler (where n_classes is the number of target labels for your task):
```python
model.pooler = nn.Sequential(nn.Dropout(p=config.hidden_dropout_prob, inplace=False),
                             nn.Linear(in_features=config.hidden_size, out_features=n_classes, bias=True))
```
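As a hedged sketch of how per-token predictions could then be obtained for NER: since the replaced pooler is applied to the full sequence output, `pooler_output` carries one logit vector per token. The example sentence and the assumption that a fine-tuned checkpoint with matching n_classes has been loaded are illustrative, not part of the original document:

```python
# Sketch only: assumes a fine-tuned NER/classification checkpoint has been loaded
# and that n_classes matches the label set used during fine-tuning.
model.eval()
inputs = tokenizer("Angela Merkel besuchte Berlin.", return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)

token_logits = outputs.pooler_output        # shape: [batch, seq_len, n_classes]
predicted_labels = token_logits.argmax(dim=-1)
print(predicted_labels)
```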
✨ Features
- Language Specific: Specifically designed for the German language.
- Distilled Model: Created by distilling the German BERT base cased model, which can reduce computational cost while largely maintaining performance.
📦 Installation
No specific installation steps are provided in the original document. To use this model, make sure the required libraries (torch==1.4.0 and transformers==4.8.1) are installed. You can install them with the following commands:
```bash
pip install torch==1.4.0
pip install transformers==4.8.1
```
💻 Usage Examples
Basic Usage
The basic usage is to load the model for the LM (fill-mask) task as shown in the code above.
Advanced Usage
For NER or Classification tasks, you can follow the steps of loading the model for the LM task first and then modify the pooler as described in the code above.
📚 Documentation
Dataset
The model is trained on the German Wikipedia Text Corpus. You can access the dataset here: https://github.com/t-systems-on-site-services-gmbh/german-wikipedia-text-corpus
Model Distillation
The model is distilled from the German BERT base cased model (https://huggingface.co/dbmdz/bert-base-german-cased) using the method described in https://arxiv.org/abs/1909.10351.
🔧 Technical Details
The model is a distilled version of the German BERT base cased model. Distillation is a technique for transferring knowledge from a large model (the teacher) to a smaller model (the student). In this case, the TinyBERT model (student) learns from the German BERT base cased model (teacher) to achieve similar performance at lower computational cost. The distillation process follows the approach described in the paper "TinyBERT: Distilling BERT for Natural Language Understanding".
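To illustrate what such layer-wise distillation typically involves, here is a small sketch of the kinds of objectives described in the TinyBERT paper: MSE between mapped hidden states and attention matrices, plus a soft cross-entropy on the output logits. The function names, weighting, and the learnable projection are illustrative assumptions, not the project's actual training code:

```python
import torch.nn.functional as F

def layer_distillation_loss(student_hidden, teacher_hidden, student_attn, teacher_attn, proj):
    """Hidden-state and attention distillation for one student/teacher layer pair.

    proj is a learnable nn.Linear mapping the (smaller) student hidden size
    to the teacher hidden size, as in the TinyBERT paper.
    """
    hidden_loss = F.mse_loss(proj(student_hidden), teacher_hidden)
    attention_loss = F.mse_loss(student_attn, teacher_attn)
    return hidden_loss + attention_loss

def prediction_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Soft cross-entropy between teacher and student output distributions."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()
```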
📄 License
No license information is provided for this model.