TinyBERT Model for German Language
This project provides a TinyBERT model for the German language (de). The model was created by distilling the German BERT base cased model (https://huggingface.co/dbmdz/bert-base-german-cased) following the approach described in https://arxiv.org/abs/1909.10351 (TinyBERT: Distilling BERT for Natural Language Understanding).
🚀 Quick Start
Prerequisites
The following library versions are used:
| Property | Details |
|----------|---------|
| torch | 1.4.0 |
| transformers | 4.8.1 |
Loading the Model for the LM (Fill-Mask) Task
```python
import torch
import torch.nn as nn
import transformers

# model_dir is the directory that contains vocab.txt, config.json and pytorch_model.bin
tokenizer = transformers.BertTokenizer.from_pretrained(model_dir + '/vocab.txt', do_lower_case=False)
config = transformers.BertConfig.from_json_file(model_dir + '/config.json')
model = transformers.BertModel(config=config)

# Replace the default pooler with an LM head: hidden -> hidden -> vocabulary size
model.pooler = nn.Sequential(nn.Linear(in_features=model.config.hidden_size, out_features=model.config.hidden_size, bias=True),
                             nn.LayerNorm((model.config.hidden_size,), eps=1e-12, elementwise_affine=True),
                             nn.Linear(in_features=model.config.hidden_size, out_features=len(tokenizer), bias=True))
model.resize_token_embeddings(len(tokenizer))
checkpoint = torch.load(model_dir + '/pytorch_model.bin', map_location=torch.device('cuda'))
model.load_state_dict(checkpoint)
```
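For reference, here is a minimal sketch of running fill-mask inference with the model loaded above. Because the replaced pooler is applied to the full sequence output, `outputs.pooler_output` holds per-token vocabulary logits; the example sentence and this usage pattern are assumptions for illustration, not code from the original project:

```python
# Minimal fill-mask sketch (assumption: the custom pooler above acts as the LM head,
# so outputs.pooler_output contains logits of shape [batch, seq_len, vocab_size]).
model.eval()
text = "Die Hauptstadt von Deutschland ist [MASK]."  # example sentence, not from the original docs
inputs = tokenizer(text, return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.pooler_output
mask_index = inputs['input_ids'][0].tolist().index(tokenizer.mask_token_id)
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.convert_ids_to_tokens([int(predicted_id)]))
```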
Loading the Model for NER or Classification Task
For an NER or classification task, first load the model as for the LM task and then replace the pooler (where n_classes is the number of target labels for your task):
```python
model.pooler = nn.Sequential(nn.Dropout(p=config.hidden_dropout_prob, inplace=False),
                             nn.Linear(in_features=config.hidden_size, out_features=n_classes, bias=True))
```
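As a hedged sketch of how per-token predictions could then be obtained for NER: since the replaced pooler is applied to the full sequence output, `pooler_output` carries one logit vector per token. The example sentence and the assumption that a fine-tuned checkpoint with matching n_classes has been loaded are illustrative, not part of the original document:

```python
# Sketch only: assumes a fine-tuned NER/classification checkpoint has been loaded
# and that n_classes matches the label set used during fine-tuning.
model.eval()
inputs = tokenizer("Angela Merkel besuchte Berlin.", return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)

token_logits = outputs.pooler_output        # shape: [batch, seq_len, n_classes]
predicted_labels = token_logits.argmax(dim=-1)
print(predicted_labels)
```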
✨ Features
- Language Specific: Specifically designed for the German language.
- Distilled Model: Created by distilling the German BERT base cased model, which can reduce computational cost while largely maintaining performance.
📦 Installation
No specific installation steps are provided in the original document. To use this model, make sure the required libraries (torch==1.4.0 and transformers==4.8.1) are installed. You can install them with the following commands:
```bash
pip install torch==1.4.0
pip install transformers==4.8.1
```
💻 Usage Examples
Basic Usage
The basic usage is to load the model for the LM (fill-mask) task as shown in the code above.
Advanced Usage
For NER or Classification tasks, you can follow the steps of loading the model for the LM task first and then modify the pooler as described in the code above.
📚 Documentation
Dataset
The model is trained on the German Wikipedia Text Corpus. You can access the dataset here: https://github.com/t-systems-on-site-services-gmbh/german-wikipedia-text-corpus
Model Distillation
The model is distilled from the German BERT base cased model (https://huggingface.co/dbmdz/bert-base-german-cased) using the method described in https://arxiv.org/abs/1909.10351.
🔧 Technical Details
The model is a distilled version of the German BERT base cased model. Distillation is a technique for transferring knowledge from a large model (the teacher) to a smaller model (the student). In this case, the TinyBERT model (student) learns from the German BERT base cased model (teacher) to achieve similar performance at lower computational cost. The distillation process follows the approach described in the paper "TinyBERT: Distilling BERT for Natural Language Understanding".
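To illustrate what such layer-wise distillation typically involves, here is a small sketch of the kinds of objectives described in the TinyBERT paper: MSE between mapped hidden states and attention matrices, plus a soft cross-entropy on the output logits. The function names, weighting, and the learnable projection are illustrative assumptions, not the project's actual training code:

```python
import torch.nn.functional as F

def layer_distillation_loss(student_hidden, teacher_hidden, student_attn, teacher_attn, proj):
    """Hidden-state and attention distillation for one student/teacher layer pair.

    proj is a learnable nn.Linear mapping the (smaller) student hidden size
    to the teacher hidden size, as in the TinyBERT paper.
    """
    hidden_loss = F.mse_loss(proj(student_hidden), teacher_hidden)
    attention_loss = F.mse_loss(student_attn, teacher_attn)
    return hidden_loss + attention_loss

def prediction_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Soft cross-entropy between teacher and student output distributions."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()
```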
📄 License
No license information is provided for this model.