🚀 german-financial-statements-bert
This model is a fine-tuned version of [bert-base-german-cased](https://huggingface.co/bert-base-german-cased) using German financial statements, aiming to serve the financial statements domain.
🚀 Quick Start
The model achieves the following results on the evaluation set (see the note after the list):
- Loss: 1.2025
- Accuracy: 0.7376
- Perplexity: 3.3285
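The reported perplexity is consistent with the loss, assuming it is computed in the usual way as the exponential of the evaluation cross-entropy loss:

```python
import math

# Perplexity of a masked language model is conventionally exp(cross-entropy loss).
eval_loss = 1.2025
perplexity = math.exp(eval_loss)
print(round(perplexity, 4))  # ~3.3284, matching the reported 3.3285 up to rounding of the loss
```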
✨ Features
Annual financial statements in Germany are published in the Federal Gazette and are freely accessible. These documents describe a company's business situation, and in particular its financial situation, with reference to a reporting period. The german-financial-statements-bert model aims to provide a BERT model specifically for this domain.
📦 Installation
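The model is used through the Hugging Face `transformers` library. Installing the versions listed under Framework versions below, for example with `pip install transformers==4.17.0 datasets==1.18.3 tokenizers==0.11.6` together with a matching PyTorch build (1.10.0 was used), should be sufficient; newer versions will typically work as well.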
💻 Usage Examples
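The evaluation metrics above (loss, accuracy, perplexity) suggest the model was fine-tuned with a masked language modelling objective, so the `fill-mask` pipeline is a natural way to try it out. The sketch below is a minimal example; the repository ID is a placeholder and should be replaced with the actual path of this model on the Hugging Face Hub.

```python
from transformers import pipeline

# Placeholder repository ID; replace with the actual Hub path of this model.
model_id = "german-financial-statements-bert"

fill_mask = pipeline("fill-mask", model=model_id)

# Example sentence from the financial statements domain; [MASK] is BERT's mask token.
for prediction in fill_mask("Der Jahresüberschuss beträgt 1.000 [MASK]."):
    print(f"{prediction['token_str']}\t{prediction['score']:.4f}")
```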
📚 Documentation
Training and evaluation data
The training was performed with 100,000 natural-language sentences from annual financial statements. 50,000 of these sentences were taken unfiltered and at random from 5,500 different financial statement documents; the other 50,000 were likewise taken at random from 5,500 different financial statement documents, but filtered so that only sentences referring to a financial entity were selected. Concretely, every sentence in this second half contains an indicator of a reference to a financial entity (EUR, Euro, TEUR, €, T€); a sketch of such a filter is shown below. The evaluation was carried out with 20,000 sentences of the same origin and distribution.
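The exact filtering code is not part of this card; a minimal sketch of such an indicator-based filter, assuming plain sentence strings as input, could look like this:

```python
import re

# Indicators of a reference to a financial entity, as listed above.
# Matching "€" alone also covers "T€"; the word forms use word boundaries.
FINANCIAL_INDICATORS = re.compile(r"\b(EUR|Euro|TEUR)\b|€")

def mentions_financial_entity(sentence: str) -> bool:
    """Return True if the sentence contains one of the financial indicators."""
    return FINANCIAL_INDICATORS.search(sentence) is not None

sentences = [
    "Die Umsatzerlöse betragen 1.250 TEUR.",        # kept: contains "TEUR"
    "Der Vorstand besteht aus drei Mitgliedern.",   # dropped: no financial indicator
]
filtered = [s for s in sentences if mentions_financial_entity(s)]
print(filtered)
```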
Training hyperparameters
The following hyperparameters were used during training (a sketch of the corresponding `transformers` training arguments follows the list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
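The training script itself is not included here; as an orientation, the values above correspond roughly to the following `transformers` `TrainingArguments` (a sketch, not the original training code; `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

# Sketch only: maps the hyperparameters listed above onto TrainingArguments.
training_args = TrainingArguments(
    output_dir="german-financial-statements-bert",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=3.0,
)
```

These values also match the `Trainer` defaults, so the fine-tuning run appears to use a largely standard configuration.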
Framework versions
- Transformers 4.17.0
- Pytorch 1.10.0+cu111
- Datasets 1.18.3
- Tokenizers 0.11.6
🔧 Technical Details
The model is a fine-tuned version of [bert-base-german-cased](https://huggingface.co/bert-base-german-cased) on German financial statements data. The training hyperparameters and the filtered training and evaluation data are described in detail above.
📄 License
This project is licensed under the MIT license.