BERTje Open-source Dutch Pre-trained Model - Free Deployment, Optimized for Dutch Applications

Bert Base Dutch Cased

Developed by GroNLP

BERTje is a Dutch pre-trained BERT model developed at the University of Groningen and optimized specifically for the Dutch language.



Release Time: 3/2/2022

Model Overview

BERTje is a BERT model optimized for Dutch, used for various natural language processing tasks such as named entity recognition and part-of-speech tagging.

Model Features

Dutch Language Optimization

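Pre-trained specifically on Dutch text. In the benchmarks reported on this page, it outperforms multilingual BERT on CoNLL-2002 and SoNaR-1 named entity recognition and on UDv2.5 LassySmall part-of-speech tagging, though not on the spaCy UD LassySmall set.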

Case-Sensitive

Uses case-sensitive tokenization to better handle proper nouns and special vocabulary in Dutch.

Multi-Task Support

Can be fine-tuned for various natural language processing tasks, including named entity recognition and part-of-speech tagging.

Model Capabilities

Text Understanding

Named Entity Recognition

Part-of-Speech Tagging

Use Cases

Natural Language Processing

Dutch Named Entity Recognition

Performs named entity recognition on the CoNLL-2002 and SoNaR-1 datasets

Achieves an F1 score of 90.24 on CoNLL-2002 and 84.93 on SoNaR-1

Dutch Part-of-Speech Tagging

Performs part-of-speech tagging on the UDv2.5 LassySmall dataset

Achieves an accuracy of 96.48

🚀 BERTje: A Dutch BERT model

BERTje is a pre-trained BERT model for the Dutch language. It was developed at the University of Groningen, aiming to provide high-quality natural language processing capabilities for Dutch text.

🚀 Quick Start

Model description

BERTje is a Dutch pre-trained BERT model developed at the University of Groningen.

For more details, you can check out our paper on arXiv, the code on GitHub, and related work on Semantic Scholar.

The paper and GitHub page mention fine-tuned models that are available here.

How to use

from transformers import AutoTokenizer, AutoModel, TFAutoModel

tokenizer = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased")
# Pick one framework: running both lines leaves only the TensorFlow model in `model`.
model = AutoModel.from_pretrained("GroNLP/bert-base-dutch-cased")  # PyTorch
model = TFAutoModel.from_pretrained("GroNLP/bert-base-dutch-cased")  # TensorFlow
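
Once the tokenizer and model are loaded, a common next step is to turn a Dutch sentence into a single fixed-size vector. The following is a minimal sketch, assuming the PyTorch backend. The helper names `mean_pool` and `embed_sentence` are illustrative and not part of the BERTje release, and mean pooling is only one of several ways to summarize token vectors.

```python
from typing import List, Sequence

MODEL_NAME = "GroNLP/bert-base-dutch-cased"


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    # zip(*kept) iterates over vector dimensions instead of tokens.
    return [sum(column) / len(kept) for column in zip(*kept)]


def embed_sentence(sentence: str) -> List[float]:
    """Hypothetical helper: encode one sentence with BERTje and mean-pool it."""
    # Imported lazily so that mean_pool can be used without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)
    model.eval()  # disable dropout for deterministic features

    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    hidden = outputs.last_hidden_state[0].tolist()  # one vector per subword token
    mask = inputs["attention_mask"][0].tolist()
    return mean_pool(hidden, mask)
```

Calling `embed_sentence("Ik woon in Groningen.")` downloads the model on first use and returns a list with `model.config.hidden_size` entries. For a single unpadded sentence the mask is all ones; it matters once sentences are padded in a batch.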

⚠️ Important Note

The vocabulary size of BERTje changed in 2021. If you use an older fine-tuned model and experience problems with the GroNLP/bert-base-dutch-cased tokenizer, use the following tokenizer:

tokenizer = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased", revision="v1")  # v1 is the old vocabulary
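
One way to decide which tokenizer revision a fine-tuned checkpoint needs is to compare the vocabulary size stored in its config with the size of the current tokenizer. The sketch below is a heuristic under that assumption, not a procedure documented by the authors: matching sizes do not prove that the vocabularies are identical. The names `pick_revision` and `load_matching_tokenizer` are hypothetical.

```python
MODEL_NAME = "GroNLP/bert-base-dutch-cased"


def pick_revision(model_vocab_size: int, current_tokenizer_size: int) -> str:
    """Return "main" when the sizes agree, otherwise the old "v1" vocabulary."""
    return "main" if model_vocab_size == current_tokenizer_size else "v1"


def load_matching_tokenizer(fine_tuned_name: str):
    """Hypothetical helper: load a BERTje tokenizer that fits a fine-tuned model."""
    # Imported lazily so that pick_revision works without transformers installed.
    from transformers import AutoConfig, AutoTokenizer

    config = AutoConfig.from_pretrained(fine_tuned_name)
    current = AutoTokenizer.from_pretrained(MODEL_NAME)

    # len(tokenizer) counts the base vocabulary plus any added tokens.
    revision = pick_revision(config.vocab_size, len(current))
    if revision == "main":
        return current
    return AutoTokenizer.from_pretrained(MODEL_NAME, revision=revision)
```

If the fine-tuned checkpoint ships its own tokenizer files, loading the tokenizer directly from that checkpoint is usually simpler than this check.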

Benchmarks

The arXiv paper lists benchmarks. Here are a couple of comparisons between BERTje, multilingual BERT, BERT-NL and RobBERT that were made after the paper was written. Unlike some other comparisons, the fine-tuning procedures for these benchmarks are identical for each pre-trained model. You may be able to achieve higher scores for individual models by optimizing fine-tuning procedures.

More experimental results will be added to this page when they are finished. Technical details about how to fine-tune these models will be published later, as well as downloadable fine-tuned checkpoints.

All of the tested models are base-sized (12 layers) with cased tokenization.

Headers in the tables below link to original data sources. Scores link to the model pages that correspond to each specific fine-tuned model. These tables will be updated when more simple fine-tuned models are made available.

Named Entity Recognition

Model	CoNLL-2002	SoNaR-1	spaCy UD LassySmall
BERTje	90.24	84.93	86.10
mBERT	88.61	84.19	86.77
BERT-NL	85.05	80.45	81.62
RobBERT	84.72	81.98	79.84
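
NER scores on CoNLL-style data are conventionally reported as entity-level F1, where a predicted entity counts only if its span and type both match the gold annotation exactly. The sketch below illustrates that metric on BIO tags. It assumes the scores above follow this convention, which this page does not state explicitly, and it is not the evaluation script used for the table. The function names and the toy tags are illustrative.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence of BIO tags."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        # A stray I- tag with no open span is treated leniently as a span start.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exactly matching (span, type) pairs."""
    true_pos = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        true_pos += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if true_pos == 0:
        return 0.0
    precision, recall = true_pos / n_pred, true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy example: the gold sentence has a PER and a LOC entity, the prediction finds only PER.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
score = entity_f1(gold, pred)  # precision 1.0, recall 0.5
```

Multiplying the result by 100 gives a value on the same scale as the table.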

Part-of-speech tagging

Model	UDv2.5 LassySmall
BERTje	96.48
mBERT	96.20
BERT-NL	96.10
RobBERT	95.91
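
Both benchmark tasks above are token-classification tasks, and BERTje splits words into subword tokens, so word-level labels usually have to be aligned to subwords before fine-tuning. The sketch below shows one common convention: the first subword of each word keeps the label and the rest are ignored by the loss. This is an assumption for illustration, not the fine-tuning procedure used for these tables; `align_labels` and `encode_with_labels` are hypothetical names.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # label id that PyTorch's CrossEntropyLoss skips by default


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Map one label per word onto one label per subword token."""
    aligned: List[int] = []
    previous = None
    for word_id in word_ids:
        if word_id is None:          # special tokens such as [CLS] and [SEP]
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:    # first subword of a word keeps its label
            aligned.append(word_labels[word_id])
        else:                        # later subwords of the same word are ignored
            aligned.append(IGNORE_INDEX)
        previous = word_id
    return aligned


def encode_with_labels(words: List[str], word_labels: List[int]):
    """Hypothetical helper: tokenize pre-split words and attach aligned labels."""
    # Imported lazily so that align_labels works without transformers installed.
    from transformers import AutoTokenizer

    # word_ids() requires a fast tokenizer, which AutoTokenizer returns by default.
    tokenizer = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased")
    encoding = tokenizer(words, is_split_into_words=True, truncation=True)
    encoding["labels"] = align_labels(encoding.word_ids(), word_labels)
    return encoding
```

The resulting encoding can be fed to a token-classification head such as `AutoModelForTokenClassification`, with the number of labels set to match the tag set of the dataset.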

BibTeX entry and citation info

@misc{devries2019bertje,
	title = {{BERTje}: {A} {Dutch} {BERT} {Model}},
	shorttitle = {{BERTje}},
	author = {de Vries, Wietse and van Cranenburgh, Andreas and Bisazza, Arianna and Caselli, Tommaso and Noord, Gertjan van and Nissim, Malvina},
	year = {2019},
	month = dec,
	howpublished = {arXiv:1912.09582},
	url = {http://arxiv.org/abs/1912.09582},
}
