Multilingual + Dutch SQuAD2.0
This is the multilingual BERT model provided by Google Research, fine-tuned for the Dutch question-answering (Q&A) downstream task.
Documentation
Details of the language model
Language model (bert-base-multilingual-cased):
- 12 layers
- 768 hidden units
- 12 heads
- 110M parameters
It was trained on cased text in the top 104 languages with the largest Wikipedias.
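For reference, these figures can be checked against the model configuration on the Hugging Face Hub. A minimal sketch, assuming the transformers library is installed and the Hub is reachable:

```python
from transformers import AutoConfig

# Fetch the configuration of the base language model from the Hugging Face Hub
config = AutoConfig.from_pretrained("bert-base-multilingual-cased")

print(config.num_hidden_layers)     # 12 layers
print(config.hidden_size)           # 768 hidden units
print(config.num_attention_heads)   # 12 attention heads
```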
Details of the downstream task
The mtranslate Python module was used to machine-translate SQuAD2.0. To recover the answer start positions, direct translations of the answers were searched for in the corresponding translated paragraphs (sketched below). Because a phrase can be translated differently depending on its context, and the bare answer carries no context, the translated answer could not always be found in the translated paragraph, so some question-answer examples were lost. This is a potential source of noise in the dataset.
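For illustration only, a minimal sketch of this translate-and-realign step. It assumes mtranslate's translate() function and SQuAD-style fields; the translate_example helper is hypothetical, not the original conversion script:

```python
from mtranslate import translate

def translate_example(context, question, answer_text):
    """Machine-translate one SQuAD example to Dutch and realign the answer span.

    Illustrative sketch only: the answer is translated without its context,
    so its direct translation may not occur in the translated paragraph,
    in which case the example is dropped (the loss described above).
    """
    nl_context = translate(context, "nl", "en")
    nl_question = translate(question, "nl", "en")
    nl_answer = translate(answer_text, "nl", "en")

    # Search for the translated answer in the translated paragraph
    # to recover the start character position.
    answer_start = nl_context.find(nl_answer)
    if answer_start == -1:
        return None  # answer not found -> example is lost

    return {
        "context": nl_context,
        "question": nl_question,
        "answers": {"text": [nl_answer], "answer_start": [answer_start]},
    }
```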
| Dataset | # Q&A |
| ------- | ----- |
| SQuAD2.0 Train | 130 K |
| Dutch SQuAD2.0 Train | 99 K |
| SQuAD2.0 Dev | 12 K |
| Dutch SQuAD2.0 Dev | 10 K |
Model benchmark
| Model | EM/F1 | HasAns (EM/F1) | NoAns (EM = F1) |
| ----- | ----- | -------------- | --------------- |
| robBERT | 58.04/60.95 | 33.08/40.64 | 73.67 |
| dutchBERT | 64.25/68.45 | 45.59/56.49 | 75.94 |
| multiBERT | 67.38/71.36 | 47.42/57.76 | 79.88 |
Technical Details
Model training
The model was trained on a Tesla V100 GPU with the following command:
```bash
export SQUAD_DIR=path/to/nl_squad

python run_squad.py \
  --model_type bert \
  --model_name_or_path bert-base-multilingual-cased \
  --do_train \
  --do_eval \
  --train_file $SQUAD_DIR/nl_squadv2_train_clean.json \
  --predict_file $SQUAD_DIR/nl_squadv2_dev_clean.json \
  --num_train_epochs 2 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --save_steps 8000 \
  --output_dir ../../output \
  --overwrite_cache \
  --overwrite_output_dir
```
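With `--max_seq_length 384` and `--doc_stride 128`, paragraphs longer than the maximum sequence length are split into overlapping windows, the standard setup of the `run_squad.py` example script from the Hugging Face transformers repository.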
Results:

```python
{
  'exact': 67.38028751680629,
  'f1': 71.362297054268,
  'total': 9669,
  'HasAns_exact': 47.422126745435015,
  'HasAns_f1': 57.761023151910734,
  'HasAns_total': 3724,
  'NoAns_exact': 79.88225399495374,
  'NoAns_f1': 79.88225399495374,
  'NoAns_total': 5945,
  'best_exact': 67.53542248422795,
  'best_exact_thresh': 0.0,
  'best_f1': 71.36229705426837,
  'best_f1_thresh': 0.0
}
```
Usage Examples
Basic Usage
```python
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
qa_pipeline = pipeline(
    "question-answering",
    model="henryk/bert-base-multilingual-cased-finetuned-dutch-squad2",
    tokenizer="henryk/bert-base-multilingual-cased-finetuned-dutch-squad2"
)

qa_pipeline({
    # "Amsterdam is the capital and the most populous city of the Netherlands."
    'context': "Amsterdam is de hoofdstad en de dichtstbevolkte stad van Nederland.",
    # "What is the capital of the Netherlands?"
    'question': "Wat is de hoofdstad van Nederland?"
})
```
Output
```json
{
  "score": 0.83,
  "start": 0,
  "end": 9,
  "answer": "Amsterdam"
}
```
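Because the model was fine-tuned on SQuAD2.0, which contains unanswerable questions, the pipeline can also be asked to return an empty answer. A minimal sketch reusing `qa_pipeline` from above; the French-capital question is our own illustration, and `handle_impossible_answer` is the corresponding flag of the transformers question-answering pipeline:

```python
# An unanswerable question: the context says nothing about France.
# With handle_impossible_answer=True the pipeline may return an empty
# answer string instead of forcing a span from the context.
qa_pipeline({
    'context': "Amsterdam is de hoofdstad en de dichtstbevolkte stad van Nederland.",
    # "What is the capital of France?"
    'question': "Wat is de hoofdstad van Frankrijk?"
}, handle_impossible_answer=True)
```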
Contact
If you would like to discuss the Dutch version of SQuAD or request access to it, please feel free to contact me via LinkedIn.