🚀 BETO (Spanish BERT) + Spanish SQuAD2.0 + distillation using 'bert-base-multilingual-cased' as teacher
This project presents a fine-tuned and distilled version of BETO (Spanish BERT) for Q&A tasks. It leverages the SQuAD-es-v2.0 dataset. The distillation process makes the model smaller, faster, cheaper, and lighter compared to bert-base-spanish-wwm-cased-finetuned-spa-squad2-es.
🚀 Quick Start
This BETO-based model is ready for Q&A tasks. You can quickly start using it with the transformers pipelines, as in the sketch below.
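A minimal quick-start sketch, assuming transformers is already installed and using the model id published on the Hugging Face Hub (the question/context pair is purely illustrative):

```python
from transformers import pipeline

# Load the distilled BETO checkpoint as a question-answering pipeline
qa = pipeline(
    'question-answering',
    model='mrm8488/distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es'
)

# Illustrative question/context pair
print(qa(question='¿Quién creó el modelo?', context='El modelo fue creado por Manuel Romero.'))
```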
✨ Features
- Distilled Model: The model is distilled, making it more efficient than its non-distilled counterparts.
- Fine-Tuned on a Spanish Q&A Dataset: It is fine-tuned on SQuAD-es-v2.0, making it suitable for Spanish Q&A.
- Fast Inference: On average, it is twice as fast as mBERT-base due to the distillation process (see the timing sketch below).
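Inference speed depends on your hardware and sequence lengths; here is a rough, illustrative timing sketch (not a formal benchmark) that you can adapt, swapping in another checkpoint to compare against:

```python
import time
from transformers import pipeline

# Model to time; swap in another checkpoint (e.g. an mBERT-based one) to compare
model_id = 'mrm8488/distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es'
qa = pipeline('question-answering', model=model_id)

question = '¿Quién creó el modelo?'
context = 'El modelo fue creado por Manuel Romero y publicado en Hugging Face.'

qa(question=question, context=context)  # warm-up so first-call overhead is excluded

n_runs = 20
start = time.perf_counter()
for _ in range(n_runs):
    qa(question=question, context=context)
elapsed = time.perf_counter() - start
print(f"{model_id}: {elapsed / n_runs * 1000:.1f} ms per question over {n_runs} runs")
```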
📦 Installation
The model can be installed and used with the transformers library.
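A minimal environment sketch, assuming Python and git are available (the distillation script used in the training command below ships inside the transformers repository, so the repo is cloned as well):

```bash
# Install the transformers library (pin a version if you need reproducibility)
pip install transformers
# Clone the repo to get examples/distillation/run_squad_w_distillation.py
git clone https://github.com/huggingface/transformers.git
```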
You can then reproduce the training (fine-tuning with distillation) using the following command on a Tesla P100 GPU with 25GB of RAM:
```bash
# Run from a Colab/Jupyter notebook (hence the leading '!'); drop it in a plain shell
!export SQUAD_DIR=/path/to/squad-v2_spanish \
&& python transformers/examples/distillation/run_squad_w_distillation.py \
  --model_type bert \
  --model_name_or_path dccuchile/bert-base-spanish-wwm-cased \
  --teacher_type bert \
  --teacher_name_or_path bert-base-multilingual-cased \
  --do_train \
  --do_eval \
  --do_lower_case \
  --train_file $SQUAD_DIR/train-v2.json \
  --predict_file $SQUAD_DIR/dev-v2.json \
  --per_gpu_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 5.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /content/model_output \
  --save_steps 5000 \
  --threads 4 \
  --version_2_with_negative
```
💻 Usage Examples
Basic Usage
```python
from transformers import pipeline

# Load the distilled BETO model as a question-answering pipeline
# (use_fast=False keeps the slow Python tokenizer, as in the original setup)
nlp = pipeline(
    'question-answering',
    model='mrm8488/distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es',
    tokenizer=(
        'mrm8488/distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es',
        {"use_fast": False}
    )
)

# Ask a question about a Spanish context
nlp(
    {
        'question': '¿Para qué lenguaje está trabajando?',
        'context': 'Manuel Romero está colaborando activamente con huggingface/transformers '
                   'para traer el poder de las últimas técnicas de procesamiento de lenguaje '
                   'natural al idioma español'
    }
)
```
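If you prefer not to go through the pipeline helper, here is a minimal sketch using the standard Auto classes (same model id as above; the span decoding is deliberately simplified):

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_id = "mrm8488/distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForQuestionAnswering.from_pretrained(model_id)

question = "¿Para qué lenguaje está trabajando?"
context = (
    "Manuel Romero está colaborando activamente con huggingface/transformers "
    "para traer el poder de las últimas técnicas de procesamiento de lenguaje "
    "natural al idioma español"
)

# Encode the (question, context) pair and run a forward pass
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Take the most likely start/end positions and decode the answer span
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])
print(answer)
```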
You can play with this model and the pipelines in a Colab notebook.

📚 Documentation
Details of the downstream task (Q&A) - Dataset
SQuAD-es-v2.0
| Dataset                 | # Q&A |
| ----------------------- | ----- |
| SQuAD2.0 Train          | 130 K |
| SQuAD2.0-es-v2.0        | 111 K |
| SQuAD2.0 Dev            | 12 K  |
| SQuAD-es-v2.0-small Dev | 69 K  |
Model training
The model was trained on a Tesla P100 GPU with 25GB of RAM using the command shown in the Installation section.
More about Hugging Face pipelines
Check out the Colab notebook for more examples of Hugging Face pipelines.

📄 License
This project is licensed under the Apache-2.0 license.
Created by Manuel Romero/@mrm8488
Made with ♥ in Spain