🚀 Portuguese BERT base cased QA (Question Answering), finetuned on SQUAD v1.1
This is a Portuguese BERT base cased model for question answering, fine-tuned on SQUAD v1.1, that handles Portuguese question-answering tasks effectively.

🚀 Quick Start
This model is a Portuguese BERT base cased model for question answering, fine-tuned on the SQUAD v1.1 Portuguese dataset. It can be used to answer questions in Portuguese.
✨ Features
- Trained on the Portuguese SQUAD v1.1 dataset, suitable for Portuguese question-answering tasks.
- Based on the BERTimbau Base model, which achieves state-of-the-art performance on multiple downstream NLP tasks.
📦 Installation
You can clone the model repository using the following commands:
git lfs install
git clone https://huggingface.co/pierreguillou/bert-base-cased-squad-v1.1-portuguese

# to clone without downloading the large model files (pointers only), prefix the clone command with:
GIT_LFS_SKIP_SMUDGE=1
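Cloning is optional: the usage examples below load the model directly by its Hub id. As an alternative sketch (assuming the huggingface_hub package is installed, e.g. via pip install huggingface_hub), the repository files can also be fetched programmatically:

# Optional alternative to git clone: download a snapshot of the repository with huggingface_hub
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="pierreguillou/bert-base-cased-squad-v1.1-portuguese")
print(local_dir)  # local path containing the model weights, config and tokenizer files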
💻 Usage Examples
Basic Usage
from transformers import pipeline

# Portuguese context about the COVID-19 pandemic, used below as the passage the model answers questions from
context = r"""
A pandemia de COVID-19, também conhecida como pandemia de coronavírus, é uma pandemia em curso de COVID-19,
uma doença respiratória aguda causada pelo coronavírus da síndrome respiratória aguda grave 2 (SARS-CoV-2).
A doença foi identificada pela primeira vez em Wuhan, na província de Hubei, República Popular da China,
em 1 de dezembro de 2019, mas o primeiro caso foi reportado em 31 de dezembro do mesmo ano.
Acredita-se que o vírus tenha uma origem zoonótica, porque os primeiros casos confirmados
tinham principalmente ligações ao Mercado Atacadista de Frutos do Mar de Huanan, que também vendia animais vivos.
Em 11 de março de 2020, a Organização Mundial da Saúde declarou o surto uma pandemia. Até 8 de fevereiro de 2021,
pelo menos 105 743 102 casos da doença foram confirmados em pelo menos 191 países e territórios,
com cerca de 2 308 943 mortes e 58 851 440 pessoas curadas.
"""
model_name = 'pierreguillou/bert-base-cased-squad-v1.1-portuguese'

# Build a question-answering pipeline with this model and ask a question about the context
nlp = pipeline("question-answering", model=model_name)
question = "Quando começou a pandemia de Covid-19 no mundo?"
result = nlp(question=question, context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")
Advanced Usage
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Load the tokenizer and model directly for lower-level control over inference
tokenizer = AutoTokenizer.from_pretrained("pierreguillou/bert-base-cased-squad-v1.1-portuguese")
model = AutoModelForQuestionAnswering.from_pretrained("pierreguillou/bert-base-cased-squad-v1.1-portuguese")
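With the tokenizer and model loaded, inference can be run manually instead of through the pipeline. Below is a minimal sketch (not from the original card; the question and context strings are illustrative): encode the question/context pair, take the most likely start and end token positions, and decode the answer span.

import torch

# Illustrative question/context pair; any Portuguese pair works the same way
question = "Quando a OMS declarou o surto uma pandemia?"
context = "Em 11 de março de 2020, a Organização Mundial da Saúde declarou o surto uma pandemia."

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pick the most likely start/end token positions and decode the answer span
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax()) + 1
answer = tokenizer.decode(inputs["input_ids"][0][start:end])
print(answer)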
📚 Documentation
Introduction
The model was trained on the dataset SQUAD v1.1 in Portuguese from the Deep Learning Brasil group on Google Colab.
The language model used is BERTimbau Base (aka "bert-base-portuguese-cased") from Neuralmind.ai: BERTimbau Base is a pretrained BERT model for Brazilian Portuguese that achieves state-of-the-art performance on three downstream NLP tasks: Named Entity Recognition, Sentence Textual Similarity and Recognizing Textual Entailment. It is available in two sizes: Base and Large.
Information on the method used
All the details are in the blog post: NLP | Modelo de Question Answering em qualquer idioma baseado no BERT base (estudo de caso em português)
Notebooks in Google Colab & GitHub
Performance
The results obtained are the following:
- f1 = 82.50
- exact match = 70.49
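These figures follow the standard SQuAD evaluation. As a hedged sketch of how such scores can be computed on model predictions (this is not the original evaluation notebook; it assumes the evaluate package, and the id, prediction and answer_start values below are illustrative placeholders):

import evaluate

squad_metric = evaluate.load("squad")
predictions = [{"id": "q1", "prediction_text": "1 de dezembro de 2019"}]
references = [{"id": "q1", "answers": {"text": ["1 de dezembro de 2019"], "answer_start": [275]}}]
print(squad_metric.compute(predictions=predictions, references=references))
# -> dict with 'exact_match' and 'f1' scores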
🔧 Technical Details
The training data for this model comes from the Portuguese version of the SQUAD v1.1 dataset (from the Deep Learning Brasil group). The base model is BERTimbau Base, fine-tuned on this dataset to adapt it to Portuguese question-answering tasks.
📄 License
This project is under the MIT license.
⚠️ Important Note
The training data used for this model comes from the Portuguese SQUAD dataset. It may contain a lot of unfiltered content, which is far from neutral, and may include biases.
📖 Citation
If you use our work, please cite:
@inproceedings{pierreguillou2021bertbasecasedsquadv11portuguese,
  title={Portuguese BERT base cased QA (Question Answering), finetuned on SQUAD v1.1},
  author={Pierre Guillou},
  year={2021}
}
📋 Model Information
| Property | Details |
|----------|---------|
| Model Type | Portuguese BERT base cased QA, finetuned on SQUAD v1.1 |
| Training Data | Portuguese SQUAD v1.1 dataset |
| Tags | question-answering, bert, bert-base, pytorch |
| Datasets | brWaC, squad, squad_v1_pt |
| Metrics | squad |