🚀 bert-base-multilingual-cased-finetuned-hausa
This is a Hausa BERT model fine - tuned from the bert - base - multilingual - cased model, offering better performance in text classification and named entity recognition compared to the multilingual BERT.
🚀 Quick Start
You can use this model with Transformers pipeline for masked token prediction.
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='Davlan/bert-base-multilingual-cased-finetuned-hausa')
>>> unmasker("Shugaban [MASK] Muhammadu Buhari ya amince da shawarar da ma’aikatar sufuri karkashin jagoranci")
[{'sequence':
'[CLS] Shugaban Nigeria Muhammadu Buhari ya amince da shawarar da ma [UNK] aikatar sufuri karkashin jagoranci [SEP]',
'score': 0.9762618541717529,
'token': 22045,
'token_str': 'Nigeria'},
{'sequence': '[CLS] Shugaban Ka Muhammadu Buhari ya amince da shawarar da ma [UNK] aikatar sufuri karkashin jagoranci [SEP]', 'score': 0.007239189930260181,
'token': 25444,
'token_str': 'Ka'},
{'sequence': '[CLS] Shugaban, Muhammadu Buhari ya amince da shawarar da ma [UNK] aikatar sufuri karkashin jagoranci [SEP]', 'score': 0.001990817254409194,
'token': 117,
'token_str': ','},
{'sequence': '[CLS] Shugaban Ghana Muhammadu Buhari ya amince da shawarar da ma [UNK] aikatar sufuri karkashin jagoranci [SEP]', 'score': 0.001566368737258017,
'token': 28682,
'token_str': 'Ghana'},
{'sequence': '[CLS] Shugabanmu Muhammadu Buhari ya amince da shawarar da ma [UNK] aikatar sufuri karkashin jagoranci [SEP]', 'score': 0.0009375187801197171,
'token': 11717,
'token_str': '##mu'}]
✨ Features
bert-base-multilingual-cased-finetuned-hausa is a Hausa BERT model obtained by fine - tuning the bert - base - multilingual - cased model on Hausa language texts. It provides better performance than the multilingual BERT on text classification and named entity recognition datasets. Specifically, this model is a bert - base - multilingual - cased model that was fine - tuned on Hausa corpus.
📚 Documentation
Intended uses & limitations
How to use
You can use this model with Transformers pipeline for masked token prediction. The example code is shown in the "Quick Start" section.
Limitations and bias
This model is limited by its training dataset of entity - annotated news articles from a specific span of time. This may not generalize well for all use cases in different domains.
📦 Installation
No installation steps are provided in the original document, so this section is skipped.
💻 Usage Examples
Basic Usage
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='Davlan/bert-base-multilingual-cased-finetuned-hausa')
>>> unmasker("Shugaban [MASK] Muhammadu Buhari ya amince da shawarar da ma’aikatar sufuri karkashin jagoranci")
🔧 Technical Details
Training data
This model was fine - tuned on [Hausa CC - 100](http://data.statmt.org/cc - 100/).
Training procedure
This model was trained on a single NVIDIA V100 GPU.
Eval results on Test set (F - score, average over 5 runs)
Property |
Details |
[MasakhaNER](https://github.com/masakhane - io/masakhane - ner) mBERT F1 |
86.65 |
[MasakhaNER](https://github.com/masakhane - io/masakhane - ner) ha_bert F1 |
91.31 |
VOA Hausa Textclass mBERT F1 |
84.76 |
VOA Hausa Textclass ha_bert F1 |
90.98 |
BibTeX entry and citation info
By David Adelani