🚀 mT5_base_yoruba_adr
mT5_base_yoruba_adr is an automatic diacritics restoration (ADR) model for the Yorùbá language. It is a fine-tuned mT5-base model that achieves state-of-the-art performance in restoring the correct diacritics and tonal marks to Yorùbá text, for example turning the undiacritized input "awon omo" into "àwọn ọmọ".
🚀 Quick Start
You can use this model with the Transformers text2text-generation pipeline for ADR:
```python
from transformers import pipeline

# "Davlan/mT5_base_yoruba_adr" is the assumed Hub ID; replace it with the
# actual model path if it differs.
adr = pipeline("text2text-generation", model="Davlan/mT5_base_yoruba_adr")

example = "awon omo wa ni ile-iwe"  # undiacritized Yorùbá input (illustrative)
print(adr(example))
```
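The pipeline returns a list of dictionaries, one per input, each with a `generated_text` field holding the diacritized sentence.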
✨ Features
- State-of-the-art Performance: achieves state-of-the-art results in adding correct diacritics and tonal marks to Yorùbá text.
- Fine-tuned Model: based on mT5-base and fine-tuned on Yorùbá corpora (JW300 and MENYO-20k).
📦 Installation
The model runs with the Hugging Face Transformers library; the mT5 tokenizer typically also needs sentencepiece: `pip install transformers sentencepiece`.
💻 Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model_id = "Davlan/mT5_base_yoruba_adr"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

adr = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
example = "awon omo wa ni ile-iwe"  # undiacritized Yorùbá input (illustrative)
results = adr(example)
print(results)
```
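Advanced Usage
For finer control over decoding, you can call `generate` directly instead of using the pipeline. A minimal sketch, assuming the same Hub ID as above; the decoding parameters are illustrative:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "Davlan/mT5_base_yoruba_adr"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Tokenize the undiacritized sentence and decode with beam search.
inputs = tokenizer("awon omo wa ni ile-iwe", return_tensors="pt")
outputs = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```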
📚 Documentation
Intended uses & limitations
How to use
The model can be used with the Transformers text2text-generation pipeline for automatic diacritics restoration (ADR), as shown in the usage examples above.
Limitations and bias
This model is limited by its training data, the JW300 Yorùbá corpus (largely religious-domain text) and the MENYO-20k dataset, and may not generalize well to text from other domains.
Training data
This model was fine-tuned on the JW300 Yorùbá corpus and the MENYO-20k dataset.
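The source side of ADR training pairs is typically created by stripping the diacritics from the clean corpus sentences. A minimal sketch of that preprocessing (an assumption about the setup, not the authors' exact script):
```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Remove tone marks and underdots by dropping Unicode combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

# Each training pair maps the stripped sentence (source) to the original
# fully diacritized sentence (target).
target = "àwọn ọmọ wà ní ilé-ìwé"
source = strip_diacritics(target)  # -> "awon omo wa ni ile-iwe"
```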
Training procedure
This model was trained on a single NVIDIA V100 GPU.
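For reference, a condensed sketch of how such a seq2seq fine-tuning run can be set up with the Transformers `Seq2SeqTrainer`; the hyperparameters and dataset wiring are illustrative assumptions, not the authors' reported configuration:
```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")

# Toy stand-in for the JW300/MENYO-20k (source, target) sentence pairs.
raw = Dataset.from_dict({
    "source": ["awon omo wa ni ile-iwe"],
    "target": ["àwọn ọmọ wà ní ilé-ìwé"],
})

def preprocess(batch):
    model_inputs = tokenizer(batch["source"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["target"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

train = raw.map(preprocess, batched=True, remove_columns=["source", "target"])

args = Seq2SeqTrainingArguments(
    output_dir="mt5_base_yoruba_adr",
    per_device_train_batch_size=8,   # sized for a single V100
    learning_rate=3e-4,
    num_train_epochs=3,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```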
Eval results on Test set (BLEU score)
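Evaluation uses BLEU between the model's output and the gold diacritized test sentences. A minimal scoring sketch with sacrebleu (an illustrative assumption, not the authors' evaluation script):
```python
import sacrebleu

# Model outputs and the aligned gold references (toy examples).
hypotheses = ["àwọn ọmọ wà ní ilé-ìwé"]
references = [["àwọn ọmọ wà ní ilé-ìwé"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")
```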
BibTeX entry and citation info
By Jesujoba Alabi and David Adelani
🔧 Technical Details
The model is a fine-tuned mT5-base model. It was trained on a single NVIDIA V100 GPU and fine-tuned on Yorùbá corpora (JW300 and MENYO-20k), which accounts for its performance on automatic diacritics restoration for the Yorùbá language.