Ancient Greek BERT
The first and only available Ancient Greek sub-word BERT model, achieving state-of-the-art results on Part-of-Speech Tagging and Morphological Analysis after fine-tuning.

Pre-trained weights are provided for a standard 12-layer, 768d BERT-base model. Further scripts for using the model and fine-tuning it for PoS Tagging can be found in our GitHub repository.
For details, please refer to our paper, "A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek", presented in the Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2021).
Quick Start
Requirements
pip install transformers
pip install flair

(unicodedata is part of the Python standard library, so it does not need to be installed separately.)
Usage
The model can be directly used from the HuggingFace Model Hub as follows:
from transformers import AutoTokenizer, AutoModel
tokeniser = AutoTokenizer.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
model = AutoModel.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
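Continuing from the snippet above, the following minimal sketch shows how to obtain contextual sub-word embeddings; the example sentence is purely illustrative:
import torch

# Illustrative de-accented, lower-cased input (see Training and Eval details below)
text = "μηνιν αειδε θεα πηληιαδεω αχιληος"
inputs = tokeniser(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# One 768-dimensional contextual embedding per sub-word token: shape (1, sequence_length, 768)
print(outputs.last_hidden_state.shape)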
Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModel
tokeniser = AutoTokenizer.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
model = AutoModel.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
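Since the model was pre-trained with masked-language modelling (see Training and Eval details below), a fill-mask pipeline gives a quick sanity check. This is a minimal sketch that assumes the published checkpoint still includes the masked-language-modelling head:
from transformers import pipeline

# Assumption: the checkpoint on the Hub includes the pre-training (masked-language-modelling) head
fill_mask = pipeline("fill-mask", model="pranaydeeps/Ancient-Greek-BERT")

# De-accented, lower-cased input, matching the pre-training preprocessing
for prediction in fill_mask("μηνιν αειδε θεα πηληιαδεω [MASK]"):
    print(prediction["token_str"], round(prediction["score"], 3))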
Documentation
Fine-tuning for PoS/Morphological Analysis
Please refer to the GitHub repository for the code and details regarding fine-tuning.
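The actual fine-tuning code and hyper-parameters live in the GitHub repository; the sketch below only illustrates the general shape of token-classification fine-tuning with the Hugging Face Trainer, using a made-up tag set and a one-sentence toy corpus rather than the treebanks from the paper:
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          DataCollatorForTokenClassification, Trainer, TrainingArguments)

# Hypothetical coarse tag set; the treebanks used in the paper define the real label inventory
LABELS = ["NOUN", "VERB", "ADJ", "ADV", "PRON", "ADP", "CONJ", "PART"]
label2id = {label: i for i, label in enumerate(LABELS)}

tokeniser = AutoTokenizer.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
model = AutoModelForTokenClassification.from_pretrained(
    "pranaydeeps/Ancient-Greek-BERT",
    num_labels=len(LABELS),
    id2label={i: label for label, i in label2id.items()},
    label2id=label2id,
)

# Toy stand-in for a real treebank: pre-tokenised words with one PoS tag per word
sentences = [(["μηνιν", "αειδε", "θεα"], ["NOUN", "VERB", "NOUN"])]

def encode(words, tags):
    enc = tokeniser(words, is_split_into_words=True, truncation=True, max_length=512)
    labels, previous = [], None
    for word_id in enc.word_ids():
        if word_id is None or word_id == previous:
            labels.append(-100)  # special tokens and sub-word continuations are ignored by the loss
        else:
            labels.append(label2id[tags[word_id]])
        previous = word_id
    enc["labels"] = labels
    return enc

train_dataset = [encode(words, tags) for words, tags in sentences]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="pos-tagger", num_train_epochs=1, per_device_train_batch_size=8),
    train_dataset=train_dataset,
    data_collator=DataCollatorForTokenClassification(tokeniser),
)
trainer.train()
Note that the requirements above include flair, which the repository's fine-tuning scripts build on; the plain Trainer API is used here only to keep the sketch self-contained.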
Training data
The model was initialized from AUEB NLP Group's Greek BERT and then trained on monolingual data from the First1KGreek Project, Perseus Digital Library, PROIEL Treebank, and Gorman's Treebank.
Training and Eval details
Standard de-accentuating and lower-casing for Greek, as suggested in AUEB NLP Group's Greek BERT, were applied. The model was trained on four NVIDIA Tesla V100 16GB GPUs for 80 epochs with a maximum sequence length of 512, reaching a perplexity of 4.8 on the held-out test set. When fine-tuned for PoS Tagging and Morphological Analysis on all three treebanks, it averaged over 90% accuracy, which is state-of-the-art. For further questions, please consult our paper or contact me.
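The de-accentuating and lower-casing step can be illustrated with the standard-library unicodedata module mentioned in the requirements; this is only a minimal sketch, and the exact normalisation used for training follows Greek BERT and the scripts in the GitHub repository:
import unicodedata

def normalise(text: str) -> str:
    # Decompose characters, drop combining marks (accents, breathings, iota subscript), then lower-case
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()

print(normalise("Μῆνιν ἄειδε, θεά"))  # -> μηνιν αειδε, θεα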
License
Cite
If you use Ancient-Greek-BERT in your research, please cite the paper:
@inproceedings{ancient-greek-bert,
author = {Singh, Pranaydeep and Rutten, Gorik and Lefever, Els},
title = {A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek},
year = {2021},
booktitle = {Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2021)}
}