Ancient Greek BERT
The first and only available Ancient Greek sub-word BERT model, achieving state-of-the-art results on Part-of-Speech Tagging and Morphological Analysis after fine-tuning.

Pre-trained weights are provided for a standard 12-layer, 768d BERT-base model. Further scripts for using the model and fine-tuning it for PoS Tagging can be found in our GitHub repository.
For details, please refer to our paper, "A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek", presented in the Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2021).
Quick Start
Requirements
pip install transformers
pip install flair

(unicodedata is part of the Python standard library, so it does not need to be installed separately.)
Usage
The model can be directly used from the HuggingFace Model Hub as follows:
from transformers import AutoTokenizer, AutoModel
tokeniser = AutoTokenizer.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
model = AutoModel.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
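Continuing from the snippet above, the following minimal sketch shows how to obtain contextual sub-word embeddings; the example sentence is purely illustrative:
import torch

# Illustrative de-accented, lower-cased input (see Training and Eval details below)
text = "μηνιν αειδε θεα πηληιαδεω αχιληος"
inputs = tokeniser(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# One 768-dimensional contextual embedding per sub-word token: shape (1, sequence_length, 768)
print(outputs.last_hidden_state.shape)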
Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModel
tokeniser = AutoTokenizer.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
model = AutoModel.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
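Since the model was pre-trained with masked-language modelling (see Training and Eval details below), a fill-mask pipeline gives a quick sanity check. This is a minimal sketch that assumes the published checkpoint still includes the masked-language-modelling head:
from transformers import pipeline

# Assumption: the checkpoint on the Hub includes the pre-training (masked-language-modelling) head
fill_mask = pipeline("fill-mask", model="pranaydeeps/Ancient-Greek-BERT")

# De-accented, lower-cased input, matching the pre-training preprocessing
for prediction in fill_mask("μηνιν αειδε θεα πηληιαδεω [MASK]"):
    print(prediction["token_str"], round(prediction["score"], 3))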
Documentation
Fine-tuning for PoS/Morphological Analysis
Please refer to the GitHub repository for the code and details regarding fine-tuning.
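The actual fine-tuning code and hyper-parameters live in the GitHub repository; the sketch below only illustrates the general shape of token-classification fine-tuning with the Hugging Face Trainer, using a made-up tag set and a one-sentence toy corpus rather than the treebanks from the paper:
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          DataCollatorForTokenClassification, Trainer, TrainingArguments)

# Hypothetical coarse tag set; the treebanks used in the paper define the real label inventory
LABELS = ["NOUN", "VERB", "ADJ", "ADV", "PRON", "ADP", "CONJ", "PART"]
label2id = {label: i for i, label in enumerate(LABELS)}

tokeniser = AutoTokenizer.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
model = AutoModelForTokenClassification.from_pretrained(
    "pranaydeeps/Ancient-Greek-BERT",
    num_labels=len(LABELS),
    id2label={i: label for label, i in label2id.items()},
    label2id=label2id,
)

# Toy stand-in for a real treebank: pre-tokenised words with one PoS tag per word
sentences = [(["μηνιν", "αειδε", "θεα"], ["NOUN", "VERB", "NOUN"])]

def encode(words, tags):
    enc = tokeniser(words, is_split_into_words=True, truncation=True, max_length=512)
    labels, previous = [], None
    for word_id in enc.word_ids():
        if word_id is None or word_id == previous:
            labels.append(-100)  # special tokens and sub-word continuations are ignored by the loss
        else:
            labels.append(label2id[tags[word_id]])
        previous = word_id
    enc["labels"] = labels
    return enc

train_dataset = [encode(words, tags) for words, tags in sentences]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="pos-tagger", num_train_epochs=1, per_device_train_batch_size=8),
    train_dataset=train_dataset,
    data_collator=DataCollatorForTokenClassification(tokeniser),
)
trainer.train()
Note that the requirements above include flair, which the repository's fine-tuning scripts build on; the plain Trainer API is used here only to keep the sketch self-contained.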
Training data
The model was initialized from AUEB NLP Group's Greek BERT and then trained on monolingual data from the First1KGreek Project, Perseus Digital Library, PROIEL Treebank, and Gorman's Treebank.
Training and Eval details
Standard de-accentuating and lower-casing for Greek, as suggested in AUEB NLP Group's Greek BERT, were applied. The model was trained on four NVIDIA Tesla V100 16GB GPUs for 80 epochs with a maximum sequence length of 512, reaching a perplexity of 4.8 on the held-out test set. When fine-tuned for PoS Tagging and Morphological Analysis on all three treebanks, it averaged over 90% accuracy, which is state-of-the-art. For further questions, please consult our paper or contact me.
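The de-accentuating and lower-casing step can be illustrated with the standard-library unicodedata module mentioned in the requirements; this is only a minimal sketch, and the exact normalisation used for training follows Greek BERT and the scripts in the GitHub repository:
import unicodedata

def normalise(text: str) -> str:
    # Decompose characters, drop combining marks (accents, breathings, iota subscript), then lower-case
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()

print(normalise("Μῆνιν ἄειδε, θεά"))  # -> μηνιν αειδε, θεα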
License
Cite
If you use Ancient-Greek-BERT in your research, please cite the paper:
@inproceedings{ancient-greek-bert,
author = {Singh, Pranaydeep and Rutten, Gorik and Lefever, Els},
title = {A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek},
year = {2021},
booktitle = {Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2021)}
}