🚀 Named Entity Recognition for Ancient Greek
A pre-trained NER tagging model tailored for Ancient Greek, addressing the need for accurate named-entity recognition in ancient texts.
📦 Installation
The model is loaded through the Hugging Face `transformers` library:

```bash
pip install transformers
```
✨ Features
- Pretrained for NER tagging in Ancient Greek.
- Trained on multiple available annotated corpora in Ancient Greek.
📚 Documentation
Data
We trained the models on the available annotated corpora in Ancient Greek. There are only two sizeable annotated datasets in Ancient Greek, both recently released:
- The first, by Berti (2023), is a fully annotated text of Athenaeus' Deipnosophists, developed in the context of the Digital Athenaeus project.
- The second, by Foka et al. (2020), is a fully annotated text of Pausanias' Periegesis Hellados, developed in the context of the Digital Periegesis project.
In addition, we used smaller corpora annotated by students and scholars on Recogito:
- The Odyssey, annotated by Kemp (2021).
- A mixed corpus including excerpts from the Library attributed to Apollodorus and from Strabo’s Geography, annotated by Chiara Palladino.
- Book 1 of Xenophon’s Anabasis, created by Thomas Visser.
- Demosthenes’ Against Neaira, created by Rachel Milio.
Training Dataset

| Dataset | Person | Location | NORP | MISC |
|---------|--------|----------|------|------|
| Odyssey | 2,469 | 698 | 0 | 0 |
| Deipnosophists | 14,921 | 2,699 | 5,110 | 3,060 |
| Pausanias | 10,205 | 8,670 | 4,972 | 0 |
| Other Datasets | 3,283 | 2,040 | 1,089 | 0 |
| **Total** | 30,878 | 14,107 | 11,171 | 3,060 |
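As a quick sanity check, the Total row can be recomputed from the per-dataset rows (a minimal sketch; the counts are copied from the table above):

```python
# Entity counts per dataset, copied from the training table:
# (Person, Location, NORP, MISC)
counts = {
    "Odyssey": (2469, 698, 0, 0),
    "Deipnosophists": (14921, 2699, 5110, 3060),
    "Pausanias": (10205, 8670, 4972, 0),
    "Other Datasets": (3283, 2040, 1089, 0),
}

# Sum each column across datasets to reproduce the Total row.
totals = tuple(sum(col) for col in zip(*counts.values()))
print(totals)  # (30878, 14107, 11171, 3060)
```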
Validation Dataset

| Dataset | Person | Location | NORP | MISC |
|---------|--------|----------|------|------|
| Xenophon | 1,190 | 796 | 857 | 0 |
Results

| Class | Metric | Test | Validation |
|-------|-----------|--------|------------|
| LOC | precision | 83.33% | 88.66% |
| | recall | 81.27% | 88.94% |
| | f1 | 82.29% | 88.80% |
| MISC | precision | 83.25% | 0 |
| | recall | 81.21% | 0 |
| | f1 | 82.22% | 0 |
| NORP | precision | 88.71% | 94.76% |
| | recall | 90.76% | 94.50% |
| | f1 | 89.73% | 94.63% |
| PER | precision | 91.72% | 94.22% |
| | recall | 94.42% | 96.06% |
| | f1 | 93.05% | 95.13% |
| Overall | precision | 88.83% | 92.91% |
| | recall | 89.99% | 93.72% |
| | f1 | 89.41% | 93.32% |
| | accuracy | 97.50% | 98.87% |

(MISC scores on the validation set are 0 because the Xenophon validation text contains no MISC entities.)
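The reported f1 scores are the harmonic mean of precision and recall; as a minimal check with the rounded percentages above, the overall test-set values reproduce the tabulated f1:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Overall test-set scores from the results table.
print(round(f1(88.83, 89.99), 2))  # 89.41
```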
💻 Usage Examples
Basic Usage
This Colab notebook contains the necessary code to use the model.

```python
from transformers import pipeline

# Load the pretrained NER pipeline; aggregation_strategy="first" merges
# subword tokens into whole-word entity spans.
ner = pipeline("ner", model="UGARIT/grc-ner-xlmr", aggregation_strategy="first")
ner("ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς .")
```
Output

```python
[{'entity_group': 'PER',
  'score': 0.9999428,
  'word': '',
  'start': 13,
  'end': 14},
 {'entity_group': 'PER',
  'score': 0.99994195,
  'word': 'Ἀλέξανδρος',
  'start': 14,
  'end': 24},
 {'entity_group': 'NORP',
  'score': 0.9087087,
  'word': 'Πέρσῃ',
  'start': 32,
  'end': 38},
 {'entity_group': 'NORP',
  'score': 0.97572577,
  'word': 'Μακεδόνα',
  'start': 50,
  'end': 59},
 {'entity_group': 'NORP',
  'score': 0.9993412,
  'word': 'Πέρσαι',
  'start': 104,
  'end': 111}]
```
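The pipeline output can be post-processed into a simple per-class index. The sketch below groups predicted spans by entity type, keeping only confident predictions (the sample `predictions` list and the 0.9 threshold are illustrative assumptions, not part of the model card):

```python
from collections import defaultdict

# Example predictions in the format returned by the pipeline above
# (shortened to the named spans for illustration).
predictions = [
    {"entity_group": "PER", "score": 0.9999, "word": "Ἀλέξανδρος"},
    {"entity_group": "NORP", "score": 0.9087, "word": "Πέρσῃ"},
    {"entity_group": "NORP", "score": 0.9757, "word": "Μακεδόνα"},
    {"entity_group": "NORP", "score": 0.9993, "word": "Πέρσαι"},
]

# Group entity surface forms by predicted class, discarding
# low-confidence spans below an arbitrary 0.9 threshold.
by_class = defaultdict(list)
for ent in predictions:
    if ent["score"] >= 0.9:
        by_class[ent["entity_group"]].append(ent["word"])

print(dict(by_class))
# {'PER': ['Ἀλέξανδρος'], 'NORP': ['Πέρσῃ', 'Μακεδόνα', 'Πέρσαι']}
```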
📄 License
The project is licensed under the MIT license.
📚 Citation

```bibtex
@inproceedings{palladino-yousef-2024-development,
    title = "Development of Robust {NER} Models and Named Entity Tagsets for {A}ncient {G}reek",
    author = "Palladino, Chiara and Yousef, Tariq",
    editor = "Sprugnoli, Rachele and Passarotti, Marco",
    booktitle = "Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lt4hala-1.11",
    pages = "89--97",
    abstract = "This contribution presents a novel approach to the development and evaluation of transformer-based models for Named Entity Recognition and Classification in Ancient Greek texts. We trained two models with annotated datasets by consolidating potentially ambiguous entity types under a harmonized set of classes. Then, we tested their performance with out-of-domain texts, reproducing a real-world use case. Both models performed very well under these conditions, with the multilingual model being slightly superior on the monolingual one. In the conclusion, we emphasize current limitations due to the scarcity of high-quality annotated corpora and to the lack of cohesive annotation strategies for ancient languages.",
}
```