🚀 SpanMarker for GermEval 2014 NER
This is a SpanMarker model fine-tuned on the GermEval 2014 NER Dataset. It performs named entity recognition (NER) on German text, identifying the named entity classes annotated in the dataset.
✨ Features
- Fine-tuned on GermEval 2014: The model is specifically fine-tuned on the GermEval 2014 NER Dataset, which is based on German Wikipedia and News Corpora, covering over 31,000 sentences and 590,000 tokens.
- 12 Named Entity Classes: It recognizes 12 classes of named entities: the four main classes (`PER`, `LOC`, `ORG`, `OTH`) and their sub-classes with fine-grained labels.
- Accurate Evaluation: Evaluation is performed using both SpanMarker's internal evaluation code with `seqeval` and the official GermEval 2014 Evaluation Script.
📦 Installation
The model is used through the `span_marker` Python package, which can be installed from PyPI with `pip install span_marker` (the standard installation route for the SpanMarker library).
💻 Usage Examples
Basic Usage
```python
from span_marker import SpanMarkerModel

# Load the fine-tuned model from the Hugging Face Hub
model = SpanMarkerModel.from_pretrained("stefan-it/span-marker-gelectra-large-germeval14")

# Run inference on a German sentence
entities = model.predict("Jürgen Schmidhuber studierte ab 1983 Informatik und Mathematik an der TU München .")
```
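`predict` returns one dictionary per detected entity. A minimal sketch of inspecting the results, assuming the standard SpanMarker output fields `span`, `label`, and `score`:

```python
# Each entity is a dict holding the matched text, its label, and a confidence score
for entity in entities:
    print(entity["span"], entity["label"], round(entity["score"], 3))
```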
📚 Documentation
Dataset Introduction
The GermEval 2014 NER Shared Task uses a dataset with German named entity annotation. The data was sampled from German Wikipedia and News Corpora as a collection of citations. The NER annotation follows the NoSta-D guidelines, which extend the Tübingen Treebank guidelines. It uses four main NER categories with sub-structure and annotates nested entities (embeddings among NEs), such as [ORG FC Kickers [LOC Darmstadt]].
Named Entity Classes
12 classes of named entities are annotated and recognized. There are four main classes: PERson, LOCation, ORGanisation, and OTHer. Sub-classes are introduced with two fine-grained labels: `-deriv` marks derivations from NEs (e.g., "englisch"), and `-part` marks compounds that include a NE as a subsequence (e.g., "deutschlandweit").
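Combining the four main classes with the plain, `-deriv`, and `-part` variants yields the 12 entity classes. A small sketch of the resulting label set (the exact label spellings, e.g. `PERderiv`, follow the GermEval 2014 convention and are an assumption here):

```python
MAIN_CLASSES = ["PER", "LOC", "ORG", "OTH"]

# 4 main classes x {plain, -deriv, -part} = 12 entity classes
LABELS = [cls + suffix for cls in MAIN_CLASSES for suffix in ("", "deriv", "part")]
print(LABELS)
# ['PER', 'PERderiv', 'PERpart', 'LOC', 'LOCderiv', 'LOCpart', ...]
```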
Fine-Tuning
We use the same hyper-parameters as in the "German's Next Language Model" paper, with the GELECTRA Large model as the backbone.
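A minimal sketch of what such a fine-tuning setup could look like with the SpanMarker API. Only the GELECTRA Large backbone and the 5e-05 learning rate come from this document; the dataset identifier, batch size, and epoch count are illustrative assumptions:

```python
from datasets import load_dataset
from transformers import TrainingArguments
from span_marker import SpanMarkerModel, Trainer

# Assumption: GermEval 2014 is available on the Hugging Face Hub as "germeval_14"
dataset = load_dataset("germeval_14")
labels = dataset["train"].features["ner_tags"].feature.names

# Initialize SpanMarker with GELECTRA Large as the backbone encoder
model = SpanMarkerModel.from_pretrained("deepset/gelectra-large", labels=labels)

args = TrainingArguments(
    output_dir="span-marker-gelectra-large-germeval14",
    learning_rate=5e-05,             # from the results table below
    per_device_train_batch_size=16,  # assumption, not from this document
    num_train_epochs=3,              # assumption, not from this document
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
```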
Evaluation is carried out using SpanMarker's internal evaluation code with `seqeval` and the official GermEval 2014 Evaluation Script. A backup of the `nereval.py` script can be found here.
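For illustration, a minimal sketch of a `seqeval`-based span evaluation on BIO-tagged sequences (the tag sequences are invented examples; the actual evaluation code ships with SpanMarker):

```python
from seqeval.metrics import classification_report

# Gold and predicted tag sequences in BIO format (illustrative only)
y_true = [["B-PER", "I-PER", "O", "O", "B-ORG", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "O", "B-ORG", "O"]]

# Reports per-class and overall precision, recall, and F1 over exact span matches
print(classification_report(y_true, y_pred, digits=4))
```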
We fine-tune 5 models and upload the model with the best F1-score on the development set. The results are as follows, with each cell reporting (development F1) / test F1:
| Model | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg. |
|-------|-------|-------|-------|-------|-------|------|
| GELECTRA Large (5e-05) | (89.99) / 89.08 | (89.55) / 89.23 | (89.60) / 89.10 | (89.34) / 89.02 | (89.68) / 88.80 | (89.63) / 89.05 |
The best model achieves a final test score of 89.08%:
1. Strict, Combined Evaluation (official):
   Accuracy: 99.26%; Precision: 89.01%; Recall: 89.16%; FB1: 89.08
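As a sanity check, FB1 is the harmonic mean of precision and recall: 2 × 89.01 × 89.16 / (89.01 + 89.16) ≈ 89.08.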
Scripts for training and evaluation are also available.
🔧 Technical Details
The model uses the SpanMarker architecture with GELECTRA Large as its backbone. The hyper-parameters follow the "German's Next Language Model" paper. Evaluation is based on `seqeval` and the official GermEval 2014 Evaluation Script.
📄 License
This project is licensed under the MIT license.
📊 Model Information