ner-german-legal: An Open-source German Legal Named Entity Recognition Model - Supports Recognition of 19 Types of Legal Entities

Ner German Legal

Developed by flair

A named entity recognition model for German legal text, included in the Flair framework. It recognizes 19 types of legal entities.

Sequence Labeling

PyTorch

German #German Legal NER #High-precision Entity Recognition #Legal Text Parsing

Downloads 22.32k

Release Time: 3/2/2022

Model Overview

This model, based on the Flair framework, is specifically designed for named entity recognition tasks in German legal texts, capable of identifying 19 types of legal-related entities such as lawyers, legal provisions, and courts.

Model Features

High-precision Recognition

Achieves an F1 score of 96.35 on the LER German dataset

Comprehensive Legal Entity Coverage

Supports recognition of 19 types of legal-related entities, including legal provisions, courts, lawyers, and other professional categories

Context Awareness

Combines word embeddings and Flair contextual string embeddings with an LSTM-CRF architecture, so each token is tagged in light of its surrounding context

Model Capabilities

German Legal Text Processing

Named Entity Recognition

Legal Entity Classification

Use Cases

Legal Document Processing

Legal Provision Identification

Identify and annotate legal provision references from legal texts

Successfully identified legal provisions such as '§ 36 Abs. 7 IfSG'

Legal Person Identification

Identify names of individuals mentioned in legal texts

Accurately identified names such as 'Herr W.'

Legal Information Extraction

Legal Entity Extraction

Extract entity information such as institutions, courts, and companies from legal documents

Can extract 19 types of legal-related entities

🚀 NER for German Legal Text in Flair (default model)

This model is a legal NER model for German, which is included in Flair. It aims to accurately identify named entities in German legal texts, providing valuable support for legal information extraction.

F1-Score: 96.35 (LER German dataset)
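For orientation, an F1 score is the harmonic mean of precision and recall over predicted entity spans. The sketch below shows that calculation on a toy example. The function name `span_f1` and the toy gold and predicted sets are illustrative assumptions, and the exact evaluation protocol behind the 96.35 figure is not stated in this card.

```python
from typing import Set, Tuple

# A span is (first_token, last_token, tag). Names and data are illustrative only.
Span = Tuple[int, int, str]


def span_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Return the F1 score (between 0 and 1) for exact-match spans."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = {(2, 2, "PER"), (5, 9, "GS")}
predicted = {(2, 2, "PER"), (5, 8, "GS")}  # second span has a wrong boundary
score = span_f1(gold, predicted)
print(f"{score:.2f}")  # prints 0.50
```

A span only counts as correct here when both its boundaries and its tag match, which is why the second prediction contributes nothing.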

It predicts 19 tags as follows:

tag	meaning
AN	Lawyer
EUN	European Norm
GS	Law
GRT	Court
INN	Institution
LD	State
LDS	Landscape
LIT	Literature
MRK	Brand
ORG	Organization
PER	Person
RR	Judge
RS	Jurisprudence
ST	City
STR	Street
UN	Company
VO	Regulation
VS	Provision
VT	Contract
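When post-processing predictions, it can be handy to have the table above as a lookup. The following is a minimal sketch; the names `LEGAL_TAGS` and `describe_tag` are hypothetical helpers, not part of Flair.

```python
# Tag-to-meaning mapping copied from the table above.
LEGAL_TAGS = {
    "AN": "Lawyer",
    "EUN": "European Norm",
    "GS": "Law",
    "GRT": "Court",
    "INN": "Institution",
    "LD": "State",
    "LDS": "Landscape",
    "LIT": "Literature",
    "MRK": "Brand",
    "ORG": "Organization",
    "PER": "Person",
    "RR": "Judge",
    "RS": "Jurisprudence",
    "ST": "City",
    "STR": "Street",
    "UN": "Company",
    "VO": "Regulation",
    "VS": "Provision",
    "VT": "Contract",
}


def describe_tag(tag: str) -> str:
    """Return the meaning of a tag, or 'Unknown' for tags outside the table."""
    return LEGAL_TAGS.get(tag, "Unknown")


print(len(LEGAL_TAGS))     # prints 19
print(describe_tag("GS"))  # prints Law
```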

This model is based on Flair embeddings and LSTM-CRF.

For more details on the Legal NER dataset, see the paper by Leitner et al. (2019) listed under Cite.

🚀 Quick Start

✨ Features

This is a pre-trained legal NER model for German in Flair.
It can predict 19 different tags related to legal entities.
Achieves a high F1-Score of 96.35 on the LER German dataset.

📦 Installation

Requires: Flair (pip install flair)

💻 Usage Examples

Basic Usage

from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/ner-german-legal")

# make example sentence (don't use the tokenizer, since it handles legal texts badly)
sentence = Sentence("Herr W. verstieß gegen § 36 Abs. 7 IfSG.", use_tokenizer=False)

# predict NER tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted NER spans
print('The following NER tags are found:')
# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)

This code will output:

Span [2]: "W."   [− Labels: PER (0.9911)]
Span [5,6,7,8,9]: "§ 36 Abs. 7 IfSG."   [− Labels: GS (0.5353)]
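The numbers in square brackets are 1-based token positions. The sketch below reproduces the mapping from positions to text in plain Python, assuming that `use_tokenizer=False` amounts to splitting on whitespace, and then shows one way a downstream step might filter spans by score. The helper `span_text` and the threshold `MIN_SCORE` are hypothetical; the spans and scores are copied from the output above.

```python
text = "Herr W. verstieß gegen § 36 Abs. 7 IfSG."

# Whitespace splitting, standing in for what use_tokenizer=False is assumed to do.
tokens = text.split()


def span_text(token_ids):
    """Join the tokens at the given 1-based positions."""
    return " ".join(tokens[i - 1] for i in token_ids)


# (token positions, tag, score) copied from the output above.
spans = [([2], "PER", 0.9911), ([5, 6, 7, 8, 9], "GS", 0.5353)]

for token_ids, tag, score in spans:
    print(span_text(token_ids), tag, score)

# Hypothetical threshold: keep only spans the model is fairly sure about.
MIN_SCORE = 0.9
confident = [(span_text(ids), tag) for ids, tag, score in spans if score >= MIN_SCORE]
print(confident)
```

Because nothing separates punctuation from words under this assumption, the final period stays attached to "IfSG.", which matches the span text in the output. Whether to discard low-score spans such as the GS one depends on the application.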

🔧 Technical Details

The model is trained using the following Flair script:

from flair.data import Corpus
from flair.datasets import LER_GERMAN
from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings

# 1. get the corpus
corpus: Corpus = LER_GERMAN()

# 2. what tag do we want to predict?
tag_type = 'ner'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. initialize each embedding we use
embedding_types = [

    # German FastText word embeddings
    WordEmbeddings('de'),

    # contextual string embeddings, forward
    FlairEmbeddings('de-forward'),

    # contextual string embeddings, backward
    FlairEmbeddings('de-backward'),
]

# embedding stack consists of Flair and FastText word embeddings
embeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger
from flair.models import SequenceTagger

tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type)

# 6. initialize trainer
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

# 7. run training
trainer.train('resources/taggers/ner-german-legal',
              train_with_dev=True,
              max_epochs=150)
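A sequence tagger of this kind labels each token, and the token labels are then merged into entity spans. The sketch below shows that merging step in plain Python, assuming a BIO labeling scheme (`B-` begins an entity, `I-` continues it, `O` is outside any entity). The scheme, the function name `bio_to_spans`, and the hand-written tag sequence are assumptions for illustration, not output of the trained model or part of the Flair API.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token BIO tags into (entity text, entity type) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity that is currently open, if there is one.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            spans.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity
            flush()
    flush()
    return spans


tokens = "Herr W. verstieß gegen § 36 Abs. 7 IfSG.".split()
# Hand-written tags for illustration only.
tags = ["O", "B-PER", "O", "O", "B-GS", "I-GS", "I-GS", "I-GS", "I-GS"]
result = bio_to_spans(tokens, tags)
print(result)
```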

📄 License

No license information is provided in the original document.

📚 Documentation

Cite

Please cite the following papers when using this model.

@inproceedings{leitner2019fine,
  author = {Elena Leitner and Georg Rehm and Julian Moreno-Schneider},
  title = {{Fine-grained Named Entity Recognition in Legal Documents}},
  booktitle = {Semantic Systems. The Power of AI and Knowledge
                  Graphs. Proceedings of the 15th International Conference
                  (SEMANTiCS 2019)},
  year = 2019,
  pages = {272--287},
  pdf = {https://link.springer.com/content/pdf/10.1007%2F978-3-030-33220-4_20.pdf}}

@inproceedings{akbik2018coling,
  title={Contextual String Embeddings for Sequence Labeling},
  author={Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
  pages     = {1638--1649},
  year      = {2018}
}

Issues?

The Flair issue tracker is available here.
