NER-English-OntoNotes Open-Source Model - Free Deployment for 18 Categories of English Named Entity Recognition

Ner English Ontonotes

Developed by flair

Flair's built-in English 18-class named entity recognition model, trained on the Ontonotes dataset with an F1 score of 89.27.

Sequence Labeling

PyTorch

English #18-class entity recognition #High-precision NER #LSTM-CRF architecture

Downloads 175.71k

Release Time: 3/2/2022

Model Overview

This is a sequence labeling model based on the LSTM-CRF architecture, designed to recognize 18 types of named entities in English text, including persons, locations, dates, currencies, etc.

Model Features

18-class entity recognition

Capable of recognizing 18 different types of named entities, including persons, locations, dates, currencies, etc.

High precision

Achieves an F1 score of 89.27 on the Ontonotes dataset.

Hybrid word embeddings

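Combines classic word embeddings (loaded as 'en-crawl' in the training script, which in Flair selects FastText web-crawl vectors rather than GloVe) with Flair's contextual string embeddings.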
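
As a rough illustration of what combining embeddings means, stacking concatenates the per-token vectors from each source into one longer vector. The sketch below uses plain Python lists with made-up values and dimensions; `stack_embeddings` is a hypothetical helper, not a Flair function.

```python
from typing import List


def stack_embeddings(*vectors: List[float]) -> List[float]:
    """Concatenate several per-token vectors into a single vector."""
    stacked: List[float] = []
    for vector in vectors:
        stacked.extend(vector)
    return stacked


# Toy stand-ins (assumed values): one static word vector plus
# forward and backward contextual string vectors for the same token.
word_vec = [0.1, 0.2, 0.3]
forward_vec = [0.5, 0.6]
backward_vec = [0.7, 0.8]

token_vec = stack_embeddings(word_vec, forward_vec, backward_vec)
print(len(token_vec))  # 7
```

The stacked dimension is simply the sum of the individual dimensions, which is why the downstream LSTM sees a single input vector per token.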

Model Capabilities

Text entity recognition

Multi-category entity classification

Sequence labeling

Use Cases

Information extraction

News entity extraction

Extract key information such as persons, locations, and organizations from news texts.

Can accurately identify various named entities in the text.

Financial document processing

Extract monetary amounts, dates, and other information from financial documents.

Can accurately identify currency values and date entities.

🚀 English NER in Flair (Ontonotes default model)

This is the 18-class NER model for English that ships with Flair. It identifies named entities in English text.

F1-Score: 89.27 (Ontonotes)

Predicts 18 tags:

Tag	Meaning
CARDINAL	cardinal value
DATE	date value
EVENT	event name
FAC	building name
GPE	geo-political entity
LANGUAGE	language name
LAW	law name
LOC	location name
MONEY	monetary value
NORP	affiliation
ORDINAL	ordinal value
ORG	organization name
PERCENT	percent value
PERSON	person name
PRODUCT	product name
QUANTITY	quantity value
TIME	time value
WORK_OF_ART	name of work of art

Based on [Flair embeddings](https://www.aclweb.org/anthology/C18-1139/) and LSTM-CRF.
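
To give a feel for what the CRF part of an LSTM-CRF contributes, here is a minimal sketch of Viterbi decoding, the standard way to pick the best tag sequence from per-token scores plus tag-to-tag transition scores. This is a generic pure-Python illustration, not Flair's implementation; the tags, emission scores, and transition scores are invented for the example.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: Sequence[str],
) -> List[str]:
    """Return the tag sequence with the highest emission + transition score."""
    if not emissions:
        return []
    # best[t][tag]: score of the best path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []  # back-pointers to the previous tag
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for tag in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, tag), 0.0),
            )
            scores[tag] = (
                best[-1][prev] + transitions.get((prev, tag), 0.0) + emissions[t][tag]
            )
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final tag and follow the back-pointers.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Made-up scores for the three tokens "George Washington won".
tags = ["O", "B-PERSON", "I-PERSON"]
emissions = [
    {"O": 0.1, "B-PERSON": 2.0, "I-PERSON": 0.5},
    {"O": 0.8, "B-PERSON": 1.0, "I-PERSON": 0.9},
    {"O": 2.0, "B-PERSON": 0.1, "I-PERSON": 0.1},
]
transitions = {
    ("B-PERSON", "I-PERSON"): 1.0,   # continuing a span is encouraged
    ("O", "I-PERSON"): -10.0,        # a span cannot start with I-
}

print(viterbi_decode(emissions, transitions, tags))
# ['B-PERSON', 'I-PERSON', 'O']
```

Picking the highest emission score per token would tag the second token as B-PERSON; the transition scores are what pull it to I-PERSON, so the two tokens form one span.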

🚀 Quick Start

Prerequisites

⚠️ Important Note

This project requires Flair to be installed. You can install it using the following command:

pip install flair

💻 Usage Examples

Basic Usage

from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/ner-english-ontonotes")

# make example sentence
sentence = Sentence("On September 1st George Washington won 1 dollar.")

# predict NER tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted NER spans
print('The following NER tags are found:')
# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)

This yields the following output:

Span [2,3]: "September 1st"   [− Labels: DATE (0.8824)]
Span [4,5]: "George Washington"   [− Labels: PERSON (0.9604)]
Span [7,8]: "1 dollar"   [− Labels: MONEY (0.9837)]

So, the entities "September 1st" (labeled as a date), "George Washington" (labeled as a person) and "1 dollar" (labeled as money) are found in the sentence "On September 1st George Washington won 1 dollar."
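
For downstream processing it is often handier to have the predictions as plain records than as printed spans. The sketch below shows one possible way to do that. It assumes a recent Flair release in which each span exposes `text` and `get_label(...)`, and the returned label has `value` and `score`; `spans_to_records` is a hypothetical helper name, not part of Flair.

```python
from typing import Any, Dict, Iterable, List, Optional


def spans_to_records(
    sentence: Any,
    label_type: str = "ner",
    keep_labels: Optional[Iterable[str]] = None,
    min_score: float = 0.0,
) -> List[Dict[str, Any]]:
    """Collect predicted spans as dicts, optionally filtered by label and score."""
    wanted = set(keep_labels) if keep_labels is not None else None
    records: List[Dict[str, Any]] = []
    for span in sentence.get_spans(label_type):
        # Assumption: the span API matches recent Flair versions.
        label = span.get_label(label_type)
        if wanted is not None and label.value not in wanted:
            continue  # skip entity types the caller did not ask for
        if label.score < min_score:
            continue  # skip low-confidence predictions
        records.append(
            {"text": span.text, "label": label.value, "score": label.score}
        )
    return records
```

With the tagged `sentence` from the example above, a call such as `spans_to_records(sentence, keep_labels={"MONEY", "DATE"})` would be expected to keep only the date and money spans. Exact scores may differ between Flair and model versions.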

Training

The following Flair script was used to train this model:

from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings

# 1. load the corpus (Ontonotes does not ship with Flair, you need to download and reformat into a column format yourself)
corpus: Corpus = ColumnCorpus(
    "resources/tasks/onto-ner",
    column_format={0: "text", 1: "pos", 2: "upos", 3: "ner"},
    tag_to_bioes="ner",
)

# 2. what tag do we want to predict?
tag_type = 'ner'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. initialize each embedding we use
embedding_types = [

    # classic word embeddings ('en-crawl' selects Flair's FastText web-crawl vectors, not GloVe)
    WordEmbeddings('en-crawl'),

    # contextual string embeddings, forward
    FlairEmbeddings('news-forward'),

    # contextual string embeddings, backward
    FlairEmbeddings('news-backward'),
]

# embedding stack consists of Flair and classic word embeddings
embeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger
from flair.models import SequenceTagger

tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type)

# 6. initialize trainer
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

# 7. run training
trainer.train('resources/taggers/ner-english-ontonotes',
              train_with_dev=True,
              max_epochs=150)
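
Step 1 of the script expects the corpus in a column format and converts the NER column to BIOES. The sketch below illustrates, under stated assumptions, what such a file might look like and what a BIO-to-BIOES conversion does. The sample tokens and tags are invented (not taken from Ontonotes), the file is assumed to store BIO tags, and `read_columns` and `bio_to_bioes` are hypothetical helpers rather than Flair functions.

```python
from typing import List, Tuple

# Illustrative sample: one token per line, columns = text, pos, upos, ner;
# a blank line separates sentences.
SAMPLE = """\
George NNP PROPN B-PERSON
Washington NNP PROPN I-PERSON
won VBD VERB O
1 CD NUM B-MONEY
dollar NN NOUN I-MONEY
. . PUNCT O

Berlin NNP PROPN B-GPE
"""


def read_columns(text: str) -> List[List[Tuple[str, str]]]:
    """Split column-formatted text into sentences of (token, ner_tag) pairs."""
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        current.append((columns[0], columns[3]))  # column 0: text, column 3: ner
    if current:
        sentences.append(current)
    return sentences


def bio_to_bioes(tags: List[str]) -> List[str]:
    """Convert BIO tags to BIOES: single-token spans get S-, span ends get E-."""
    converted: List[str] = []
    for i, tag in enumerate(tags):
        if tag == "O":
            converted.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{label}"  # does the span go on?
        if prefix == "B":
            converted.append(f"B-{label}" if continues else f"S-{label}")
        else:
            converted.append(f"I-{label}" if continues else f"E-{label}")
    return converted


for sentence in read_columns(SAMPLE):
    tokens = [token for token, _ in sentence]
    print(tokens, bio_to_bioes([tag for _, tag in sentence]))
```

In this sample the two-token spans become B-/E- pairs and the single-token span "Berlin" becomes S-GPE, which gives the tagger explicit signals for span boundaries.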

📚 Documentation

Cite

Please cite the following paper when using this model.

@inproceedings{akbik2018coling,
  title={Contextual String Embeddings for Sequence Labeling},
  author={Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
  pages     = {1638--1649},
  year      = {2018}
}

Issues?

The Flair issue tracker is available at https://github.com/flairNLP/flair/issues.
