ner-english-ontonotes-fast Open Source Model - Quickly Identify 18 Types of English Named Entities Based on Ontonotes Dataset


Ner English Ontonotes Fast

Developed by flair

Flair's built-in fast model for 18-class English named entity recognition, trained on the Ontonotes dataset

Sequence Labeling

PyTorch

English #18-class entity recognition #Fast inference #English text processing

Downloads 23.94k

Release Time : 3/2/2022

Model Overview

This model performs named entity recognition (NER) on English text and identifies 18 types of named entities, such as person names, locations, and dates.

Model Features

18-class entity recognition

Identifies 18 types of named entities, including persons, locations, dates, and monetary values.

High performance

Achieves an F1 score of 89.3 on the Ontonotes dataset.

Fast inference

A lighter variant of Flair's Ontonotes NER model that uses the "fast" Flair embeddings (news-forward-fast and news-backward-fast) for faster inference.

Flair word embeddings

Uses Flair's contextual string embeddings, which are produced by character-level language models and capture each word's surrounding context, to improve recognition accuracy.

Model Capabilities

Text entity recognition

Multi-category entity labeling

Sequence labeling

Use Cases

Information extraction

News text analysis

Extract key information such as person names, locations, and organizations from news articles

Accurately identifies various named entities in the text

Financial document processing

Identify monetary amounts, dates, and other information in financial documents

Extracts key financial data

Knowledge graph construction

Entity relation extraction

Serves as a preliminary processing step for knowledge graph construction

Provides entity annotations for subsequent relation extraction
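For use cases like these, a common pattern is to tag a text once and keep only the entity types that matter for the task. The sketch below assumes the entities have already been extracted as (text, label) pairs; the USE_CASE_LABELS mapping and the filter_entities helper are hypothetical illustrations, not part of Flair, although the label names are among the 18 OntoNotes tags this model predicts.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical task-to-label mapping; adjust it to your application.
USE_CASE_LABELS: Dict[str, Set[str]] = {
    "news": {"PERSON", "GPE", "LOC", "ORG"},
    "finance": {"MONEY", "DATE", "PERCENT", "CARDINAL", "ORG"},
    "knowledge_graph": {"PERSON", "ORG", "GPE", "FAC", "PRODUCT", "EVENT", "WORK_OF_ART", "LAW"},
}


def filter_entities(entities: Iterable[Tuple[str, str]], use_case: str) -> List[Tuple[str, str]]:
    """Keep only the (text, label) pairs whose label is relevant to the use case."""
    wanted = USE_CASE_LABELS[use_case]  # raises KeyError for an unknown use case
    return [(text, label) for text, label in entities if label in wanted]


if __name__ == "__main__":
    # Entities as (text, label) pairs, e.g. collected from sentence.get_spans("ner")
    found = [("September 1st", "DATE"), ("George Washington", "PERSON"), ("1 dollar", "MONEY")]
    print(filter_entities(found, "finance"))  # → [('September 1st', 'DATE'), ('1 dollar', 'MONEY')]
```

Keeping the mapping as plain data makes it easy to change which tags each downstream task receives without retraining or re-running the tagger.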

🚀 English NER in Flair (Ontonotes fast model)

This is the fast 18-class NER model for English that ships with Flair. It provides high-performance named-entity recognition for English text.

F1-Score: 89.3 (Ontonotes)
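The F1 score is the harmonic mean of precision and recall over predicted entity spans. The sketch below assumes exact-match, micro-averaged span scoring, the usual convention for CoNLL-style NER evaluation, where a prediction counts only if both its boundaries and its type match a gold span. The span_f1 function and the tuple format are illustrative assumptions, not Flair's evaluation code.

```python
from typing import Iterable, Set, Tuple

# A span is (sentence_id, start_token, end_token, label); this format is illustrative.
Span = Tuple[int, int, int, str]


def span_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 with exact span matching."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = [(0, 2, 3, "DATE"), (0, 4, 5, "PERSON"), (0, 7, 8, "MONEY")]
    # One exact match, one span with the wrong type, one spurious span
    predicted = [(0, 2, 3, "DATE"), (0, 4, 5, "ORG"), (0, 1, 1, "CARDINAL")]
    p, r, f = span_f1(gold, predicted)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")  # → precision=0.333 recall=0.333 f1=0.333
```

Note that a span with correct boundaries but the wrong type counts both as a false positive and as a missed gold span under exact matching.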

✨ Features

Tag Prediction

This model predicts 18 tags:

Tag	Meaning
CARDINAL	cardinal value
DATE	date value
EVENT	event name
FAC	facility name (e.g. building, airport, bridge)
GPE	geo-political entity
LANGUAGE	language name
LAW	law name
LOC	location name
MONEY	monetary value
NORP	affiliation (nationality, religious or political group)
ORDINAL	ordinal value
ORG	organization name
PERCENT	percent value
PERSON	person name
PRODUCT	product name
QUANTITY	quantity value
TIME	time value
WORK_OF_ART	name of work of art

Technical Foundation

It is based on Flair embeddings (stacked with word embeddings) and an LSTM-CRF sequence tagger.
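In an LSTM-CRF tagger, the LSTM produces a score for every tag at every token, and the CRF layer adds learned transition scores between consecutive tags; at prediction time the highest-scoring tag sequence is found with Viterbi decoding. The plain-Python sketch below illustrates only that decoding step, using made-up scores and a reduced tag set; it is not Flair's implementation, which operates on batched PyTorch tensors.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> Tuple[List[str], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[i][tag] is the per-token score for `tag` at position i (in an
    LSTM-CRF this comes from the LSTM); transitions[(prev, cur)] is the score
    for moving from tag prev to tag cur. Missing transitions count as 0.0.
    """
    # best[i][tag]: best score of any tag path that ends in `tag` at position i
    best: List[Dict[str, float]] = [dict(emissions[0])]
    backpointers: List[Dict[str, str]] = []

    for i in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Choose the predecessor maximising path score + transition score
            prev_best = max(
                tags,
                key=lambda prev: best[i - 1][prev] + transitions.get((prev, cur), 0.0),
            )
            scores[cur] = (
                best[i - 1][prev_best]
                + transitions.get((prev_best, cur), 0.0)
                + emissions[i][cur]
            )
            pointers[cur] = prev_best
        best.append(scores)
        backpointers.append(pointers)

    # Start from the best final tag and follow the back-pointers to the start
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


if __name__ == "__main__":
    tags = ["O", "B-PER", "E-PER"]
    # Made-up emission scores for the tokens "George", "Washington", "won"
    emissions = [
        {"O": 0.5, "B-PER": 1.0, "E-PER": 0.8},
        {"O": 0.6, "B-PER": 0.2, "E-PER": 0.9},
        {"O": 1.0, "B-PER": 0.1, "E-PER": 0.3},
    ]
    # Made-up transitions: reward B-PER -> E-PER, penalise invalid moves
    transitions = {
        ("B-PER", "E-PER"): 1.0,
        ("O", "E-PER"): -5.0,
        ("E-PER", "E-PER"): -5.0,
        ("B-PER", "O"): -5.0,
        ("B-PER", "B-PER"): -5.0,
    }
    path, score = viterbi_decode(emissions, transitions, tags)
    print(path, round(score, 2))  # → ['B-PER', 'E-PER', 'O'] 3.9
```

Real CRF layers typically also learn scores for entering the first tag and leaving the last one; those start/stop transitions are omitted here for brevity.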

🚀 Quick Start

📦 Installation

Requires: Flair (pip install flair)

💻 Usage Examples

Basic Usage

from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/ner-english-ontonotes-fast")

# make example sentence
sentence = Sentence("On September 1st George Washington won 1 dollar.")

# predict NER tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted NER spans
print('The following NER tags are found:')
# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)

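The loop over the predicted spans prints the following (the output of print(sentence) is not shown, and the exact format and confidence scores can vary between Flair versions):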

Span [2,3]: "September 1st"   [− Labels: DATE (0.9655)]
Span [4,5]: "George Washington"   [− Labels: PERSON (0.8243)]
Span [7,8]: "1 dollar"   [− Labels: MONEY (0.8022)]

So the entities "September 1st" (labeled as a date), "George Washington" (labeled as a person) and "1 dollar" (labeled as money) are found in the sentence "On September 1st George Washington won 1 dollar".
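To tag many texts at once, SequenceTagger.predict also accepts a list of Sentence objects together with a mini_batch_size argument. The sketch below collects each entity's text, tag and confidence into plain dictionaries. The helper names spans_to_records and tag_texts are hypothetical, and the span.get_label("ner") call assumes a recent Flair release (0.10 or later) with the refactored label API.

```python
from typing import Dict, List


def spans_to_records(sentence, label_type: str = "ner") -> List[Dict]:
    """Turn the predicted spans of one tagged Sentence into plain dicts."""
    records = []
    for span in sentence.get_spans(label_type):
        label = span.get_label(label_type)  # assumes the Flair >= 0.10 label API
        records.append({"text": span.text, "tag": label.value, "score": label.score})
    return records


def tag_texts(texts: List[str], batch_size: int = 32) -> List[List[Dict]]:
    """Tag several texts in mini-batches and return one record list per text."""
    # Imported lazily so spans_to_records can be used without loading Flair
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load("flair/ner-english-ontonotes-fast")
    sentences = [Sentence(text) for text in texts]
    tagger.predict(sentences, mini_batch_size=batch_size)
    return [spans_to_records(sentence) for sentence in sentences]


if __name__ == "__main__":
    texts = [
        "On September 1st George Washington won 1 dollar.",
        "Apple reported a 5 percent rise in revenue in 2021.",
    ]
    for text, records in zip(texts, tag_texts(texts)):
        print(text)
        for record in records:
            print("  ", record)
```

Loading the tagger once and predicting in batches avoids reloading the model for every text, which matters more than per-sentence speed when processing large document collections.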

🔧 Technical Details

Training Script

The following Flair script was used to train this model:

from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings

# 1. load the corpus (Ontonotes does not ship with Flair, you need to download and reformat into a column format yourself)
corpus: Corpus = ColumnCorpus(
                "resources/tasks/onto-ner",
                column_format={0: "text", 1: "pos", 2: "upos", 3: "ner"},
                tag_to_bioes="ner",  # re-encode the NER column as BIOES tags (argument from older Flair releases)
            )

# 2. what tag do we want to predict?
tag_type = 'ner'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. initialize each embedding we use
embedding_types = [

    # fastText word embeddings trained on web crawl data ('en-crawl')
    WordEmbeddings('en-crawl'),

    # contextual string embeddings, forward
    FlairEmbeddings('news-forward-fast'),

    # contextual string embeddings, backward
    FlairEmbeddings('news-backward-fast'),
]

# embedding stack consists of Flair and fastText crawl embeddings
embeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger
from flair.models import SequenceTagger

tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type)

# 6. initialize trainer
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

# 7. run training
trainer.train('resources/taggers/ner-english-ontonotes-fast',
              train_with_dev=True,
              max_epochs=150)
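The script passes tag_to_bioes="ner", which re-encodes the NER column with BIOES prefixes (B = begin, I = inside, O = outside, E = end, S = single-token entity), so the tagger learns token-level tags such as B-PERSON and E-PERSON that are later merged into entity spans. The sketch below shows one way to do that merging; bioes_to_spans is a hypothetical helper, not Flair's own code, and its lenient handling of malformed sequences is a choice made for this illustration. It uses 0-based token indices, whereas the printed Flair output earlier ("Span [2,3]") uses 1-based token ids.

```python
from typing import List, Optional, Tuple

# (start_token, end_token_inclusive, entity_type), 0-based indices
Span = Tuple[int, int, str]


def bioes_to_spans(tags: List[str]) -> List[Span]:
    """Merge token-level BIOES tags into (start, end, type) spans.

    Lenient rules for this sketch: an entity left open without an E- tag is
    closed at its last token, and a stray I-/E- tag starts a new entity.
    """
    spans: List[Span] = []
    start: Optional[int] = None    # index where the open entity began
    current: Optional[str] = None  # type of the open entity
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        # Close the open entity if this tag cannot continue it
        if start is not None and (prefix not in ("I", "E") or etype != current):
            spans.append((start, i - 1, current))
            start, current = None, None
        if prefix == "S":
            spans.append((i, i, etype))
        elif prefix == "B":
            start, current = i, etype
        elif prefix in ("I", "E"):
            if start is None:  # stray continuation tag: open a new entity
                start, current = i, etype
            if prefix == "E":
                spans.append((start, i, current))
                start, current = None, None
        # "O" needs no action beyond closing an open entity above
    if start is not None:
        spans.append((start, len(tags) - 1, current))
    return spans


if __name__ == "__main__":
    tokens = ["On", "September", "1st", "George", "Washington", "won", "1", "dollar", "."]
    tags = ["O", "B-DATE", "E-DATE", "B-PERSON", "E-PERSON", "O", "B-MONEY", "E-MONEY", "O"]
    for start, end, etype in bioes_to_spans(tags):
        print(" ".join(tokens[start:end + 1]), etype)
```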


Cite

Please cite the following paper when using this model.

@inproceedings{akbik2018coling,
  title={Contextual String Embeddings for Sequence Labeling},
  author={Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
  pages     = {1638--1649},
  year      = {2018}
}

Issues?

Please report issues on the Flair issue tracker (the flairNLP/flair repository on GitHub).
