English NER in Flair (Ontonotes large model)
This is the large 18-class NER model for English that ships with Flair. It tags named entities in English text with one of 18 Ontonotes classes.
F1-Score: 90.93 (Ontonotes)
This model predicts 18 tags:
| tag | meaning |
|---|---|
| CARDINAL | cardinal value |
| DATE | date value |
| EVENT | event name |
| FAC | building name |
| GPE | geo-political entity |
| LANGUAGE | language name |
| LAW | law name |
| LOC | location name |
| MONEY | money name |
| NORP | affiliation |
| ORDINAL | ordinal value |
| ORG | organization name |
| PERCENT | percent value |
| PERSON | person name |
| PRODUCT | product name |
| QUANTITY | quantity value |
| TIME | time value |
| WORK_OF_ART | name of work of art |
This model is based on document-level XLM-R embeddings and FLERT.
Quick Start
Features
- This is a large-scale 18-class English NER model.
- It achieves an F1-Score of 90.93 on the Ontonotes dataset.
- It is based on document-level XLM-R embeddings and FLERT.
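To give a rough feel for what "document-level" means here: FLERT-style embeddings pass the transformer not only the sentence being tagged but also tokens from the neighbouring sentences, and then keep only the embeddings of the original sentence. The sketch below illustrates that windowing idea in plain Python. It is not Flair's implementation; the helper name `add_context` and the default of 64 context tokens per side are assumptions made for illustration.

```python
from typing import List, Tuple


def add_context(
    sentences: List[List[str]], index: int, context_size: int = 64
) -> Tuple[List[str], int, int]:
    """Surround sentence `index` with up to `context_size` tokens per side.

    Returns the expanded token list plus the [start, end) offsets of the
    original sentence inside it. Hypothetical helper, not part of Flair.
    """
    # Flatten everything before / after the target sentence.
    left = [tok for sent in sentences[:index] for tok in sent]
    right = [tok for sent in sentences[index + 1:] for tok in sent]

    # Keep only the tokens closest to the target sentence.
    # (left[-0:] would return the whole list, hence the explicit guard.)
    left = left[-context_size:] if context_size > 0 else []
    right = right[:context_size]

    target = sentences[index]
    start = len(left)
    return left + target + right, start, start + len(target)


document = [
    ["George", "lives", "in", "Boston", "."],
    ["He", "won", "1", "dollar", "."],
    ["It", "was", "a", "good", "day", "."],
]
window, start, end = add_context(document, index=1, context_size=3)
print(window)             # context tokens + target sentence + context tokens
print(window[start:end])  # the target sentence only
```

Only the tokens in `window[start:end]` would receive tags; the surrounding tokens exist purely to give the transformer more context.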
Installation
Requires: Flair (pip install flair)
Usage Examples
Basic Usage
from flair.data import Sentence
from flair.models import SequenceTagger

# load the tagger
tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")
# make an example sentence
sentence = Sentence("On September 1st George won 1 dollar while watching Game of Thrones.")
# predict NER tags
tagger.predict(sentence)
# print the sentence
print(sentence)
# iterate over the predicted NER spans and print them
print('The following NER tags are found:')
for entity in sentence.get_spans('ner'):
    print(entity)
This yields the following output:
Span [2,3]: "September 1st" [− Labels: DATE (1.0)]
Span [4]: "George" [− Labels: PERSON (1.0)]
Span [6,7]: "1 dollar" [− Labels: MONEY (1.0)]
Span [10,11,12]: "Game of Thrones" [− Labels: WORK_OF_ART (1.0)]
So, the entities "September 1st" (labeled as a date), "George" (labeled as a person), "1 dollar" (labeled as money) and "Game of Thrones" (labeled as a work of art) are found in the sentence "On September 1st George won 1 dollar while watching Game of Thrones."
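A common next step is to collect the predicted spans by tag. The sketch below does this over plain `(text, tag, score)` tuples copied from the example output, so it runs without loading the model. The names `group_by_tag` and `min_score` are hypothetical. With a real prediction you would build the tuples from `sentence.get_spans('ner')`, for example via `span.text` and `span.get_label('ner')`, assuming a recent Flair version.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Entity = Tuple[str, str, float]  # (surface text, tag, confidence)


def group_by_tag(
    entities: Iterable[Entity], min_score: float = 0.0
) -> Dict[str, List[str]]:
    """Group entity texts by tag, dropping predictions below `min_score`."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for text, tag, score in entities:
        if score >= min_score:
            grouped[tag].append(text)
    return dict(grouped)


# Values taken from the example output of the basic usage snippet.
entities = [
    ("September 1st", "DATE", 1.0),
    ("George", "PERSON", 1.0),
    ("1 dollar", "MONEY", 1.0),
    ("Game of Thrones", "WORK_OF_ART", 1.0),
]

by_tag = group_by_tag(entities, min_score=0.5)
print(by_tag["PERSON"])       # ['George']
print(by_tag["WORK_OF_ART"])  # ['Game of Thrones']
```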
Technical Details
The following Flair script was used to train this model:
from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings

# load the corpus (Ontonotes must be downloaded and converted to column format first)
corpus: Corpus = ColumnCorpus(
    "resources/tasks/onto-ner",
    column_format={0: "text", 1: "pos", 2: "upos", 3: "ner"},
    tag_to_bioes="ner",
)

# the tag we want to predict
tag_type = 'ner'

# make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

from flair.embeddings import TransformerWordEmbeddings

# fine-tuneable transformer embeddings with document context
embeddings = TransformerWordEmbeddings(
    model='xlm-roberta-large',
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
    use_context=True,
)

from flair.models import SequenceTagger

# bare-bones sequence tagger (no CRF, no RNN, no reprojection)
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type='ner',
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

import torch  # needed for torch.optim.AdamW below
from torch.optim.lr_scheduler import OneCycleLR
from flair.trainers import ModelTrainer

# trainer with the AdamW optimizer
trainer = ModelTrainer(tagger, corpus, optimizer=torch.optim.AdamW)

# run training (20 epochs, small learning rate, one-cycle schedule)
trainer.train('resources/taggers/ner-english-ontonotes-large',
              learning_rate=5.0e-6,
              mini_batch_size=4,
              mini_batch_chunk_size=1,
              max_epochs=20,
              scheduler=OneCycleLR,
              embeddings_storage_mode='none',
              weight_decay=0.,
              )
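The `tag_to_bioes="ner"` argument in the script asks Flair to convert the NER column to the BIOES scheme, in which a single-token entity gets an `S-` prefix and the last token of a longer entity gets `E-`. The following is a minimal stand-alone sketch of that conversion, assuming the input is already in BIO (IOB2) format. `bio_to_bioes` is a hypothetical helper, not Flair's own routine, and the tag sequence is an illustration rather than real Ontonotes annotation.

```python
from typing import List


def bio_to_bioes(tags: List[str]) -> List[str]:
    """Convert a BIO (IOB2) tag sequence to BIOES."""
    bioes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bioes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            bioes.append(f"B-{label}" if continues else f"S-{label}")
        else:  # prefix == "I"
            bioes.append(f"I-{label}" if continues else f"E-{label}")
    return bioes


# Illustrative tags for: "On September 1st George won 1 dollar"
tags = ["O", "B-DATE", "I-DATE", "B-PERSON", "O", "B-MONEY", "I-MONEY"]
print(bio_to_bioes(tags))
# ['O', 'B-DATE', 'E-DATE', 'S-PERSON', 'O', 'B-MONEY', 'E-MONEY']
```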
Citation
Please cite the following paper when using this model.
@misc{schweter2020flert,
title={FLERT: Document-Level Features for Named Entity Recognition},
author={Stefan Schweter and Alan Akbik},
year={2020},
eprint={2011.06993},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Usage Tip
Important Note
The Ontonotes dataset does not come with Flair. You need to download and reformat it into a column format yourself.
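As a rough illustration of the target layout: the `column_format` in the training script maps column 0 to the token text, columns 1 and 2 to part-of-speech tags, and column 3 to the NER tag. Column-format files of this kind typically hold one token per line with whitespace-separated columns and a blank line between sentences. The sketch below parses such text in plain Python. The sample rows and the `read_columns` helper are invented for illustration and are not taken from Ontonotes.

```python
from typing import Dict, List

COLUMNS = {0: "text", 1: "pos", 2: "upos", 3: "ner"}  # as in the training script


def read_columns(raw: str) -> List[List[Dict[str, str]]]:
    """Parse column-format text into sentences of token dictionaries."""
    sentences, current = [], []
    for line in raw.splitlines():
        fields = line.split()
        if not fields:
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append({name: fields[i] for i, name in COLUMNS.items()})
    if current:
        sentences.append(current)
    return sentences


# Invented sample rows, not real Ontonotes data.
sample = """\
George NNP PROPN B-PERSON
won VBD VERB O
1 CD NUM B-MONEY
dollar NN NOUN I-MONEY

It PRP PRON O
rained VBD VERB O
"""

parsed = read_columns(sample)
print(len(parsed))                         # number of sentences
print([tok["ner"] for tok in parsed[0]])   # NER column of the first sentence
```

Running a check like this over your converted files before training can catch rows with missing columns early.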
Usage Tip
If you run into problems, check the Flair issue tracker at https://github.com/flairNLP/flair/issues/.