English Part-of-Speech Tagging in Flair (fast model)
This is the fast part-of-speech tagging model for English that ships with Flair. It predicts fine-grained POS tags.
Features
- High Accuracy: Achieves an F1-score of 98.10 on the Ontonotes dataset.
- Fine-Grained Tagging: Predicts a wide range of fine-grained POS tags.
| Property | Details |
|---|---|
| Model Type | Part-of-Speech Tagging Model |
| Training Data | Ontonotes |
| tag | meaning |
|---|---|
| ADD | Email |
| AFX | Affix |
| CC | Coordinating conjunction |
| CD | Cardinal number |
| DT | Determiner |
| EX | Existential there |
| FW | Foreign word |
| HYPH | Hyphen |
| IN | Preposition or subordinating conjunction |
| JJ | Adjective |
| JJR | Adjective, comparative |
| JJS | Adjective, superlative |
| LS | List item marker |
| MD | Modal |
| NFP | Superfluous punctuation |
| NN | Noun, singular or mass |
| NNP | Proper noun, singular |
| NNPS | Proper noun, plural |
| NNS | Noun, plural |
| PDT | Predeterminer |
| POS | Possessive ending |
| PRP | Personal pronoun |
| PRP$ | Possessive pronoun |
| RB | Adverb |
| RBR | Adverb, comparative |
| RBS | Adverb, superlative |
| RP | Particle |
| SYM | Symbol |
| TO | to |
| UH | Interjection |
| VB | Verb, base form |
| VBD | Verb, past tense |
| VBG | Verb, gerund or present participle |
| VBN | Verb, past participle |
| VBP | Verb, non-3rd person singular present |
| VBZ | Verb, 3rd person singular present |
| WDT | Wh-determiner |
| WP | Wh-pronoun |
| WP$ | Possessive wh-pronoun |
| WRB | Wh-adverb |
| XX | Unknown |
This model is based on Flair embeddings and an LSTM-CRF.
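To illustrate the CRF half of an LSTM-CRF tagger, the sketch below runs Viterbi decoding over per-token emission scores and a tag-to-tag transition matrix. It is a generic, pure-Python illustration and not Flair's implementation; the three-tag set, all scores, and the function name `viterbi_decode` are made up for the example.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions[t][j]   : score of tag j at position t (produced by the LSTM in a real tagger)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(tags)
    # best[j] = best score of any path that ends in tag j at the current position
    best = list(emissions[0])
    backpointers = []

    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for j in range(n_tags):
            # choose the previous tag that maximises the path score into tag j
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)

    # start from the best final tag and follow the back-pointers
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[j] for j in path]


# Hypothetical scores for the three tokens "I love Berlin"
tags = ["PRP", "VBP", "NNP"]
emissions = [
    [2.0, 0.1, 0.5],   # "I"
    [0.2, 1.5, 1.4],   # "love"
    [0.1, 0.3, 2.0],   # "Berlin"
]
transitions = [
    [-1.0, 1.0, 0.0],  # from PRP
    [0.0, -1.0, 1.0],  # from VBP
    [0.0, 0.5, 0.0],   # from NNP
]
print(viterbi_decode(emissions, transitions, tags))
```

The transition matrix is what distinguishes a CRF layer from picking the best tag per token independently: a tag that scores well on its own can still lose if it rarely follows its predecessor.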
Quick Start
Installation
Requires: Flair (pip install flair)
Usage Examples
Basic Usage
from flair.data import Sentence
from flair.models import SequenceTagger

# load the tagger
tagger = SequenceTagger.load("flair/pos-english-fast")

# make an example sentence and predict its POS tags
sentence = Sentence("I love Berlin.")
tagger.predict(sentence)

# print the tagged sentence
print(sentence)

# iterate over the predicted POS tags
print('The following POS tags are found:')
for entity in sentence.get_spans('pos'):
    print(entity)
This code will yield the following output:
Span [1]: "I" [− Labels: PRP (1.0)]
Span [2]: "love" [− Labels: VBP (0.9998)]
Span [3]: "Berlin" [− Labels: NNP (0.9999)]
Span [4]: "." [− Labels: . (0.9998)]
So, in the sentence "I love Berlin.", the word "I" is labeled as a personal pronoun (PRP), "love" as a non-3rd person singular present verb (VBP), and "Berlin" as a singular proper noun (NNP).
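Once predictions are held as plain (token, tag, score) triples, downstream filtering needs no Flair code at all. The sketch below hard-codes the values from the output above instead of calling the tagger, so it runs without downloading the model. The prefix-to-category grouping and the helper names `coarse_category` and `tokens_with_category` are illustrative assumptions, not part of Flair.

```python
# Values copied from the example output above: (token, tag, confidence).
tagged = [
    ("I", "PRP", 1.0),
    ("love", "VBP", 0.9998),
    ("Berlin", "NNP", 0.9999),
    (".", ".", 0.9998),
]

# Fine-grained tag prefixes mapped to coarse categories (an illustrative grouping).
COARSE_PREFIXES = [
    ("NN", "noun"),
    ("VB", "verb"),
    ("JJ", "adjective"),
    ("RB", "adverb"),
    ("PRP", "pronoun"),
]


def coarse_category(tag):
    """Map a fine-grained tag to a coarse category, or "other" if no prefix matches."""
    for prefix, category in COARSE_PREFIXES:
        if tag.startswith(prefix):
            return category
    return "other"


def tokens_with_category(tagged, category, min_score=0.0):
    """Return tokens whose tag falls in the category and whose score is high enough."""
    return [
        token
        for token, tag, score in tagged
        if coarse_category(tag) == category and score >= min_score
    ]


print(tokens_with_category(tagged, "noun"))
print(tokens_with_category(tagged, "verb"))
```

Raising `min_score` is a simple way to keep only predictions the tagger is confident about.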
Technical Details
Training: Script to train this model
The following Flair script was used to train this model:
from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import StackedEmbeddings, FlairEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# 1. load the corpus from column-formatted files
corpus: Corpus = ColumnCorpus(
    "resources/tasks/onto-ner",
    column_format={0: "text", 1: "pos", 2: "upos", 3: "ner"},
    tag_to_bioes="ner",
)

# 2. the tag type to predict
tag_type = 'pos'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. stack forward and backward contextual string embeddings
embedding_types = [
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
]
embeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize the sequence tagger
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type)

# 6. initialize the trainer and run training
trainer = ModelTrainer(tagger, corpus)
trainer.train('resources/taggers/pos-english-fast',
              train_with_dev=True,
              max_epochs=150)
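The `column_format` mapping in the script implies that each line of the training files holds a token followed by its POS, universal POS, and NER tags. Assuming whitespace-separated columns and a blank line between sentences, the sketch below parses such text in plain Python and collects the distinct POS tags, which is roughly what building a tag dictionary amounts to. The sample rows and the helper names `parse_columns` and `collect_tags` are invented for illustration; this is not Flair's own loader.

```python
COLUMN_FORMAT = {0: "text", 1: "pos", 2: "upos", 3: "ner"}

# Hypothetical sample in the assumed layout: one token per line,
# whitespace-separated columns, blank line between sentences.
SAMPLE = """\
I PRP PRON O
love VBP VERB O
Berlin NNP PROPN B-GPE
. . PUNCT O

It PRP PRON O
rains VBZ VERB O
. . PUNCT O
"""


def parse_columns(text, column_format):
    """Split column-formatted text into sentences of {column name: value} dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        current.append({name: fields[idx] for idx, name in column_format.items()})
    if current:
        sentences.append(current)
    return sentences


def collect_tags(sentences, tag_type):
    """Return the sorted set of values seen in the given column."""
    return sorted({token[tag_type] for sentence in sentences for token in sentence})


sentences = parse_columns(SAMPLE, COLUMN_FORMAT)
print(len(sentences))
print(collect_tags(sentences, "pos"))
```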
Citation
Please cite the following paper when using this model.
@inproceedings{akbik2018coling,
  title     = {Contextual String Embeddings for Sequence Labeling},
  author    = {Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
  pages     = {1638--1649},
  year      = {2018}
}
Important Note
The Flair issue tracker is available on GitHub.