Pos-English-Fast Open-Source English Part-of-Speech Tagging Model - Free Deployment for Precise English Part-of-Speech Tagging

Pos English Fast

Developed by flair

Flair's built-in fast English POS tagging model, trained on the Ontonotes dataset, with an F1 score of 98.10.

Sequence Labeling

PyTorch

English #Fine-grained POS tagging #High-precision F1 #Ontonotes adaptation

Downloads 3,760

Release Time : 3/2/2022

Model Overview

This model performs POS tagging on English text and predicts fine-grained POS tags.

Model Features

High accuracy

Achieves an F1 score of 98.10 on the Ontonotes dataset.

Fine-grained tagging

Supports over 40 types of fine-grained POS tags.

Fast inference

As a fast model, it optimizes inference speed while maintaining high accuracy.

Based on Flair embeddings

Utilizes Flair's contextual string embeddings to capture word context information.

Model Capabilities

English POS tagging

Fine-grained POS recognition

Text sequence labeling

Use Cases

Natural Language Processing

Text preprocessing

Provides POS tagging for downstream NLP tasks (such as named entity recognition and syntactic parsing).

Improves performance of downstream tasks

Language learning tools

Used for grammar analysis features in English learning tools.

Helps learners understand sentence structure

🚀 English Part-of-Speech Tagging in Flair (fast model)

This is the fast part-of-speech tagging model for English that ships with Flair. It predicts fine-grained POS tags.

✨ Features

High Accuracy: Achieves an F1 score of 98.10 on the Ontonotes dataset.
Fine-Grained Tagging: Predicts the fine-grained POS tags listed in the table below.

Property	Details
Model Type	Part-of-Speech Tagging Model
Training Data	Ontonotes

tag	meaning
ADD	Email
AFX	Affix
CC	Coordinating conjunction
CD	Cardinal number
DT	Determiner
EX	Existential there
FW	Foreign word
HYPH	Hyphen
IN	Preposition or subordinating conjunction
JJ	Adjective
JJR	Adjective, comparative
JJS	Adjective, superlative
LS	List item marker
MD	Modal
NFP	Superfluous punctuation
NN	Noun, singular or mass
NNP	Proper noun, singular
NNPS	Proper noun, plural
NNS	Noun, plural
PDT	Predeterminer
POS	Possessive ending
PRP	Personal pronoun
PRP$	Possessive pronoun
RB	Adverb
RBR	Adverb, comparative
RBS	Adverb, superlative
RP	Particle
SYM	Symbol
TO	to
UH	Interjection
VB	Verb, base form
VBD	Verb, past tense
VBG	Verb, gerund or present participle
VBN	Verb, past participle
VBP	Verb, non-3rd person singular present
VBZ	Verb, 3rd person singular present
WDT	Wh-determiner
WP	Wh-pronoun
WP$	Possessive wh-pronoun
WRB	Wh-adverb
XX	Unknown
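
For downstream code it can help to keep the tag table as a lookup. The following is a minimal sketch, not part of Flair: `TAG_MEANINGS` copies only a subset of the rows above, and `describe_tag` and `coarse_class` are hypothetical helper names. The prefix-based grouping is an assumption made for illustration, not something the model outputs.

```python
# Subset of the tag table above; extend with the remaining rows as needed.
TAG_MEANINGS = {
    "NN": "Noun, singular or mass",
    "NNS": "Noun, plural",
    "NNP": "Proper noun, singular",
    "NNPS": "Proper noun, plural",
    "VB": "Verb, base form",
    "VBD": "Verb, past tense",
    "VBG": "Verb, gerund or present participle",
    "VBN": "Verb, past participle",
    "VBP": "Verb, non-3rd person singular present",
    "VBZ": "Verb, 3rd person singular present",
    "JJ": "Adjective",
    "JJR": "Adjective, comparative",
    "JJS": "Adjective, superlative",
    "PRP": "Personal pronoun",
    "XX": "Unknown",
}


def describe_tag(tag):
    """Return the table description for a tag, or a fallback if it is not in the subset."""
    return TAG_MEANINGS.get(tag, "not in this subset")


def coarse_class(tag):
    """Group fine-grained tags by prefix (illustrative grouping, an assumption)."""
    if tag.startswith("NN"):
        return "noun"
    if tag.startswith("VB"):
        return "verb"
    if tag.startswith("JJ"):
        return "adjective"
    if tag.startswith("RB"):
        return "adverb"
    return "other"


print(describe_tag("VBP"))   # Verb, non-3rd person singular present
print(coarse_class("NNPS"))  # noun
```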

This model is based on Flair embeddings and an LSTM-CRF architecture.
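
In an LSTM-CRF tagger, the CRF layer scores whole tag sequences, combining per-token scores with tag-to-tag transition scores, and decoding typically uses the Viterbi algorithm. The sketch below illustrates that idea in plain Python. It is not Flair's implementation: the function name `viterbi_decode`, the dictionary-based inputs, and all scores are invented for the example.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence.

    emissions: one dict per token mapping tag -> score (toy values).
    transitions: dict mapping (previous_tag, current_tag) -> score;
                 missing pairs count as 0.0.
    """
    tags = list(emissions[0])
    best = {tag: emissions[0][tag] for tag in tags}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for cur in tags:
            # best previous tag for reaching `cur` at this position
            prev = max(tags, key=lambda p: best[p] + transitions.get((p, cur), 0.0))
            new_best[cur] = best[prev] + transitions.get((prev, cur), 0.0) + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # follow the backpointers from the best final tag
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy scores for the two tokens "the run" (made up for illustration).
emissions = [
    {"DT": 2.0, "NN": 0.0, "VB": 0.0},
    {"DT": 0.0, "NN": 0.8, "VB": 1.0},
]
transitions = {("DT", "VB"): -2.0, ("DT", "NN"): 0.5}

print(viterbi_decode(emissions, transitions))  # ['DT', 'NN']
print(viterbi_decode(emissions, {}))           # ['DT', 'VB']
```

With the transition scores, the sequence DT NN wins even though VB has the higher per-token score for the second word. Without them, decoding falls back to picking the best tag for each token independently.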

🚀 Quick Start

📦 Installation

Requires: Flair (pip install flair)

💻 Usage Examples

Basic Usage

from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/pos-english-fast")

# make example sentence
sentence = Sentence("I love Berlin.")

# predict POS tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted POS tags
print('The following POS tags are found:')
# iterate over tagged tokens and print
for entity in sentence.get_spans('pos'):
    print(entity)

This code will yield the following output:

Span [1]: "I"   [− Labels: PRP (1.0)]
Span [2]: "love"   [− Labels: VBP (0.9998)]
Span [3]: "Berlin"   [− Labels: NNP (0.9999)]
Span [4]: "."   [− Labels: . (0.9998)]

So, in the sentence "I love Berlin", the word "I" is labeled as a pronoun (PRP), "love" is labeled as a verb (VBP), and "Berlin" is labeled as a proper noun (NNP).
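
If you need the tags as plain data rather than printed spans, one option is to read the label from each token. This is a sketch under an assumption: it relies on tokens exposing `get_label(label_type)` with `.value` and `.score`, as in recent Flair releases, and older releases may use a different accessor. `token_tags` and `demo` are hypothetical helper names.

```python
def token_tags(sentence, label_type="pos"):
    """Collect (text, tag, score) for every token of an already-predicted sentence.

    Assumes each token offers get_label(label_type) returning an object
    with .value and .score (assumption about the installed Flair version).
    """
    rows = []
    for token in sentence:
        label = token.get_label(label_type)
        rows.append((token.text, label.value, label.score))
    return rows


def demo():
    # Imports are local so token_tags can be reused without loading Flair.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load("flair/pos-english-fast")
    sentence = Sentence("I love Berlin.")
    tagger.predict(sentence)
    for text, tag, score in token_tags(sentence):
        print(f"{text}\t{tag}\t{score:.4f}")
```

Calling `demo()` downloads the model on first use. `token_tags` itself only depends on the token interface described above.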

🔧 Technical Details

Training: Script to train this model

The following Flair script was used to train this model:

from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings

# 1. load the corpus (Ontonotes does not ship with Flair, you need to download and reformat into a column format yourself)
corpus: Corpus = ColumnCorpus(
    "resources/tasks/onto-ner",
    column_format={0: "text", 1: "pos", 2: "upos", 3: "ner"},
    tag_to_bioes="ner",
)

# 2. what tag do we want to predict?
tag_type = 'pos'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. initialize each embedding we use
embedding_types = [

    # contextual string embeddings, forward
    FlairEmbeddings('news-forward'),

    # contextual string embeddings, backward
    FlairEmbeddings('news-backward'),
]

# embedding stack consists of forward and backward Flair embeddings
embeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger
from flair.models import SequenceTagger

tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type)

# 6. initialize trainer
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

# 7. run training
trainer.train('resources/taggers/pos-english-fast',
              train_with_dev=True,
              max_epochs=150)
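
Step 1 of the script expects Ontonotes in a column format: one token per line with whitespace-separated columns, and a blank line between sentences. The sketch below shows one way to produce that layout for the `column_format` used above (text, pos, upos, ner). The helper name `to_column_format` and the example annotations are made up for illustration and are not taken from Ontonotes.

```python
def to_column_format(sentences):
    """Render sentences as column-format text.

    sentences: list of sentences, each a list of (text, pos, upos, ner) tuples.
    Tokens go one per line, columns are space-separated, and sentences
    are separated by a blank line.
    """
    blocks = []
    for sentence in sentences:
        lines = [" ".join(columns) for columns in sentence]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# Illustrative annotations only (hypothetical values).
example = [
    [
        ("I", "PRP", "PRON", "O"),
        ("love", "VBP", "VERB", "O"),
        ("Berlin", "NNP", "PROPN", "B-GPE"),
        (".", ".", "PUNCT", "O"),
    ],
]

print(to_column_format(example))
```

The resulting text can be written to the train, dev, and test files in the folder passed to `ColumnCorpus`.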

📄 Citation

Please cite the following paper when using this model.

@inproceedings{akbik2018coling,
  title={Contextual String Embeddings for Sequence Labeling},
  author={Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
  pages     = {1638--1649},
  year      = {2018}
}

⚠️ Important Note

The Flair issue tracker is available here.
