upos-english-fast: Fast and Accurate Universal Part-of-Speech Tagging for English


Developed by flair

Flair's built-in fast English universal POS tagging model, trained on the Ontonotes dataset with an F1 score of 98.47

Sequence Labeling

PyTorch

English #English POS Tagging #Fast Inference #High Accuracy UPOS


Release Time : 3/2/2022

Model Overview

This is a sequence labeling model for English universal POS tagging, capable of accurately identifying the part-of-speech categories of words in text, such as nouns, verbs, adjectives, etc.

Model Features

High Accuracy

Achieves an F1 score of 98.47 on the Ontonotes dataset

Fast Inference

This is the "fast" variant: it is trained on Flair's smaller news-forward-fast and news-backward-fast embeddings, which favor inference speed while keeping accuracy high

Comprehensive Tag Coverage

Supports 17 universal POS tags, covering various parts of speech in English

Model Capabilities

English POS Tagging

Sequence Labeling

Text Analysis

Use Cases

Natural Language Processing

Text Preprocessing

Provides POS tagging preprocessing for downstream NLP tasks

Supplies POS features that subsequent tasks such as named entity recognition and syntactic analysis can use
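
As a rough illustration of POS tags used as a preprocessing step, the sketch below keeps only content words from text that has already been tagged. The `content_words` helper, the `CONTENT_TAGS` set, and the `tagged` pairs are all hypothetical; which tags count as "content" is an assumption, and the pairs simply stand in for whatever the tagger predicts.

```python
# Hypothetical choice of UPOS tags that mark content words.
CONTENT_TAGS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}


def content_words(tagged):
    """Return the tokens from (token, tag) pairs whose tag is in CONTENT_TAGS."""
    return [token for token, tag in tagged if tag in CONTENT_TAGS]


# Illustrative (token, tag) pairs, shaped like the tagger's predictions.
tagged = [("I", "PRON"), ("love", "VERB"), ("Berlin", "PROPN"), (".", "PUNCT")]

print(content_words(tagged))  # ['love', 'Berlin']
```

A downstream component could then work on the filtered tokens instead of the full sentence.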

Language Learning Tool

Builds English learning aids to automatically analyze sentence structures

Helps learners understand English grammatical structures

🚀 English Universal Part-of-Speech Tagging in Flair (fast model)

This project offers a fast universal part-of-speech tagging model for English that ships with Flair. It assigns a universal POS tag to each token of English text.

✨ Features

High Accuracy: Achieves an F1 score of 98.47 on the Ontonotes dataset.
Comprehensive Tagging: Predicts the universal POS tags, including adjectives, adpositions, adverbs, and more.
Based on Advanced Techniques: Built upon Flair embeddings and an LSTM-CRF.
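
To make the F1 figure easier to interpret, here is a minimal sketch of a micro-averaged F1 computed over token-level predictions. It assumes the reported score is micro-averaged, which the page does not state explicitly; the `micro_f1` function and the example tag lists are illustrative, not the evaluation code used for this model. When every token receives exactly one tag, micro-F1 works out to the same value as plain token accuracy.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 for single-label token tagging."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0

    # A correct tag is a true positive; a wrong tag counts once as a false
    # positive (for the predicted tag) and once as a false negative (for gold).
    tp = sum(g == p for g, p in zip(gold, predicted))
    fp = len(predicted) - tp
    fn = len(gold) - tp

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example with one tagging error out of four tokens.
gold = ["PRON", "VERB", "PROPN", "PUNCT"]
predicted = ["PRON", "VERB", "NOUN", "PUNCT"]

print(micro_f1(gold, predicted))  # 0.75
```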

Property	Details
Model Type	Fast universal part-of-speech tagging model for English
Training Data	Ontonotes

tag	meaning
ADJ	adjective
ADP	adposition
ADV	adverb
AUX	auxiliary
CCONJ	coordinating conjunction
DET	determiner
INTJ	interjection
NOUN	noun
NUM	numeral
PART	particle
PRON	pronoun
PROPN	proper noun
PUNCT	punctuation
SCONJ	subordinating conjunction
SYM	symbol
VERB	verb
X	other
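
For programmatic use, the tag table above can be written down as a plain lookup. This is only a convenience sketch: `UPOS_MEANINGS` and `describe` are hypothetical names and are not part of Flair.

```python
# The 17 universal POS tags from the table, mapped to their meanings.
UPOS_MEANINGS = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "AUX": "auxiliary",
    "CCONJ": "coordinating conjunction",
    "DET": "determiner",
    "INTJ": "interjection",
    "NOUN": "noun",
    "NUM": "numeral",
    "PART": "particle",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction",
    "SYM": "symbol",
    "VERB": "verb",
    "X": "other",
}


def describe(tag):
    """Return the human-readable meaning of a UPOS tag."""
    return UPOS_MEANINGS.get(tag, "unknown tag")


print(len(UPOS_MEANINGS))  # 17
print(describe("PROPN"))   # proper noun
```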

📦 Installation

Requires: Flair (pip install flair)

💻 Usage Examples

Basic Usage

from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/upos-english-fast")

# make example sentence
sentence = Sentence("I love Berlin.")

# predict POS tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted POS tags
print('The following POS tags are found:')
# iterate over the tagged spans and print each one
for span in sentence.get_spans('pos'):
    print(span)

This yields the following output:

Span [1]: "I"   [− Labels: PRON (0.9996)]
Span [2]: "love"   [− Labels: VERB (1.0)]
Span [3]: "Berlin"   [− Labels: PROPN (0.9986)]
Span [4]: "."   [− Labels: PUNCT (1.0)]

So, the word "I" is labeled as a pronoun (PRON), "love" is labeled as a verb (VERB) and "Berlin" is labeled as a proper noun (PROPN) in the sentence "I love Berlin".
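
Once predictions like these are available, they can be summarized with ordinary Python. The sketch below assumes the results have been copied into (token, tag, score) tuples, here taken from the output above; `tag_counts` and `low_confidence` are hypothetical helpers, and the 0.999 threshold is an arbitrary choice for illustration.

```python
from collections import Counter

# Predictions transcribed from the example output above.
predictions = [
    ("I", "PRON", 0.9996),
    ("love", "VERB", 1.0),
    ("Berlin", "PROPN", 0.9986),
    (".", "PUNCT", 1.0),
]


def tag_counts(preds):
    """Count how often each tag occurs."""
    return Counter(tag for _, tag, _ in preds)


def low_confidence(preds, threshold):
    """Return the tokens whose score falls below the threshold."""
    return [token for token, _, score in preds if score < threshold]


print(tag_counts(predictions)["VERB"])     # 1
print(low_confidence(predictions, 0.999))  # ['Berlin']
```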

📚 Documentation

Training: Script to train this model

The following Flair script was used to train this model:

from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import StackedEmbeddings, FlairEmbeddings

# 1. load the corpus (Ontonotes does not ship with Flair, you need to download and reformat into a column format yourself)
corpus: Corpus = ColumnCorpus(
    "resources/tasks/onto-ner",
    column_format={0: "text", 1: "pos", 2: "upos", 3: "ner"},
    tag_to_bioes="ner",
)

# 2. what tag do we want to predict?
tag_type = 'upos'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. initialize each embedding we use
embedding_types = [

    # contextual string embeddings, forward
    FlairEmbeddings('news-forward-fast'),

    # contextual string embeddings, backward
    FlairEmbeddings('news-backward-fast'),
]

# embedding stack consists of the forward and backward Flair embeddings
embeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger
from flair.models import SequenceTagger

tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type)

# 6. initialize trainer
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

# 7. run training
trainer.train('resources/taggers/upos-english-fast',
              train_with_dev=True,
              max_epochs=150)
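
The training script expects Ontonotes in a four-column layout (text, pos, upos, ner). As a rough picture of what that means, the sketch below parses a tiny column-formatted sample with plain Python. The sample rows are invented for illustration and are not taken from Ontonotes, and `read_columns` is a hypothetical stand-in, not Flair's own loader. It assumes one token per line, whitespace-separated columns, and a blank line between sentences.

```python
# Same column mapping as in the training script.
COLUMN_FORMAT = {0: "text", 1: "pos", 2: "upos", 3: "ner"}

# Invented sample in column format: one token per line, blank line between sentences.
SAMPLE = """\
I PRP PRON O
love VBP VERB O
Berlin NNP PROPN B-GPE
. . PUNCT O

It PRP PRON O
rains VBZ VERB O
. . PUNCT O
"""


def read_columns(text, column_format):
    """Parse column-formatted text into a list of sentences of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        current.append({name: fields[idx] for idx, name in column_format.items()})
    if current:
        sentences.append(current)
    return sentences


sentences = read_columns(SAMPLE, COLUMN_FORMAT)
print(len(sentences))            # 2
print(sentences[0][2]["upos"])   # PROPN
```

The `upos` column is the one the script selects with `tag_type = 'upos'`.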

📄 Cite

Please cite the following paper when using this model.

@inproceedings{akbik2018coling,
  title={Contextual String Embeddings for Sequence Labeling},
  author={Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
  pages     = {1638--1649},
  year      = {2018}
}

💡 Issues

Report problems or questions on the Flair issue tracker on GitHub.
