Chunk-English Open-Source Model: Identifying Noun Phrase and Verb Phrase Structures in Sentences

Chunk English

Developed by flair

Flair's built-in standard English phrase chunking model for identifying grammatical structures such as noun phrases and verb phrases in sentences.

Sequence Labeling

PyTorch

English #High-precision chunking analysis #LSTM-CRF architecture #English text processing

Downloads 1,186

Release Time: 3/2/2022

Model Overview

This model is built on Flair embeddings and an LSTM-CRF architecture and is designed for chunking English text. It identifies grammatical structures such as noun phrases and verb phrases in sentences.

Model Features

High-precision chunking analysis

Achieves an F1 score of 96.48 on the CoNLL-2000 dataset.

Multi-type phrase recognition

Can recognize 10 different types of phrase structures, including noun phrases, verb phrases, prepositional phrases, etc.

Contextual word embeddings

Uses Flair's contextual string embeddings, which represent each word according to the sentence it appears in.

Model Capabilities

English text analysis

Grammatical structure recognition

Phrase boundary detection

Use Cases

Natural language processing

Text grammatical analysis

Analyze grammatical structures in sentences to identify components such as noun phrases and verb phrases.

For example, it identifies 'The happy man' as a noun phrase and 'has been eating' as a verb phrase.

Information extraction preprocessing

Used as a preprocessing step for information extraction systems to first identify key phrase structures in text.

🚀 English Chunking in Flair (default model)

This is a standard English phrase chunking model provided by Flair, which can effectively identify and classify different types of phrases in English text.

✨ Features

High Accuracy: Achieves an F1 score of 96.48 on the CoNLL-2000 dataset.
Rich Tag Prediction: Predicts 10 tags (ADJP, ADVP, CONJP, INTJ, LST, NP, PP, PRT, SBAR, VP), covering a wide range of phrase types.
Advanced Technology: Based on Flair embeddings and an LSTM-CRF architecture.
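
Chunking F1 on CoNLL-2000 is conventionally computed at the span level: a predicted chunk counts as correct only if both its boundaries and its type match a gold chunk. The following is a minimal sketch of that calculation for a single sentence. The function name `span_f1` and the toy gold and predicted spans are hypothetical, chosen only for illustration; this is not the evaluation code used to obtain the reported score.

```python
from typing import Iterable, Tuple

# A chunk is (start token index, end token index exclusive, chunk type).
Chunk = Tuple[int, int, str]


def span_f1(gold: Iterable[Chunk], predicted: Iterable[Chunk]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact span matching."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)  # same boundaries and same type
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: the last predicted NP has a wrong left boundary.
gold = [(0, 3, "NP"), (3, 6, "VP"), (6, 7, "PP"), (7, 9, "NP")]
predicted = [(0, 3, "NP"), (3, 6, "VP"), (6, 7, "PP"), (8, 9, "NP")]

precision, recall, f1 = span_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For a whole corpus, the spans would also need a sentence identifier so that chunks from different sentences do not collide in the sets.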

Property	Details
Model Type	Phrase Chunking Model
Training Data	conll2000

Tag	Meaning
ADJP	adjectival
ADVP	adverbial
CONJP	conjunction
INTJ	interjection
LST	list marker
NP	noun phrase
PP	prepositional
PRT	particle
SBAR	subordinate clause
VP	verb phrase
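
The CoNLL-2000 data marks chunks with per-token B-/I-/O tags, where `B-NP` opens a noun phrase, `I-NP` continues it, and `O` means the token is outside any chunk. The sketch below shows how such a tag sequence can be turned into typed spans. Flair performs this decoding internally, so this is only an illustration of the idea; the helper name `bio_to_chunks` and the hand-written tag sequence are assumptions, not part of the Flair API.

```python
from typing import List, Optional, Tuple

# A chunk is (start token index, end token index exclusive, chunk type).
Chunk = Tuple[int, int, str]


def bio_to_chunks(tags: List[str]) -> List[Chunk]:
    """Convert a BIO tag sequence into a list of typed chunks."""
    chunks: List[Chunk] = []
    start: Optional[int] = None
    current: Optional[str] = None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")  # "O" yields ("O", "", "")
        # Close the open chunk on O, on B-*, or on I-* with a different type.
        if current is not None and (prefix != "I" or label != current):
            chunks.append((start, i, current))
            start, current = None, None
        # Open a chunk on B-*; an I-* with no open chunk is treated leniently as a start.
        if prefix == "B" or (prefix == "I" and current is None):
            start, current = i, label
    if current is not None:
        chunks.append((start, len(tags), current))
    return chunks


tokens = "The happy man has been eating at the diner".split()
# Hand-written tags for illustration, using types from the table above.
tags = ["B-NP", "I-NP", "I-NP", "B-VP", "I-VP", "I-VP", "B-PP", "B-NP", "I-NP"]

chunks = bio_to_chunks(tags)
for begin, end, label in chunks:
    print(label, " ".join(tokens[begin:end]))
```

The indices here are 0-based with an exclusive end, which differs from the 1-based token positions Flair prints for its spans.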

📦 Installation

Requires: Flair (pip install flair)

💻 Usage Examples

Basic Usage

from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/chunk-english")

# make example sentence
sentence = Sentence("The happy man has been eating at the diner")

# predict chunk tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted chunk spans
print('The following chunk tags are found:')
# iterate over the predicted chunks and print each one
for chunk in sentence.get_spans('np'):
    print(chunk)

This yields the following output:

Span [1,2,3]: "The happy man"   [− Labels: NP (0.9958)]
Span [4,5,6]: "has been eating"   [− Labels: VP (0.8759)]
Span [7]: "at"   [− Labels: PP (1.0)]
Span [8,9]: "the diner"   [− Labels: NP (0.9991)]

So, the spans "The happy man" and "the diner" are labeled as noun phrases (NP) and "has been eating" is labeled as a verb phrase (VP) in the sentence "The happy man has been eating at the diner".
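
When chunking is used as a preprocessing step, a common next move is to group the predicted phrases by type, for instance to keep only noun phrases as candidate entities. The sketch below does this with plain tuples copied from the output above so that it runs without Flair installed. The tuple layout and the helper name `group_by_label` are assumptions for illustration; with real Flair spans you would read the text and label from each span object instead.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (phrase text, chunk label, confidence), copied from the printed output above.
spans: List[Tuple[str, str, float]] = [
    ("The happy man", "NP", 0.9958),
    ("has been eating", "VP", 0.8759),
    ("at", "PP", 1.0),
    ("the diner", "NP", 0.9991),
]


def group_by_label(items: List[Tuple[str, str, float]]) -> Dict[str, List[str]]:
    """Collect phrase texts under their chunk label, preserving sentence order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for text, label, _score in items:
        grouped[label].append(text)
    return dict(grouped)


phrases = group_by_label(spans)
print("Noun phrases:", phrases["NP"])
print("Verb phrases:", phrases["VP"])
```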

🔧 Technical Details

The model is based on Flair embeddings and an LSTM-CRF architecture. The following Flair script was used to train this model:

from flair.data import Corpus
from flair.datasets import CONLL_2000
from flair.embeddings import StackedEmbeddings, FlairEmbeddings

# 1. get the corpus
corpus: Corpus = CONLL_2000()

# 2. what tag do we want to predict?
tag_type = 'np'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. initialize each embedding we use
embedding_types = [

    # contextual string embeddings, forward
    FlairEmbeddings('news-forward'),

    # contextual string embeddings, backward
    FlairEmbeddings('news-backward'),
]

# embedding stack consists of forward and backward Flair embeddings
embeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger
from flair.models import SequenceTagger

tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type)

# 6. initialize trainer
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

# 7. run training
trainer.train('resources/taggers/chunk-english',
              train_with_dev=True,
              max_epochs=150)

📄 License

No license information provided in the original document.

📚 Documentation

Cite

Please cite the following paper when using this model.

@inproceedings{akbik2018coling,
  title={Contextual String Embeddings for Sequence Labeling},
  author={Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
  pages     = {1638--1649},
  year      = {2018}
}

Issues?

The Flair issue tracker is available here.
