pos-english-fast: open-source English part-of-speech tagging model. Deploy it for free and tag English parts of speech accurately.


Pos English Fast

Developed by flair

A fast English part-of-speech tagging model that ships with Flair. It was trained on the Ontonotes dataset and reaches an F1 score of 98.10.

Sequence labeling

PyTorch

English #fine-grained POS tagging #high F1 #Ontonotes

Downloads: 3,760

Release date: 3/2/2022

Model overview

This model performs part-of-speech tagging on English text and predicts fine-grained POS tags.

Model features

High accuracy

Achieves an F1 score of 98.10 on the Ontonotes dataset.
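A note on reading the metric: when every token receives exactly one gold tag and one predicted tag, micro-averaged F1 equals token accuracy, because each wrong token counts once as a false positive and once as a false negative. Assuming the 98.10 figure is such a micro-average over tokens (the card does not say so explicitly), it corresponds to roughly 98.1% of evaluated tokens tagged correctly. The sketch below checks the identity on a hand-made toy example; `micro_prf` is a hypothetical helper, not part of Flair.

```python
from typing import Sequence, Tuple


def micro_prf(gold: Sequence[str], pred: Sequence[str]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over token-level tags."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # One gold and one predicted tag per token: a correct token is a true
    # positive; a wrong token counts once as a false positive (the predicted
    # tag) and once as a false negative (the missed gold tag).
    tp = sum(g == p for g, p in zip(gold, pred))
    fp = fn = len(gold) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: hand-made gold tags and a prediction with one error.
gold = ["PRP", "VBP", "NNP", "."]
pred = ["PRP", "VB", "NNP", "."]
p, r, f1 = micro_prf(gold, pred)
accuracy = sum(g == q for g, q in zip(gold, pred)) / len(gold)
print(p, r, f1, accuracy)  # 0.75 0.75 0.75 0.75
```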

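Fine-grained tagging

Supports more than 40 fine-grained POS tags.

Fast inference

As the "fast" variant, it is tuned for inference speed while keeping accuracy high.

Based on Flair embeddings

Uses Flair's contextual string embeddings to capture the context each word appears in.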
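A toy illustration of what "contextual" means here, assuming the extraction scheme described in the cited paper: a word's representation combines the forward character LM's state after it has read the word's last character and the backward character LM's state after it has read (right to left) the word's first character. The hypothetical `context_windows` below replaces each hidden-state vector with the raw text that LM has consumed at that point, which makes it visible that the same word sees different context in different sentences. It computes no vectors and is not Flair's implementation.

```python
from typing import List, Tuple


def context_windows(tokens: List[str]) -> List[Tuple[str, str]]:
    """Toy stand-in for contextual string embeddings.

    A real character LM compresses what it has read into a hidden-state
    vector. Here the "state" is simply the text read so far, to show what
    each direction has seen when a token's representation is taken.
    """
    text = " ".join(tokens)
    windows = []
    start = 0
    for tok in tokens:
        end = start + len(tok)
        # forward LM reads left to right up to and including the last character
        forward_state = text[:end]
        # backward LM reads right to left down to and including the first
        # character (shown here in normal reading order)
        backward_state = text[start:]
        windows.append((forward_state, backward_state))
        start = end + 1  # skip the single separating space
    return windows


a = context_windows(["I", "love", "Berlin", "."])[2]
b = context_windows(["Berlin", "is", "big", "."])[0]
print(a)  # ('I love Berlin', 'Berlin .')
print(b)  # ('Berlin', 'Berlin is big .')
```

The word "Berlin" is identical in both sentences, but the two LMs have read different text when its representation is taken, so the resulting embeddings differ.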

Model capabilities

English part-of-speech tagging

Fine-grained POS recognition

Sequence labeling of text

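Use cases

Natural language processing

Text preprocessing

Provides part-of-speech tags for downstream NLP tasks such as named entity recognition and syntactic parsing.

Can improve performance on downstream tasks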
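As one example of POS tags feeding a downstream step, the sketch below groups tagged tokens into simple noun chunks: an optional `DT`, any number of `JJ*` adjectives, then one or more `NN*` nouns. The `noun_chunks` helper and this chunk pattern are illustrative assumptions, not part of Flair; the input is a list of `(word, tag)` pairs that would first have to be extracted from the tagger's output.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]


def noun_chunks(tagged: Tagged) -> List[str]:
    """Collect simple noun chunks: optional DT, adjectives (JJ*), then nouns (NN*)."""
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":  # optional determiner
            j += 1
        while j < n and tagged[j][1].startswith("JJ"):  # adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # nouns (NN, NNS, NNP, NNPS)
            k += 1
        if k > j:  # at least one noun: emit the chunk and continue after it
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return chunks


# Tags as in the model card's "I love Berlin." example
print(noun_chunks([("I", "PRP"), ("love", "VBP"), ("Berlin", "NNP"), (".", ".")]))
# ['Berlin']

# Hand-tagged illustrative sentence
hand_tagged = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
               ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
               ("dog", "NN"), (".", ".")]
print(noun_chunks(hand_tagged))  # ['The quick brown fox', 'the lazy dog']
```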

Language learning tools

Used for grammar-analysis features in English learning tools.

Helps learners understand sentence structure

🚀 English part-of-speech tagging in Flair (fast model)

This is the fast part-of-speech tagging model for English that ships with Flair.

F1 score: 98.10 (Ontonotes)

Predicts fine-grained POS tags:

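Tag	Meaning
ADD	Email address
AFX	Affix
CC	Coordinating conjunction
CD	Cardinal number
DT	Determiner
EX	Existential "there"
FW	Foreign word
HYPH	Hyphen
IN	Preposition or subordinating conjunction
JJ	Adjective
JJR	Adjective, comparative
JJS	Adjective, superlative
LS	List item marker
MD	Modal
NFP	Superfluous punctuation
NN	Noun, singular or mass
NNP	Proper noun, singular
NNPS	Proper noun, plural
NNS	Noun, plural
PDT	Predeterminer
POS	Possessive ending
PRP	Personal pronoun
PRP$	Possessive pronoun
RB	Adverb
RBR	Adverb, comparative
RBS	Adverb, superlative
RP	Particle
SYM	Symbol
TO	"to"
UH	Interjection
VB	Verb, base form
VBD	Verb, past tense
VBG	Verb, gerund or present participle
VBN	Verb, past participle
VBP	Verb, non-3rd person singular present
VBZ	Verb, 3rd person singular present
WDT	Wh-determiner
WP	Wh-pronoun
WP$	Possessive wh-pronoun
WRB	Wh-adverb
XX	Unknown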
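If an application only needs coarse word classes, the fine-grained tags above can be collapsed by prefix. The grouping below is a simplified illustration: the class names are borrowed from Universal Dependencies, `to_coarse` is a hypothetical helper, and this is neither the official Penn-to-UD conversion nor something the model or Flair provides. Tags not listed fall through to `OTHER`.

```python
from typing import List, Tuple

# Hypothetical prefix -> coarse class table (names borrowed from Universal
# Dependencies). Order matters: "NNP" must be tested before "NN".
COARSE_BY_PREFIX: List[Tuple[str, str]] = [
    ("NNP", "PROPN"),  # NNP, NNPS
    ("NN", "NOUN"),    # NN, NNS
    ("VB", "VERB"),    # VB, VBD, VBG, VBN, VBP, VBZ
    ("MD", "AUX"),     # modal
    ("JJ", "ADJ"),     # JJ, JJR, JJS
    ("RB", "ADV"),     # RB, RBR, RBS
    ("WRB", "ADV"),    # wh-adverb
    ("PRP", "PRON"),   # PRP, PRP$
    ("WP", "PRON"),    # WP, WP$
]


def to_coarse(tag: str) -> str:
    """Map a fine-grained tag to a coarse class; unlisted tags become OTHER."""
    for prefix, coarse in COARSE_BY_PREFIX:
        if tag.startswith(prefix):
            return coarse
    return "OTHER"


print([to_coarse(t) for t in ["PRP", "VBP", "NNP", "."]])
# ['PRON', 'VERB', 'PROPN', 'OTHER']
```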

The model is based on Flair embeddings and an LSTM-CRF.
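In an LSTM-CRF, the LSTM produces a score for every tag at every token (emission scores) and the CRF adds a learned score for every tag-to-tag transition; at prediction time the highest-scoring tag sequence is found with Viterbi decoding. The sketch below is a generic textbook version with made-up toy scores. It omits the start/stop transitions a real CRF layer typically includes and is not Flair's code.

```python
from typing import List, Sequence, Tuple


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return the highest-scoring tag index sequence and its total score.

    emissions[t][j]: score of tag j at position t (from the LSTM).
    transitions[i][j]: score of moving from tag i to tag j (learned by the CRF).
    """
    n_tags = len(transitions)
    # best[j]: best score of any path ending in tag j at the current position
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # best previous tag i when the current tag is j
            i_best = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(i_best)
            new_best.append(best[i_best] + transitions[i_best][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # trace back from the best final tag
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


tags = ["NN", "VB"]
emissions = [[2.0, 1.0], [1.2, 1.0]]      # toy per-token tag scores
transitions = [[-1.0, 1.0], [0.5, -1.0]]  # transitions[i][j]: tag i -> tag j
path, score = viterbi(emissions, transitions)
print([tags[i] for i in path], score)  # ['NN', 'VB'] 4.0
```

With these toy scores, choosing the best tag for each token independently would give `NN NN`, whereas the transition scores make `NN VB` the best sequence overall, which is the point of adding a CRF on top of the LSTM.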

🚀 Quick start

Requires: Flair (pip install flair)

💻 Usage examples

Basic usage

from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/pos-english-fast")

# make example sentence
sentence = Sentence("I love Berlin.")

# predict POS tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted POS tags
print('The following POS tags are found:')
# iterate over tagged tokens and print
for span in sentence.get_spans('pos'):
    print(span)

This yields the following output:

Span [1]: "I"   [− Labels: PRP (1.0)]
Span [2]: "love"   [− Labels: VBP (0.9998)]
Span [3]: "Berlin"   [− Labels: NNP (0.9999)]
Span [4]: "."   [− Labels: . (0.9998)]

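In other words, in the sentence "I love Berlin." the word "I" is tagged as a personal pronoun (PRP), "love" as a present-tense verb, non-3rd person singular (VBP), "Berlin" as a singular proper noun (NNP), and the final period with the punctuation tag ".".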
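The number in parentheses on each output line is the score the tagger assigns to that tag. The sketch below takes the four predictions from the output above, transcribed by hand as `(word, tag, score)` triples, renders them in the common `word/TAG` notation and lists the tokens whose score falls below a threshold. The helper names and the threshold value are illustrative assumptions; reading the triples directly from the Flair `Sentence` object is not shown.

```python
from typing import List, Tuple

# (word, tag, score) triples copied by hand from the example output above
Prediction = Tuple[str, str, float]

predictions: List[Prediction] = [
    ("I", "PRP", 1.0),
    ("love", "VBP", 0.9998),
    ("Berlin", "NNP", 0.9999),
    (".", ".", 0.9998),
]


def to_slash_format(preds: List[Prediction]) -> str:
    """Render predictions in word/TAG notation."""
    return " ".join(f"{word}/{tag}" for word, tag, _ in preds)


def below_threshold(preds: List[Prediction], threshold: float) -> List[str]:
    """Words whose tag score is strictly below the threshold (e.g. for manual review)."""
    return [word for word, _, score in preds if score < threshold]


print(to_slash_format(predictions))          # I/PRP love/VBP Berlin/NNP ./.
print(below_threshold(predictions, 0.9999))  # ['love', '.']
```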

🔧 Technical details

Training script

The following Flair script was used to train this model:

from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import StackedEmbeddings, FlairEmbeddings

# 1. load the corpus (Ontonotes does not ship with Flair, you need to download and reformat into a column format yourself)
corpus: Corpus = ColumnCorpus(
                "resources/tasks/onto-ner",
                column_format={0: "text", 1: "pos", 2: "upos", 3: "ner"},
                tag_to_bioes="ner",
            )

# 2. what tag do we want to predict?
tag_type = 'pos'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. initialize each embedding we use
embedding_types = [

    # contextual string embeddings, forward
    FlairEmbeddings('news-forward'),

    # contextual string embeddings, backward
    FlairEmbeddings('news-backward'),
]

# embedding stack consists of the forward and backward Flair embeddings
embeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger
from flair.models import SequenceTagger

tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type)

# 6. initialize trainer
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

# 7. run training
trainer.train('resources/taggers/pos-english-fast',
              train_with_dev=True,
              max_epochs=150)
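Step 1 of the script expects Ontonotes to be converted into a whitespace-separated column file: one token per line, with the columns given in `column_format` (text, fine-grained POS, universal POS, NER), and a blank line between sentences. The sketch below writes such a file for one hand-labeled sentence and counts the tags in the POS column, i.e. the set of labels that step 3 turns into a tag dictionary. The file name `train.txt`, the example labels, and both helpers are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

# One row per token: (text, pos, upos, ner), i.e. columns 0-3 of column_format.
Row = Tuple[str, str, str, str]


def write_column_file(sentences: List[List[Row]], path: Path) -> None:
    """Write one token per line, columns separated by spaces, blank line after each sentence."""
    with path.open("w", encoding="utf-8") as f:
        for sentence in sentences:
            for row in sentence:
                f.write(" ".join(row) + "\n")
            f.write("\n")


def count_tags(path: Path, column: int) -> Dict[str, int]:
    """Count how often each tag occurs in one column of a column file."""
    counts: Dict[str, int] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # blank line = sentence boundary
        tag = line.split()[column]
        counts[tag] = counts.get(tag, 0) + 1
    return counts


# Hand-labeled illustrative sentence; the NER column uses BIO labels, which
# tag_to_bioes in the script converts to BIOES.
example = [[
    ("I", "PRP", "PRON", "O"),
    ("love", "VBP", "VERB", "O"),
    ("Berlin", "NNP", "PROPN", "B-GPE"),
    (".", ".", "PUNCT", "O"),
]]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "train.txt"  # file name is an assumption
    write_column_file(example, path)
    pos_counts = count_tags(path, column=1)

print(pos_counts)  # {'PRP': 1, 'VBP': 1, 'NNP': 1, '.': 1}
```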

📄 Citation

Please cite the following paper when using this model.

@inproceedings{akbik2018coling,
  title={Contextual String Embeddings for Sequence Labeling},
  author={Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
  pages     = {1638--1649},
  year      = {2018}
}