Pl Core News Lg
CPU-optimized Polish natural language processing model supporting POS tagging, named entity recognition, dependency parsing and other tasks
Downloads 58
Release Time: 3/2/2022
Model Overview
A large Polish processing pipeline built on the spaCy framework and optimized for CPU usage. It covers tokenization, POS tagging, morphological analysis, dependency parsing, and named entity recognition.
Model Features
CPU optimization
Processing pipeline specifically optimized for CPU usage
Comprehensive language analysis
Supports comprehensive Polish language analysis including morphological features and complex grammatical structures
High-quality vectors
Contains 500,000 unique word vectors (300 dimensions) for semantic representation
Multi-task processing
A single pipeline handles multiple tasks, such as POS tagging, named entity recognition, and dependency parsing
Model Capabilities
POS tagging
Named entity recognition
Dependency parsing
Lemmatization
Morphological analysis
Sentence segmentation
Use Cases
Text processing
Polish document analysis
Grammatical analysis and structural parsing of Polish texts
Accurate identification of POS tags, grammatical relations and named entities
Information extraction
Extract structured information from Polish texts
Named entity recognition F1 score of 0.841
Linguistic research
Polish morphological analysis
Analyze complex Polish morphological variations
Morphological feature accuracy rate of 90.98%
🚀 pl_core_news_lg
The pl_core_news_lg pipeline is a Polish language processing pipeline optimized for CPU. It performs tasks such as named entity recognition, part-of-speech tagging, and morphological analysis.
📚 Documentation
Details: https://spacy.io/models/pl#pl_core_news_lg
This is a Polish pipeline optimized for CPU. Its components include tok2vec, morphologizer, parser, lemmatizer (trainable_lemmatizer), tagger, senter, and ner.
Property | Details |
---|---|
Model Name | pl_core_news_lg |
Version | 3.7.0 |
spaCy | >=3.7.0,<3.8.0 |
Default Pipeline | tok2vec , morphologizer , parser , lemmatizer , tagger , attribute_ruler , ner |
Components | tok2vec , morphologizer , parser , lemmatizer , tagger , senter , attribute_ruler , ner |
Vectors | 500000 keys, 500000 unique vectors (300 dimensions) |
Sources | UD Polish PDB v2.8 (Wróblewska, Alina; Zeman, Daniel; Mašek, Jan; Rosa, Rudolf)<br>National Corpus of Polish (Mirosław Bańko, Rafał L. Górski, Barbara Lewandowska-Tomaszczyk, Marek Łaziński, Piotr Pęzik, Adam Przepiórkowski)<br>Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia) (Explosion) |
License | GNU GPL 3.0 |
Author | Explosion |
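A minimal usage sketch, assuming spaCy (within the version range in the table above) and the `pl_core_news_lg` package are installed, for example with `python -m spacy download pl_core_news_lg`. The helper name `analyze` and the sample sentence are illustrative and not part of the model card. The function returns `None` instead of raising when spaCy or the model is missing.

```python
import importlib.util

MODEL_NAME = "pl_core_news_lg"  # package name from the table above


def analyze(text, model_name=MODEL_NAME):
    """Return (text, lemma, UPOS, morph, dep) per token, or None if unavailable.

    Hypothetical helper: it checks that spaCy and the model package can be
    imported before loading, so the sketch degrades gracefully.
    """
    if importlib.util.find_spec("spacy") is None:
        return None
    if importlib.util.find_spec(model_name) is None:
        return None

    import spacy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return [(t.text, t.lemma_, t.pos_, str(t.morph), t.dep_) for t in doc]


rows = analyze("Adam Mickiewicz urodził się w Zaosiu.")  # sample sentence (assumption)
if rows is None:
    print("spaCy or pl_core_news_lg is not installed")
else:
    for row in rows:
        print(row)
```

The default pipeline in the table does not list `senter`. With the default components, sentence boundaries come from the dependency parser.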
Model Index
The following table shows the performance metrics of the pl_core_news_lg model on different token-classification tasks:
Task | Metric Name | Metric Type | Value |
---|---|---|---|
NER | NER Precision | precision | 0.847446671 |
NER | NER Recall | recall | 0.8355640535 |
NER | NER F Score | f_score | 0.8414634146 |
TAG | TAG (XPOS) Accuracy | accuracy | 0.9828973843 |
POS | POS (UPOS) Accuracy | accuracy | 0.9781017658 |
MORPH | Morph (UFeats) Accuracy | accuracy | 0.9098299967 |
LEMMA | Lemma Accuracy | accuracy | 0.9424670256 |
UNLABELED_DEPENDENCIES | Unlabeled Attachment Score (UAS) | f_score | 0.894969847 |
LABELED_DEPENDENCIES | Labeled Attachment Score (LAS) | f_score | 0.8237918475 |
SENTS | Sentences F-Score | f_score | 0.9631305135 |
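As a quick consistency check, the NER F score in the table can be recomputed from the reported precision and recall as their harmonic mean. This sketch uses only the values from the table. The function name `f1_score` is illustrative.

```python
# NER precision and recall as reported in the metrics table above.
precision = 0.847446671
recall = 0.8355640535


def f1_score(p, r):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


f1 = f1_score(precision, recall)
print(round(f1, 4))  # 0.8415
```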
Label Scheme
1726 labels for 4 components
Component | Labels |
---|---|
morphologizer |
Animacy=Hum|Case=Nom|Gender=Masc|Number=Sing|POS=NOUN , AdpType=Prep|POS=ADP|Variant=Short , Case=Loc|Gender=Fem|Number=Sing|POS=NOUN , Animacy=Inan|Case=Ins|Gender=Masc|Number=Sing|POS=NOUN , Aspect=Imp|Mood=Ind|Number=Sing|POS=VERB|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act , Case=Loc|Gender=Fem|Number=Plur|POS=NOUN , Case=Acc|Gender=Fem|Number=Sing|POS=NOUN , POS=CCONJ , Animacy=Inan|Case=Acc|Gender=Masc|Number=Sing|POS=NOUN , POS=PUNCT|PunctType=Peri , Case=Nom|Gender=Fem|Number=Sing|POS=NOUN , Case=Loc|Degree=Pos|Gender=Fem|Number=Sing|POS=ADJ , POS=PUNCT|PunctType=Comm , Animacy=Inan|Case=Loc|Degree=Pos|Gender=Masc|Number=Sing|POS=ADJ , Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing|POS=NOUN , Case=Ins|Degree=Pos|Gender=Fem|Number=Sing|POS=ADJ , Case=Ins|Gender=Fem|Number=Sing|POS=NOUN , AdpType=Prep|POS=ADP , Case=Loc|Gender=Neut|Number=Sing|POS=NOUN , Animacy=Inan|Case=Ins|Degree=Pos|Gender=Masc|Number=Sing|POS=ADJ , Case=Gen|Degree=Pos|Gender=Fem|Number=Sing|POS=ADJ , Case=Gen|Gender=Fem|Number=Sing|POS=NOUN , Case=Loc|Gender=Fem|Number=Sing|POS=DET|PronType=Rel , Animacy=Hum|Case=Nom|Gender=Masc|NumForm=Word|Number=Plur|POS=NUM , Animacy=Hum|Case=Gen|Gender=Masc|Number=Plur|POS=NOUN , Aspect=Imp|POS=VERB|Tense=Pres|VerbForm=Conv|Voice=Act , Animacy=Hum|Case=Ins|Gender=Masc|Number=Sing|POS=PRON|Person=3|PrepCase=Pre|PronType=Prs|Variant=Long , Animacy=Nhum|Case=Nom|Gender=Masc|Number=Sing|POS=NOUN , Animacy=Inan|Case=Loc|Gender=Masc|Number=Plur|POS=NOUN , POS=ADV , Animacy=Hum|Case=Ins|Degree=Pos|Gender=Masc|Number=Sing|POS=ADJ , Animacy=Hum|Case=Ins|Gender=Masc|Number=Sing|POS=NOUN , Aspect=Imp|Mood=Ind|Number=Plur|POS=VERB|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act , Animacy=Inan|Case=Gen|Gender=Masc|Number=Sing|POS=NOUN , Animacy=Inan|Case=Gen|Degree=Pos|Gender=Masc|Number=Sing|POS=ADJ , Aspect=Imp|Case=Gen|Gender=Fem|Number=Sing|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Act , POS=PRON|PronType=Prs|Reflex=Yes , 
Case=Ins|Gender=Neut|NumType=Sets|Number=Sing|POS=NOUN , Case=Nom|Gender=Fem|NumForm=Word|Number=Plur|POS=NUM , Case=Nom|Gender=Fem|Number=Plur|POS=NOUN , Case=Gen|Gender=Fem|Number=Plur|POS=NOUN , Animacy=Nhum|Case=Nom|Gender=Masc|NumForm=Word|Number=Plur|POS=NUM , Animacy=Nhum|Case=Nom|Degree=Pos|Gender=Masc|Number=Plur|POS=ADJ , Animacy=Nhum|Case=Nom|Gender=Masc|Number=Plur|POS=NOUN , Case=Gen|POS=PRON|PronType=Prs|Reflex=Yes , Animacy=Inan|Case=Acc|Gender=Masc|Number=Plur|POS=NOUN , Case=Ins|Gender=Fem|Number=Plur|POS=NOUN , Animacy=Hum|Case=Acc|Gender=Masc|Number=Sing|POS=NOUN , Aspect=Perf|Case=Ins|Gender=Fem|Number=Plur|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Pass , Animacy=Inan|Aspect=Perf|Case=Acc|Gender=Masc|Number=Sing|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Pass , Animacy=Hum|Case=Nom|Degree=Pos|Gender=Masc|Number=Sing|POS=ADJ , Animacy=Hum|Case=Gen|Gender=Masc|Number=Plur|POS=PRON|Person=3|PrepCase=Pre|PronType=Prs|Variant=Long , Case=Ins|Degree=Pos|Gender=Fem|Number=Plur|POS=ADJ , Animacy=Hum|Aspect=Perf|Case=Nom|Gender=Masc|Number=Sing|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Pass , Case=Gen|Gender=Neut|Number=Sing|POS=NOUN , Case=Nom|Gender=Fem|NumType=Card|Number=Plur|POS=DET|PronType=Ind , Animacy=Inan|Case=Loc|Degree=Pos|Gender=Masc|Number=Plur|POS=ADJ , Aspect=Imp|Case=Gen|Gender=Fem|Number=Plur|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Pass , Case=Gen|Gender=Neut|Number=Plur|POS=NOUN , Animacy=Nhum|Case=Nom|Degree=Pos|Gender=Masc|Number=Sing|POS=ADJ , Case=Gen|Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ , Animacy=Inan|Case=Acc|Degree=Pos|Gender=Masc|Number=Plur|POS=ADJ , Animacy=Nhum|Case=Gen|Gender=Masc|Number=Sing|POS=NOUN , Animacy=Inan|Case=Ins|Gender=Masc|Number=Plur|POS=NOUN , Case=Ins|Gender=Neut|Number=Sing|POS=NOUN , Case=Loc|Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ , Case=Nom|Degree=Pos|Gender=Fem|Number=Sing|POS=ADJ , Case=Loc|Degree=Pos|Gender=Neut|Number=Plur|POS=ADJ , Case=Loc|Gender=Neut|Number=Ptan|POS=NOUN , 
AdpType=Prep|POS=ADP|Variant=Long , Animacy=Inan|Case=Nom|Degree=Pos|Gender=Masc|Number=Plur|POS=ADJ , Animacy=Inan|Case=Nom|Gender=Masc|Number=Plur|POS=NOUN , Case=Ins|Degree=Pos|Gender=Neut|Number=Plur|POS=ADJ , Case=Ins|Gender=Neut|Number=Plur|POS=NOUN , Animacy=Hum|Case=Nom|Gender=Masc|Number=Plur|POS=NOUN , Animacy=Inan|Aspect=Perf|Case=Gen|Gender=Masc|Number=Plur|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Pass , Animacy=Inan|Case=Gen|Gender=Masc|Number=Plur|POS=NOUN , Case=Nom|Degree=Pos|Gender=Fem|Number=Plur|POS=ADJ , Animacy=Hum|Aspect=Imp|Case=Ins|Gender=Masc|Number=Sing|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Act , Animacy=Nhum|Case=Loc|Gender=Masc|Number=Sing|POS=PRON|Person=3|PrepCase=Pre|PronType=Prs|Variant=Long , Case=Ins|Gender=Neut|NumType=Sets|Number=Plur|POS=NOUN , Case=Acc|Gender=Neut|NumType=Sets|Number=Sing|POS=NOUN , Aspect=Imp|Case=Acc|Gender=Neut|Number=Sing|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Pass , Aspect=Imp|Case=Loc|Gender=Fem|Number=Sing|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Pass , Case=Acc|Gender=Fem|Number=Sing|POS=PRON|Person=3|PrepCase=Pre|PronType=Prs|Variant=Long , Case=Nom|Degree=Cmp|Gender=Fem|Number=Plur|POS=ADJ , Case=Loc|Gender=Neut|Number=Sing|POS=PRON|PronType=Dem , Aspect=Imp|Case=Nom|Gender=Fem|Number=Plur|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Act , Case=Loc|Gender=Neut|Number=Plur|POS=NOUN , Aspect=Perf|Case=Gen|Gender=Fem|Number=Sing|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Pass , Animacy=Inan|Case=Ins|Degree=Pos|Gender=Masc|Number=Plur|POS=ADJ , Aspect=Imp|Case=Nom|Gender=Fem|Number=Sing|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Pass , Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing|POS=DET|PronType=Rel , Animacy=Nhum|Aspect=Perf|Case=Nom|Gender=Masc|Number=Plur|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Pass , Animacy=Inan|Case=Nom|Degree=Pos|Gender=Masc|Number=Sing|POS=ADJ , Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing|POS=NOUN , Case=Gen|Degree=Pos|Gender=Neut|Number=Plur|POS=ADJ , 
Case=Gen|Gender=Neut|NumType=Sets|Number=Plur|POS=NOUN , Aspect=Perf|Case=Gen|Gender=Neut|Number=Plur|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Pass , POS=SPACE , Case=Gen|Degree=Pos|Gender=Fem|Number=Plur|POS=ADJ , Aspect=Perf|Case=Nom|Gender=Fem|Number=Plur|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Pass , Case=Acc|Degree=Pos|Gender=Fem|Number=Plur|POS=ADJ , Case=Acc|Gender=Fem|Number=Plur|POS=NOUN , Case=Gen|Gender=Fem|Number=Plur|POS=PRON|Person=3|PrepCase=Npr|PronType=Prs|Variant=Long , Case=Ins|Gender=Neut|Number=Ptan|POS=NOUN , Case=Ins|POS=PRON|PronType=Prs|Reflex=Yes , Case=Acc|Gender=Neut|Number=Sing|POS=NOUN , Animacy=Nhum|Case=Loc|Degree=Pos|Gender=Masc|Number=Plur|POS=ADJ , Animacy=Nhum|Case=Loc|Gender=Masc|Number=Plur|POS=NOUN , Case=Nom|Gender=Neut|NumForm=Word|NumType=Sets|Number=Plur|POS=NUM , Case=Ins|Gender=Fem|Number=Sing|POS=DET|PronType=Rel , POS=PART , Aspect=Perf|Gender=Fem|Mood=Ind|Number=Sing|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Act , Animacy=Inan|Case=Nom|Gender=Masc|NumForm=Word|Number=Plur|POS=NUM , Degree=Pos|POS=ADV , Case=Nom|Gender=Neut|Number=Sing|POS=NOUN , Animacy=Hum|Case=Gen|Gender=Masc|Number=Sing|POS=NOUN , Aspect=Imp|Case=Gen|Gender=Fem|Number=Plur|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Act , POS=SCONJ , Case=Dat|Degree=Pos|Gender=Fem|Number=Sing|POS=ADJ , Case=Dat|Gender=Fem|Number=Sing|POS=NOUN , Animacy=Inan|Case=Gen|Degree=Pos|Gender=Masc|Number=Plur|POS=ADJ , Case=Nom|Gender=Neut|NumType=Sets|Number=Sing|POS=NOUN , Animacy=Hum|Aspect=Imp|Case=Nom|Gender=Masc|Number=Sing|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Act , Case=Acc|Gender=Neut|Number=Plur|POS=NOUN , Animacy=Inan|Case=Acc|Degree=Pos|Gender=Masc|Number=Sing|POS=ADJ , Aspect=Imp|Case=Loc|Gender=Fem|Number=Sing|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Act , Aspect=Perf|Case=Ins|Gender=Neut|Number=Plur|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Pass , Animacy=Nhum|Case=Gen|Degree=Pos|Gender=Masc|Number=Sing|POS=ADJ , 
Animacy=Nhum|Aspect=Imp|Case=Gen|Gender=Masc|Number=Sing|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Act , Case=Loc|Gender=Neut|NumType=Sets|Number=Plur|POS=NOUN , Animacy=Inan|Case=Acc|Gender=Masc|Number=Sing|POS=DET|PronType=Rel , Animacy=Hum|Case=Gen|Gender=Masc|Number=Sing|POS=PRON|Person=3|PrepCase=Pre|PronType=Prs|Variant=Long , Aspect=Perf|Case=Nom|Gender=Fem|Number=Sing|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Pass , Animacy=Inan|Aspect=Imp|Case=Gen|Gender=Masc|Number=Plur|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Act , Animacy=Nhum|Case=Acc|Gender=Masc|Number=Sing|POS=NOUN , Animacy=Hum|Case=Gen|Degree=Pos|Gender=Masc|Number=Sing|POS=ADJ , Animacy=Hum|Case=Gen|Degree=Pos|Gender=Masc|Number=Plur|POS=ADJ , Hyph=Yes|POS=ADJ , POS=PUNCT|PunctType=Dash , Animacy=Inan|Aspect=Perf|Case=Nom|Gender=Masc|Number=Sing|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Pass , Aspect=Perf|Case=Ins|Gender=Neut|Number=Sing|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Pass , Aspect=Perf|Case=Gen|Gender=Fem|Number=Plur|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Pass , Animacy=Nhum|Aspect=Imp|Case=Acc|Gender=Masc|Number=Plur|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Act , Animacy=Nhum|Case=Acc|Gender=Masc|Number=Plur|POS=NOUN , Case=Loc|Degree=Pos|Gender=Fem|Number=Plur|POS=ADJ , Animacy=Inan|Case=Acc|Gender=Masc|Number=Sing|POS=DET|PronType=Dem , Case=Acc|Degree=Pos|Gender=Fem|Number=Sing|POS=ADJ , Aspect=Imp|Case=Ins|Gender=Fem|Number=Sing|POS=ADJ|Polarity=Pos|VerbForm=Part|Voice=Act , Animacy |
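Each morphologizer label above is a `|`-separated list of `Feature=Value` pairs, with the coarse part of speech stored under `POS`. A small sketch of how such a label string could be split into a dictionary, using only the standard library (the function name `parse_label` is hypothetical):

```python
def parse_label(label):
    """Split 'Key=Value|Key=Value' into a dict; parts without '=' are skipped."""
    features = {}
    for part in label.strip().split("|"):
        key, sep, value = part.partition("=")
        if sep:
            features[key] = value
    return features


# First label from the morphologizer list above.
feats = parse_label("Animacy=Hum|Case=Nom|Gender=Masc|Number=Sing|POS=NOUN")
print(feats["POS"], feats["Case"])  # NOUN Nom
```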
📄 License
This project is licensed under the GNU GPL 3.0 license.