Hu Core News Trf
Hungarian transformer pipeline (huBERT) for HuSpaCy, bundling multiple natural language processing components
Downloads 118
Release Time: 4/1/2022
Model Overview
A Hungarian natural language processing model based on huBERT. It covers POS tagging, morphological analysis, lemmatization, dependency parsing, sentence segmentation, and named entity recognition (NER).
Model Features
- Multi-task processing: A single model integrates multiple NLP tasks, including POS tagging, named entity recognition, and dependency parsing.
- High accuracy: Achieves an F1 score of 0.917 on NER and a POS tagging accuracy of 0.982.
- Transformer-based architecture: Uses the huBERT base model to provide contextual representations.
- Comprehensive morphological analysis: Analyzes complex Hungarian morphological features with 0.966 accuracy.
Model Capabilities
- POS tagging
- Named entity recognition
- Dependency parsing
- Lemmatization
- Sentence segmentation
- Morphological analysis
Use Cases
- Text analysis (Hungarian text processing): Performs grammatical analysis and structural parsing of Hungarian texts, identifying POS tags, entities, and syntactic relationships.
- Information extraction (Hungarian entity recognition): Extracts named entities such as person names and locations from Hungarian texts, with an NER F1 score of 0.917.
🚀 HuSpaCy Hungarian Transformer Pipeline
This project provides a Hungarian transformer pipeline (huBERT) for HuSpaCy. It includes components such as transformer, senter, tagger, morphologizer, lemmatizer, parser, and ner, which can be used for various natural language processing tasks in Hungarian.
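As a rough illustration of how these components surface through spaCy's API, the sketch below loads the pipeline and collects per-token and per-entity annotations. It assumes the package is installed under the name `hu_core_news_trf`, together with whatever dependencies it declares. The helper names (`token_rows`, `entity_rows`, `analyze`) are hypothetical and not part of HuSpaCy.

```python
from typing import List, Tuple


def token_rows(doc) -> List[Tuple[str, str, str, str]]:
    """Collect (text, lemma, UPOS tag, morphological features) per token."""
    return [(t.text, t.lemma_, t.pos_, str(t.morph)) for t in doc]


def entity_rows(doc) -> List[Tuple[str, str]]:
    """Collect (text, label) for each named entity found by the ner component."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def analyze(text: str, model_name: str = "hu_core_news_trf"):
    """Run the pipeline on `text` and return token and entity rows.

    Assumption: `model_name` is installed in the current environment.
    spaCy is imported lazily so the helpers above work without it.
    """
    import spacy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return token_rows(doc), entity_rows(doc)
```

With the model installed, `analyze("Budapest Magyarország fővárosa.")` would return one row per token plus any entities the `ner` component detects. For many texts, loading the pipeline once and reusing the `nlp` object is likely preferable to calling `spacy.load` on every request.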
✨ Features
- Multiple NLP Tasks: Supports token-classification tasks such as NER, TAG, POS, MORPH, LEMMA, UNLABELED_DEPENDENCIES, LABELED_DEPENDENCIES, and SENTS.
- Rich Components: Comes with a comprehensive set of components for different aspects of language processing.
- Large Label Scheme: Provides a detailed label scheme of 1217 labels across 4 components.
📚 Documentation
Model Information
Property | Details |
---|---|
Model Type | hu_core_news_trf |
Version | 3.7.0 |
spaCy Compatibility | >=3.7.0,<3.8.0 |
Default Pipeline | transformer, senter, tagger, morphologizer, lookup_lemmatizer, trainable_lemmatizer, experimental_arc_predicter, experimental_arc_labeler, ner |
Components | transformer, senter, tagger, morphologizer, lookup_lemmatizer, trainable_lemmatizer, experimental_arc_predicter, experimental_arc_labeler, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | UD Hungarian Szeged (Richárd Farkas, Katalin Simkó, Zsolt Szántó, Viktor Varga, Veronika Vincze (MTA-SZTE Research Group on Artificial Intelligence)); NYTK-NerKor Corpus (Eszter Simon, Noémi Vadász (Department of Language Technology and Applied Linguistics)); Szeged NER Corpus (György Szarvas, Richárd Farkas, László Felföldi, András Kocsor, János Csirik (MTA-SZTE Research Group on Artificial Intelligence)); huBERT base model (cased) (Dávid Márk Nemeskey (SZTAKI-HLT)) |
License | cc-by-sa-4.0 |
Author | SzegedAI, MILAB |
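The spaCy compatibility range in the table can be checked before loading the model. The following is a minimal sketch, not how spaCy or pip resolve versions: it handles only plain numeric versions and the two operators that appear in the table (`>=` and `<`). The function names are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '3.7.4' into (3, 7, 4), padding short versions with zeros.

    Pre-release or local suffixes (e.g. '3.7.0.dev0') are not handled.
    """
    parts = tuple(int(part) for part in version.strip().split("."))
    return parts + (0,) * (3 - len(parts))


def is_compatible(version: str, spec: str = ">=3.7.0,<3.8.0") -> bool:
    """Check a numeric version against a comma-separated range.

    Only the '>=' and '<' operators are supported in this sketch.
    """
    current = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if not current >= parse_version(clause[2:]):
                return False
        elif clause.startswith("<") and not clause.startswith("<="):
            if not current < parse_version(clause[1:]):
                return False
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True
```

In practice one might pass `spacy.__version__` as the first argument. A full implementation of version specifiers would need to follow PEP 440, which this sketch deliberately does not attempt.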
Model Performance
Task | Metric | Value |
---|---|---|
NER | NER Precision | 0.9119332986 |
NER | NER Recall | 0.9229957806 |
NER | NER F Score | 0.9174311927 |
TAG | TAG (XPOS) Accuracy | 0.9823906594 |
POS | POS (UPOS) Accuracy | 0.9820078476 |
MORPH | Morph (UFeats) Accuracy | 0.9658340511 |
LEMMA | Lemma Accuracy | 0.9861257296 |
UNLABELED_DEPENDENCIES | Unlabeled Attachment Score (UAS) | 0.9000861326 |
LABELED_DEPENDENCIES | Labeled Attachment Score (LAS) | 0.8568421053 |
SENTS | Sentences F-Score | 0.9899888765 |
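The NER rows are mutually consistent: the F-score is the harmonic mean of precision and recall, F = 2PR / (P + R). A small sketch to reproduce the reported value from the table (the function name `f_score` is illustrative):

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the performance table.
ner_f = f_score(0.9119332986, 0.9229957806)
print(round(ner_f, 4))  # 0.9174
```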
Label Scheme
Label scheme: 1217 labels for 4 components
Component | Labels |
---|---|
tagger | ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X |
morphologizer |
Definite=Def|POS=DET|PronType=Art , Case=Ine|Number=Sing|POS=NOUN , POS=ADV , Case=Nom|NumType=Card|Number=Sing|POS=NUM , Case=Nom|Number=Sing|Number[psor]=Sing|POS=NOUN|Person[psor]=3 , Case=Nom|Number=Sing|POS=ADJ|VerbForm=PartPres , Case=Nom|Degree=Pos|Number=Sing|POS=ADJ , Case=Nom|Number=Sing|POS=NOUN , Definite=Ind|POS=DET|PronType=Tot , Case=Ade|Number=Sing|POS=NOUN , Case=Nom|Degree=Cmp|Number=Sing|POS=ADJ , POS=PUNCT , Case=Nom|Number=Sing|POS=DET|Person=3|PronType=Dem , Case=Acc|Number=Sing|Number[psor]=Sing|POS=NOUN|Person[psor]=3 , Definite=Ind|POS=DET|PronType=Ind , Definite=Def|Mood=Ind|Number=Sing|POS=VERB|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act , POS=ADP , POS=CCONJ , Case=Del|Number=Sing|POS=NOUN , Case=Gen|Number=Sing|POS=PRON|Person=3|PronType=Dem , Case=Sbl|Number=Sing|POS=NOUN , Case=Nom|Number=Sing|POS=ADJ|VerbForm=PartPast , Case=Del|Number=Plur|Number[psor]=Sing|POS=NOUN|Person[psor]=3 , Case=Nom|Number=Sing|POS=PROPN , Definite=Ind|Mood=Ind|Number=Sing|POS=VERB|Person=3|Tense=Past|VerbForm=Fin|Voice=Act , Case=Acc|Number=Sing|POS=NOUN , Case=Sup|Number=Sing|POS=PROPN , Case=Ess|Degree=Pos|Number=Sing|POS=ADJ , Case=Ine|Number=Sing|Number[psor]=Sing|POS=NOUN|Person[psor]=3 , Case=Sup|Number=Plur|POS=NOUN , Degree=Pos|POS=ADV , Case=Sup|Number=Sing|POS=NOUN , Definite=Ind|Mood=Ind|Number=Sing|POS=VERB|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act , Case=Cau|Number=Plur|POS=NOUN , Case=Cau|Number=Sing|POS=NOUN , Case=Gen|Number=Sing|POS=NOUN , Definite=Ind|Mood=Ind|Number=Plur|POS=VERB|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act , Case=Nom|Number=Plur|Number[psor]=Sing|POS=NOUN|Person[psor]=3 , Definite=Def|Mood=Ind|Number=Sing|POS=VERB|Person=3|Tense=Past|VerbForm=Fin|Voice=Act , Case=Tra|Number=Sing|POS=ADJ|VerbForm=PartPres , Case=Nom|Number=Plur|POS=NOUN , Case=Cau|Number=Sing|POS=PRON|Person=3|PronType=Prs , Definite=Def|Mood=Pot|Number=Plur|POS=VERB|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act , 
Case=Ins|Number=Sing|Number[psor]=Sing|POS=NOUN|Person[psor]=3 , Case=Ins|Number=Sing|POS=NOUN , POS=ADV|PronType=Neg , Case=Ine|Number=Plur|Number[psor]=Plur|POS=NOUN|Person[psor]=1 , Case=Gen|Number=Sing|Number[psor]=Sing|POS=NOUN|Person[psor]=3 , POS=SCONJ , Case=Acc|Number=Plur|Number[psor]=Sing|POS=NOUN|Person[psor]=3 , Definite=Def|Mood=Ind|Number=Plur|POS=VERB|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act , Case=Nom|NumType=Frac|Number=Sing|POS=NUM , Case=Sbl|Number=Sing|Number[psor]=Sing|POS=NOUN|Person[psor]=3 , Case=Abl|Number=Sing|POS=NOUN , Case=Dat|Number=Sing|POS=NOUN , Definite=Ind|Mood=Ind|Number=Sing|POS=AUX|Person=3|Tense=Pres|Voice=Act , POS=VERB|VerbForm=Inf|Voice=Act , Case=Nom|Number=Sing|POS=PRON|Person=3|PronType=Dem , Definite=Ind|Mood=Ind|Number=Plur|POS=VERB|Person=3|Tense=Past|VerbForm=Fin|Voice=Act , Case=Acc|Number=Sing|POS=PRON|Person=3|PronType=Dem , Case=Nom|Degree=Sup|Number=Sing|POS=ADJ , POS=ADV|PronType=Dem , Case=Ins|Number=Plur|Number[psor]=Sing|POS=NOUN|Person[psor]=1 , Case=Ins|Number=Sing|POS=PRON|Person=3|PronType=Dem , Case=Ade|Degree=Pos|Number=Sing|POS=ADJ , POS=ADV|PronType=Int , Case=Tra|Degree=Pos|Number=Sing|POS=ADJ , Definite=Ind|Mood=Pot|Number=Sing|POS=VERB|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act , Case=Sbl|Number=Sing|POS=PROPN , Case=Sbl|Number=Sing|Number[psor]=Plur|POS=NOUN|Person[psor]=1 , Case=All|Number=Sing|POS=PRON|Person=3|PronType=Dem , Definite=Ind|Mood=Imp|Number=Sing|POS=VERB|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act , POS=PART , Case=Sup|Number=Sing|POS=DET|Person=3|PronType=Dem , POS=ADV|PronType=Tot , Case=Ill|Definite=Ind|POS=DET|PronType=Ind , Number=Sing|POS=VERB|Person=3|VerbForm=Inf|Voice=Act , Case=Ill|Number=Sing|POS=NOUN , Definite=Ind|Mood=Ind|Number=Sing|POS=AUX|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act , Case=Sbl|Number=Sing|POS=PRON|Person=3|PronType=Rel , Case=Dat|Number=Sing|POS=PRON|Person=3|PronType=Dem , Case=Nom|NumType=Ord|Number=Sing|POS=ADJ , 
Case=Nom|Number=Sing|POS=PRON|Person=3|PronType=Rel , Definite=Def|Mood=Ind|Number=Sing|POS=AUX|Person=3|Tense=Pres|Voice=Act , Definite=Ind|Mood=Cnd|Number=Sing|POS=AUX|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act , Case=Acc|Number=Sing|POS=DET|Person=3|PronType=Dem , Definite=Def|Mood=Ind|Number=Plur|POS=VERB|Person=3|Tense=Past|VerbForm=Fin|Voice=Act , Case=Sup|Number=Sing|Number[psor]=Sing|POS=NOUN|Person[psor]=3 , Definite=Ind|Mood=Ind|Number=Sing|POS=AUX|Person=3|Tense=Past|VerbForm=Fin|Voice=Act , Case=Ade|Number=Sing|POS=ADJ|VerbForm=PartPast , Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Dem , Case=Ess|Number=Sing|POS=ADJ|VerbForm=PartPres , Case=Acc|Number=Sing|POS=PROPN , Case=Nom|Number=Sing|POS=ADJ|VerbForm=PartFut , Case=Ine|NumType=Card|Number=Sing|POS=NUM , Definite=Ind|Mood=Pot|Number=Plur|POS=VERB|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act , Case=Acc|Number=Plur|POS=NOUN , Case=Del|Number=Plur|POS=NOUN , Case=Gen|Number=Plur|POS=PRON|Person=3|PronType=Rel , Case=Nom|Number=Plur|Number[psor]=Plur|POS=NOUN|Person[psor]=3 , Case=Tra|Number=Sing|POS=NOUN , Case=Sup|Number=Sing|POS=PRON|Person=3|PronType=Rel , Definite=Ind|Mood=Imp|Number=Plur|POS=VERB|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act , Definite=Def|Mood=Imp|Number=Plur|POS=VERB|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act , Case=Acc|Number=Sing|Number[psor]=Plur|POS=NOUN|Person[psor]=3 , Case=Acc|Number=Plur|Number[psor]=Plur|POS=NOUN|Person[psor]=3 , Definite=Ind|POS=DET|PronType=Art , Case=Dat|Number=Plur|POS=NOUN , Case=Ins|Number=Plur|POS=NOUN , Case=Sbl|Number=Plur|POS=NOUN , Case=Ela|Number=Sing|POS=NOUN , Definite=Ind|Mood=Pot|Number=Plur|POS=VERB|Person=3|Tense=Past|VerbForm=Fin|Voice=Act , Case=Acc|Number=Plur|POS=PRON|Person=3|PronType=Prs , Case=All|Number=Sing|POS=NOUN , Case=Ine|Number=Plur|POS=NOUN , Case=Dat|Number=Plur|POS=ADJ|VerbForm=PartPres , Case=Ela|Number=Sing|Number[psor]=Plur|POS=NOUN|Person[psor]=3 , Case=Abl|Number=Sing|POS=PROPN , 
Case=Cau|Number=Plur|POS=PRON|Person=3|PronType=Prs , Case=Acc|Number=Plur|POS=PRON|Person=3|PronType=Prs|Reflex=Yes , Case=Ins|Number=Sing|POS=PROPN , Case=Ess|Number=Sing|POS=ADJ|VerbForm=PartPast , Number=Plur|POS=VERB|Person=3|VerbForm=Inf|Voice=Act , Case=Sbl|Number=Sing|POS=PRON|Person=3|PronType=Prs , Case=Nom|Number=Sing|Number[psor]=Plur|POS=NOUN|Person[psor]=3 , Case=All|Number=Sing|Number[psor]=Sing|POS=NOUN|Person[psor]=3 , Case=Abl|Number=Plur|Number[psor]=Sing|POS=NOUN|Person[psor]=3 , Definite=Def|Mood=Ind|Number=Plur|POS=VERB|Person=1|Tense=Past|VerbForm=Fin|Voice=Act , Case=Dat|Degree=Pos|Number=Plur|POS=ADJ , POS=ADV|PronType=Rel , Definite=Def|Mood=Ind|Number=Plur|POS=VERB|Person=3|Tense=Past|VerbForm=Fin|Voice=Cau , Case=Del|Number=Sing|Number[psor]=Sing|POS=NOUN|Person[psor]=3 , Case=Gen|Number=Sing|POS=DET|Person=3|PronType=Dem , Case=Ill|Number=Plur|POS=NOUN , Case=Ela|Number=Plur|POS=NOUN , Case=Ill|Number=Sing|POS=PROPN , Case=Ela|Number=Sing|Number[psor]=Sing|POS=NOUN|Person[psor]=3 , Case=Nom|Number=Sing|POS=PRON|Person=3|PronType=Prs , Case=Acc|Number=Sing|POS=PRON|Person=3|PronType=Int , Definite=Def|POS=DET|PronType=Ind , Case=Dat|Number=Sing|POS=PRON|Person=3|PronType=Ind , Definite=Ind|Mood=Ind|Number=Plur|POS=VERB|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act , Case=Nom|Number=Plur|POS=PRON|Person=1|PronType=Prs , Case=Acc|Number=Sing|POS=PRON|Person=3|PronType=Rcp , Case=Ine|Number=Sing|POS=PRON|Person=3|PronType=Prs , Case=All|Number=Sing|POS=PRON|Person=3|PronType=Prs , Case=Ter|Number=Sing|Number[psor]=Plur|POS=NOUN|Person[psor]=3 , POS=ADV|VerbForm=Conv , Definite=Def|Mood=Ind|Number=Plur|POS=VERB|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act , Case=Sup|Degree=Pos|Number=Sing|POS=ADJ , Case=Nom|Number=Sing|POS=PRON|Person=3|PronType=Tot , Aspect=Iter|Definite=Def|Mood=Ind|Number=Plur|POS=VERB|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act , Aspect=Iter|Definite=Def|Mood=Ind|Number=Plur|POS=VERB|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act 
, Definite=Ind|Mood=Pot|Number=Sing|POS=VERB|Person=3|Tense=Past|VerbForm=Fin|Voice=Act , Definite=Def|Mood=Imp|Number=Sing|POS=VERB|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act , Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Ind , Case=Dis|Number=Sing|POS=NOUN , Case=Gen|Number=Sing|POS=PRON|Person=3|PronType=Rel , Case=Ade|Number=Plur|Number[psor]=Sing|POS=NOUN|Person[psor]=3 , Case=Dat|Number=Sing|POS=PRON|Person=3|PronType=Prs|Reflex=Yes , Case=All|Number=Plur|POS=PRON|Person=3|PronType=Prs , Case=Dat|Number=Plur|POS=ADJ|VerbForm=PartPast , Case=Dat|Number=Sing|Number[psor]=Sing|POS=NOUN|Person[psor]=3 , Case=Nom|Number=Plur|POS=PROPN , Case=Nom|Degree=Pos|Number=Plur|POS=ADJ , Case=Cau|Number=Sing|Number[psor]=Sing|POS=NOUN|Person[psor]=3 , Case=Dat|Degree=Pos|Number=Sing|POS=ADJ , Case=Ine|Number=Sing|POS=PROPN , Definite=Def|Mood=Ind|Number=Sing|POS=VERB|Person=3|Tense=Past|VerbForm=Fin|Voice=Cau , Case=Acc|Number=Sing|Number[psor]=Plur|POS=NOUN|Person[psor]=1 , `Definite= |
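Each morphologizer label shown above is a `|`-separated list of `Feature=Value` pairs, with the POS tag folded in as one of the features. A minimal sketch for splitting such a label into a dictionary (the function name `parse_morph_label` is hypothetical, not a HuSpaCy or spaCy API):

```python
from typing import Dict


def parse_morph_label(label: str) -> Dict[str, str]:
    """Split 'Case=Nom|Number=Sing|POS=NOUN' into a feature dictionary."""
    features: Dict[str, str] = {}
    for pair in label.split("|"):
        if not pair:
            continue  # tolerate empty segments
        name, _, value = pair.partition("=")
        features[name] = value
    return features


example = parse_morph_label(
    "Case=Nom|Number=Sing|Number[psor]=Sing|POS=NOUN|Person[psor]=3"
)
print(example["Case"], example["Number[psor]"])  # Nom Sing
```

Bracketed feature names such as `Number[psor]` are kept verbatim as keys, so possessor features stay distinct from the features of the word itself.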
📄 License
This project is licensed under the cc-by-sa-4.0 license.