postagger-portuguese: Open-Source Portuguese Part-of-Speech Tagger

Postagger Portuguese

Developed by lisaterumi

A Portuguese POS tagger fine-tuned from the BERTimbau model on the MacMorpho corpus, reaching an F1 score of 0.9826.

Sequence Labeling

Transformers

Other

#BERTimbau fine-tuning #Portuguese NLP #High-accuracy POS tagging

Downloads 373

Release Time: 7/17/2022

Model Overview

This model is a POS tagger for Brazilian Portuguese, fine-tuned from the BERTimbau model on the MacMorpho corpus. Its primary use is part-of-speech tagging of text.

Model Features

High accuracy

Achieves an F1 score of 0.9826 on the MacMorpho corpus.

Based on BERTimbau

Uses BERTimbau, a BERT model pre-trained on Portuguese text, as its foundation.

Extensive tag coverage

Supports 27 POS tags, covering adjectives, adverbs, nouns, verbs, and more.

Model Capabilities

Portuguese POS tagging

Text analysis

Natural language processing

Use Cases

Text processing

Clinical text analysis

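Used for POS tagging of clinical narrative texts as a step toward extracting key information.

In a human-based evaluation on authentic clinical narratives, the clinical model (fine-tuned from BioBERtpt) achieved an accuracy of 0.8145.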

General text tagging

Used for POS tagging of general Portuguese texts.

In the same human-based evaluation on clinical narratives, the generic model (fine-tuned from BERTimbau) achieved an accuracy of 0.7656.

🚀 POS-Tagger Portuguese

This project fine-tuned the BERTimbau model on the MacMorpho corpus for the POS tagging task. After 10 epochs of training, it achieved an overall F1-score of 0.9826.
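A minimal usage sketch follows, assuming the model is published on the Hugging Face Hub under the ID `lisaterumi/postagger-portuguese` (inferred from the page name and developer, not confirmed here) and that the `transformers` library is installed. Without an aggregation strategy, the token-classification pipeline returns one prediction per WordPiece, so the hypothetical helper `merge_wordpieces` joins `##` continuation pieces back into words and keeps the tag of the first piece. The demo input is hand-written in the pipeline's output shape; it is not real model output, and the actual tokenization may differ.

```python
from typing import Dict, List, Tuple

# Assumed Hub ID (not confirmed by this page).
MODEL_ID = "lisaterumi/postagger-portuguese"


def merge_wordpieces(predictions: List[Dict]) -> List[Tuple[str, str]]:
    """Join '##' continuation pieces; each word keeps its first piece's tag."""
    words: List[Tuple[str, str]] = []
    for pred in predictions:
        piece, tag = pred["word"], pred["entity"]
        if piece.startswith("##") and words:
            prev_word, prev_tag = words[-1]
            words[-1] = (prev_word + piece[2:], prev_tag)
        else:
            words.append((piece, tag))
    return words


def tag_sentence(sentence: str, model_id: str = MODEL_ID) -> List[Tuple[str, str]]:
    """Download the model (network required) and tag one sentence."""
    from transformers import pipeline  # imported lazily; only needed here

    tagger = pipeline("token-classification", model=model_id)
    return merge_wordpieces(tagger(sentence))


# Hand-written predictions for illustration only, NOT model output.
demo_predictions = [
    {"word": "café", "entity": "N"},
    {"word": "quent", "entity": "ADJ"},
    {"word": "##inho", "entity": "ADJ"},
]
demo_words = merge_wordpieces(demo_predictions)
```

Calling `tag_sentence("Vamos tomar um café quentinho?")` would run the real model. Depending on the model's configuration, the returned labels may be tag names or generic `LABEL_n` identifiers.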

📊 Metrics

              Precision  Recall  F1    Support
accuracy                         0.98  33729
macro avg     0.96       0.95    0.95  33729
weighted avg  0.98       0.98    0.98  33729

F1: 0.9826
Accuracy: 0.9826
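The gap between the macro average (0.95) and the weighted average (0.98) is typical when rare tags score lower than frequent ones. The sketch below shows how the two averages are computed from per-tag F1 and support. The per-tag numbers are hypothetical, chosen only to illustrate the effect; they are not taken from this model's report.

```python
from typing import Dict, Tuple


def macro_and_weighted(per_class: Dict[str, Tuple[float, int]]) -> Tuple[float, float]:
    """per_class maps tag -> (f1, support). Returns (macro_f1, weighted_f1)."""
    total_support = sum(support for _, support in per_class.values())
    # Macro: every tag counts equally, however rare it is.
    macro = sum(f1 for f1, _ in per_class.values()) / len(per_class)
    # Weighted: each tag counts in proportion to its number of tokens.
    weighted = sum(f1 * support for f1, support in per_class.values()) / total_support
    return macro, weighted


# Hypothetical per-tag scores (illustrative, NOT from the model's report).
toy_scores = {"N": (0.99, 7000), "V": (0.98, 3000), "IN": (0.60, 20)}
macro_f1, weighted_f1 = macro_and_weighted(toy_scores)
```

In this toy case the rare `IN` tag pulls the macro average down sharply while barely moving the weighted average.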

⚙️ Parameters

nclasses = 27
nepochs = 30
batch_size = 32
batch_status = 32
learning_rate = 1e-5
early_stop = 3
max_length = 200
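The page does not say how `early_stop` is applied. One plausible reading is a patience rule: training runs for at most `nepochs` epochs and stops once the validation score has not improved for `early_stop` consecutive epochs. The sketch below implements that rule under this assumption; the function name and the validation scores are hypothetical and are not the authors' training log.

```python
from typing import Sequence, Tuple


def run_with_early_stopping(
    val_scores: Sequence[float], nepochs: int = 30, early_stop: int = 3
) -> Tuple[int, int]:
    """Return (epochs_run, best_epoch) for a patience-based stopping rule."""
    best_score = float("-inf")
    best_epoch = 0
    stale_epochs = 0  # consecutive epochs without improvement
    epochs_run = 0
    for epoch, score in enumerate(val_scores[:nepochs], start=1):
        epochs_run = epoch
        if score > best_score:
            best_score, best_epoch = score, epoch
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= early_stop:
                break
    return epochs_run, best_epoch


# Hypothetical validation F1 per epoch (illustrative only).
toy_val_f1 = [0.95, 0.96, 0.97, 0.975, 0.978, 0.980, 0.9826, 0.982, 0.981, 0.9825, 0.98]
epochs_run, best_epoch = run_with_early_stopping(toy_val_f1)
```

With these toy scores the best epoch is 7 and training stops after epoch 10, three epochs without improvement later.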

🏷️ Tags

Tag	Meaning
ADJ	Adjective
ADV	Adverb
ADV-KS	Subordinate conjunctive adverb
ADV-KS-REL	Subordinate relative adverb
ART	Article
CUR	Currency
IN	Interjection
KC	Coordinating conjunction
KS	Subordinating conjunction
N	Noun
NPROP	Proper noun
NUM	Number
PCP	Participle
PDEN	Denotative word
PREP	Preposition
PROADJ	Adjective pronoun
PRO-KS	Subordinate conjunctive pronoun
PRO-KS-REL	Subordinate relative connective pronoun
PROPESS	Personal pronoun
PROSUB	Nominal pronoun
V	Verb
VAUX	Auxiliary verb
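For downstream code it can help to keep the tag table as a lookup. The sketch below copies the tags and meanings listed above into a dictionary, writing compound tags without spaces around the hyphens, and adds a hypothetical helper `describe` that attaches the meaning to each (word, tag) pair. The sample pairs are hand-written for illustration, not model output.

```python
from typing import List, Tuple

# Tags and meanings as listed in the table above.
TAG_MEANINGS = {
    "ADJ": "Adjective",
    "ADV": "Adverb",
    "ADV-KS": "Subordinate conjunctive adverb",
    "ADV-KS-REL": "Subordinate relative adverb",
    "ART": "Article",
    "CUR": "Currency",
    "IN": "Interjection",
    "KC": "Coordinating conjunction",
    "KS": "Subordinating conjunction",
    "N": "Noun",
    "NPROP": "Proper noun",
    "NUM": "Number",
    "PCP": "Participle",
    "PDEN": "Denotative word",
    "PREP": "Preposition",
    "PROADJ": "Adjective pronoun",
    "PRO-KS": "Subordinate conjunctive pronoun",
    "PRO-KS-REL": "Subordinate relative connective pronoun",
    "PROPESS": "Personal pronoun",
    "PROSUB": "Nominal pronoun",
    "V": "Verb",
    "VAUX": "Auxiliary verb",
}


def describe(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Attach a human-readable meaning to each (word, tag) pair."""
    return [(word, tag, TAG_MEANINGS.get(tag, "unknown tag")) for word, tag in tagged]


# Hand-written pairs for illustration only.
described = describe([("uma", "ART"), ("pedra", "N")])
```

Tags the model emits that are not in this table fall back to `"unknown tag"` rather than raising an error.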

📚 Documentation

Widget

The widget contains the following texts:

"Tinha uma pedra no meio do caminho."
"Vamos tomar um café quentinho?"
"Como você se chama?"

Datasets

The model was trained on the MacMorpho dataset.

📝 How to Cite

@article{
Schneider_postagger_2023,
place={Brasil},
title={Developing a Transformer-based Clinical Part-of-Speech Tagger for Brazilian Portuguese},
volume={15},
url={https://jhi.sbis.org.br/index.php/jhi-sbis/article/view/1086},
DOI={10.59681/2175-4411.v15.iEspecial.2023.1086},
abstractNote={Electronic Health Records are a valuable source of information to be extracted by means of natural language processing (NLP) tasks, such as morphosyntactic word tagging. Although there have been significant advances in health NLP, such as the Transformer architecture, languages such as Portuguese are still underrepresented. This paper presents taggers developed for Portuguese texts, fine-tuned using BioBERtpt (clinical/biomedical) and BERTimbau (generic) models on a POS-tagged corpus. We achieved an accuracy of 0.9826, state-of-the-art for the corpus used. In addition, we performed a human-based evaluation of the trained models and others in the literature, using authentic clinical narratives. Our clinical model achieved 0.8145 in accuracy compared to 0.7656 for the generic model. It also showed competitive results compared to models trained specifically with clinical texts, evidencing domain impact on the base model in NLP tasks.},
number={Especial}, journal={Journal of Health Informatics},
author={Schneider, Elisa Terumi Rubel and Gumiel, Yohan Bonescki and Oliveira, Lucas Ferro Antunes de and Montenegro, Carolina de Oliveira and Barzotto, Laura Rubel and Moro, Claudia and Pagano, Adriana and Paraiso, Emerson Cabrera},
year={2023},
month={jul.} }

❓ Questions?

Please open a GitHub issue on the NLP Portuguese POS-Tagger repository.
