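🚀 Portuguese Part-of-Speech Tagger
This project fine-tunes the BERTimbau model on the MacMorpho corpus for part-of-speech (POS) tagging. After 10 epochs of training, it reaches an overall F1 score of 0.9826.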
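Fine-tuning a BERT encoder such as BERTimbau on MacMorpho usually means the corpus's word-level tags must be aligned with the subword tokens the model actually sees. The source does not say which alignment strategy was used. The minimal sketch below assumes the common "tag the first subword, mask the rest" approach, with -100 as the masked label (the default `ignore_index` of PyTorch's `CrossEntropyLoss`). `align_labels`, the tag ids, and the subword split are hypothetical; the `word_ids` list mimics the format returned by a Hugging Face fast tokenizer's `word_ids()`.

```python
from typing import List, Optional

# Label ignored by torch.nn.CrossEntropyLoss by default (ignore_index=-100).
IGNORE_INDEX = -100


def align_labels(word_labels: List[int], word_ids: List[Optional[int]]) -> List[int]:
    """Map word-level tag ids onto subword tokens ("first subword" strategy).

    word_ids[i] is the index of the word that token i belongs to, or None for
    special tokens such as [CLS], [SEP] and padding.
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)      # special token: no tag
        elif wid != previous:
            aligned.append(word_labels[wid])  # first subword of a word: its tag
        else:
            aligned.append(IGNORE_INDEX)      # continuation subword: masked
        previous = wid
    return aligned


# Hypothetical tag ids and subword split for "O gato dorme" (ART N V).
label2id = {"ART": 0, "N": 1, "V": 2}
word_labels = [label2id[t] for t in ["ART", "N", "V"]]
word_ids = [None, 0, 1, 1, 2, None]  # assume "gato" splits into two subwords
print(align_labels(word_labels, word_ids))  # → [-100, 0, 1, -100, 2, -100]
```

Masking continuation subwords keeps the loss and the evaluation at one prediction per word, which matches how word-level POS accuracy is counted.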
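🚀 Quick Start
This project delivers high-accuracy Portuguese POS tagging by fine-tuning the BERTimbau model. The evaluation metrics and training parameters are listed below.
📚 Detailed Documentation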
Evaluation Metrics

| | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| accuracy | | | 0.98 | 33729 |
| macro avg | 0.96 | 0.95 | 0.95 | 33729 |
| weighted avg | 0.98 | 0.98 | 0.98 | 33729 |

F1: 0.9826 Accuracy: 0.9826
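The rows of the metrics table appear to follow the layout of scikit-learn's `classification_report`: *macro avg* is the unweighted mean of the per-tag scores, and *weighted avg* weights each tag by its support (its number of gold tokens). The identical F1 and accuracy of 0.9826 are consistent with micro-averaged F1, which for single-label tagging always equals token accuracy, although the source does not state which averaging its F1 uses. The pure-Python sketch below reproduces these averages on a small invented example; `summarize` is a hypothetical helper, not the project's evaluation code.

```python
from collections import Counter
from typing import Dict, Sequence


def summarize(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Token-level accuracy plus macro, weighted, and micro F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted tag p was wrong
            fn[t] += 1  # the gold tag t was missed
    support = Counter(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    f1 = {}
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f1[lab] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    n = len(y_true)
    correct = sum(tp.values())
    # Each error adds exactly one FP and one FN, so micro P = micro R = correct / n.
    micro_p = micro_r = correct / n
    micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r) if correct else 0.0
    return {
        "accuracy": correct / n,
        "macro_f1": sum(f1.values()) / len(f1),
        "weighted_f1": sum(f1[lab] * support[lab] for lab in labels) / n,
        "micro_f1": micro_f1,
    }


# Invented gold and predicted tags for six tokens.
y_true = ["N", "N", "V", "ART", "N", "V"]
y_pred = ["N", "V", "V", "ART", "N", "N"]
scores = summarize(y_true, y_pred)
# accuracy ≈ 0.667, macro_f1 ≈ 0.722, weighted_f1 ≈ 0.667, micro_f1 ≈ 0.667
```

Macro averaging lets rare tags count as much as frequent ones, which is why the macro row (0.95) sits below the weighted row (0.98) in the table.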
Parameters
nclasses = 27
nepochs = 30
batch_size = 32
batch_status = 32
learning_rate = 1e-5
early_stop = 3
max_length = 200
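The summary states that training ran for 10 epochs, while the parameters set `nepochs = 30` and `early_stop = 3`. The source does not explain how `early_stop` is applied; one common reading is patience-based early stopping, where training halts once the validation score fails to improve for `early_stop` consecutive epochs. The sketch below assumes that reading. `TrainConfig` and `epochs_run` are hypothetical names, and the validation scores are invented for illustration, not the project's actual training curve.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the parameter list; only nepochs and early_stop are used below.
    nclasses: int = 27
    nepochs: int = 30
    batch_size: int = 32
    batch_status: int = 32
    learning_rate: float = 1e-5
    early_stop: int = 3
    max_length: int = 200


def epochs_run(val_scores: Sequence[float], cfg: TrainConfig) -> int:
    """Number of epochs completed under patience-based early stopping (assumed).

    val_scores[i] is the validation score after epoch i + 1 (higher is better).
    Training halts once the best score has not improved for cfg.early_stop
    consecutive epochs, or after cfg.nepochs epochs, whichever comes first.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores[: cfg.nepochs], start=1):
        if score > best:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= cfg.early_stop:
                return epoch
    return min(len(val_scores), cfg.nepochs)


# Invented scores: a new best through epoch 7, then three epochs without one.
scores = [0.90, 0.93, 0.95, 0.96, 0.970, 0.975, 0.978, 0.977, 0.976, 0.978, 0.980]
print(epochs_run(scores, TrainConfig()))  # → 10
```

Under this assumption, `nepochs` is only an upper bound, and a run can end well before it.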
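Part-of-Speech Tags

| Tag | Meaning |
|---|---|
| ADJ | Adjective |
| ADV | Adverb |
| ADV-KS | Subordinating connective adverb |
| ADV-KS-REL | Subordinating relative adverb |
| ART | Article |
| CUR | Currency |
| IN | Interjection |
| KC | Coordinating conjunction |
| KS | Subordinating conjunction |
| N | Noun |
| NPROP | Proper noun |
| NUM | Numeral |
| PCP | Participle |
| PDEN | Denotative word |
| PREP | Preposition |
| PROADJ | Adjectival pronoun |
| PRO-KS | Subordinating connective pronoun |
| PRO-KS-REL | Subordinating connective relative pronoun |
| PROPESS | Personal pronoun |
| PROSUB | Nominal pronoun |
| V | Verb |
| VAUX | Auxiliary verb |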
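When post-processing tagger output, the tags listed above can be kept as a lookup dictionary. A minimal sketch: `TAG_DESCRIPTIONS` and `describe` are hypothetical names, the example sentence is hand-tagged rather than produced by the model, and any tag outside the list (the list has 22 tags while `nclasses = 27`) maps to `None` instead of raising an error.

```python
from typing import Dict, List, Optional, Tuple

# English descriptions of the MacMorpho tags listed above.
TAG_DESCRIPTIONS: Dict[str, str] = {
    "ADJ": "Adjective",
    "ADV": "Adverb",
    "ADV-KS": "Subordinating connective adverb",
    "ADV-KS-REL": "Subordinating relative adverb",
    "ART": "Article",
    "CUR": "Currency",
    "IN": "Interjection",
    "KC": "Coordinating conjunction",
    "KS": "Subordinating conjunction",
    "N": "Noun",
    "NPROP": "Proper noun",
    "NUM": "Numeral",
    "PCP": "Participle",
    "PDEN": "Denotative word",
    "PREP": "Preposition",
    "PROADJ": "Adjectival pronoun",
    "PRO-KS": "Subordinating connective pronoun",
    "PRO-KS-REL": "Subordinating connective relative pronoun",
    "PROPESS": "Personal pronoun",
    "PROSUB": "Nominal pronoun",
    "V": "Verb",
    "VAUX": "Auxiliary verb",
}


def describe(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str, Optional[str]]]:
    """Attach a description to each (word, tag) pair; unknown tags map to None."""
    return [(word, tag, TAG_DESCRIPTIONS.get(tag)) for word, tag in tagged]


# Hand-tagged illustration, not model output.
example = [("O", "ART"), ("gato", "N"), ("dorme", "V")]
for word, tag, desc in describe(example):
    print(f"{word}\t{tag}\t{desc}")
```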
📖 Citation
@article{Schneider_postagger_2023,
place={Brasil},
title={Developing a Transformer-based Clinical Part-of-Speech Tagger for Brazilian Portuguese},
volume={15},
url={https://jhi.sbis.org.br/index.php/jhi-sbis/article/view/1086},
DOI={10.59681/2175-4411.v15.iEspecial.2023.1086},
abstractNote={Electronic Health Records are a valuable source of information to be extracted by means of natural language processing (NLP) tasks, such as morphosyntactic word tagging. Although there have been significant advances in health NLP, such as the Transformer architecture, languages such as Portuguese are still underrepresented. This paper presents taggers developed for Portuguese texts, fine-tuned using BioBERTpt (clinical/biomedical) and BERTimbau (generic) models on a POS-tagged corpus. We achieved an accuracy of 0.9826, state-of-the-art for the corpus used. In addition, we performed a human-based evaluation of the trained models and others in the literature, using authentic clinical narratives. Our clinical model achieved 0.8145 in accuracy compared to 0.7656 for the generic model. It also showed competitive results compared to models trained specifically with clinical texts, evidencing domain impact on the base model in NLP tasks.},
number={Especial},
journal={Journal of Health Informatics},
author={Schneider, Elisa Terumi Rubel and Gumiel, Yohan Bonescki and Oliveira, Lucas Ferro Antunes de and Montenegro, Carolina de Oliveira and Barzotto, Laura Rubel and Moro, Claudia and Pagano, Adriana and Paraiso, Emerson Cabrera},
year={2023},
month={jul.}
}
❓ FAQ
If you have any questions, please open a GitHub issue in the NLP Portuguese POS-Tagger project.