🚀 BioBERTpt - Portuguese Clinical and Biomedical BERT
BioBERTpt is a BERT-based model for clinical and biomedical Portuguese. It was initialized from BERT-Multilingual-Cased and trained on clinical notes and biomedical literature. This page focuses on BioBERTpt(clin), the clinical version of BioBERTpt, which was trained on clinical narratives from electronic health records of Brazilian hospitals.
🚀 Quick Start
How to use the model
The model can be loaded with the `transformers` library:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("pucpr/biobertpt-clin")
model = AutoModel.from_pretrained("pucpr/biobertpt-clin")
```
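As a quick sanity check, the sketch below runs a Portuguese clinical sentence through the loaded encoder and inspects the resulting contextual embeddings. The example sentence is illustrative only, not taken from the model's training data:

```python
import torch

# Illustrative Portuguese clinical sentence (not from the training data)
text = "Paciente com dor abdominal e febre há três dias."

inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual embedding per subword token;
# shape is (1, sequence_length, 768) for a BERT-base-sized encoder
embeddings = outputs.last_hidden_state
print(embeddings.shape)
```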
📚 Documentation
For more details and the model's performance on Portuguese named entity recognition (NER) tasks, see the original paper, BioBERTpt - A Portuguese Neural Language Model for Clinical Named Entity Recognition.
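Since BioBERTpt(clin) is distributed as a base encoder, applying it to NER requires adding a token-classification head and fine-tuning it on labeled data. The following is a minimal sketch of such a setup; the label set shown is hypothetical, as the actual entity types come from the annotated corpora used in the paper:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical label set for illustration only
labels = ["O", "B-Problem", "I-Problem", "B-Treatment", "I-Treatment"]

tokenizer = AutoTokenizer.from_pretrained("pucpr/biobertpt-clin")
model = AutoModelForTokenClassification.from_pretrained(
    "pucpr/biobertpt-clin",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)

# The classification head is randomly initialized here and must be
# fine-tuned on labeled NER data (e.g., with the Trainer API) before use.
```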
👏 Acknowledgements
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
📄 License
Citation
If you use this model, please cite it as follows:
```bibtex
@inproceedings{schneider-etal-2020-biobertpt,
title = "{B}io{BERT}pt - A {P}ortuguese Neural Language Model for Clinical Named Entity Recognition",
author = "Schneider, Elisa Terumi Rubel and
de Souza, Jo{\~a}o Vitor Andrioli and
Knafou, Julien and
Oliveira, Lucas Emanuel Silva e and
Copara, Jenny and
Gumiel, Yohan Bonescki and
Oliveira, Lucas Ferro Antunes de and
Paraiso, Emerson Cabrera and
Teodoro, Douglas and
Barra, Cl{\'a}udia Maria Cabral Moro",
booktitle = "Proceedings of the 3rd Clinical Natural Language Processing Workshop",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.clinicalnlp-1.7",
pages = "65--72",
abstract = "With the growing number of electronic health record data, clinical NLP tasks have become increasingly relevant to unlock valuable information from unstructured clinical text. Although the performance of downstream NLP tasks, such as named-entity recognition (NER), in English corpus has recently improved by contextualised language models, less research is available for clinical texts in low resource languages. Our goal is to assess a deep contextual embedding model for Portuguese, so called BioBERTpt, to support clinical and biomedical NER. We transfer learned information encoded in a multilingual-BERT model to a corpora of clinical narratives and biomedical-scientific papers in Brazilian Portuguese. To evaluate the performance of BioBERTpt, we ran NER experiments on two annotated corpora containing clinical narratives and compared the results with existing BERT models. Our in-domain model outperformed the baseline model in F1-score by 2.72{\%}, achieving higher performance in 11 out of 13 assessed entities. We demonstrate that enriching contextual embedding models with domain literature can play an important role in improving performance for specific NLP tasks. The transfer learning process enhanced the Portuguese biomedical NER model by reducing the necessity of labeled data and the demand for retraining a whole new model.",
}
```
❓ Questions?
If you have any questions, please open a GitHub issue in the BioBERTpt repository.