🚀 BioBERTpt - Portuguese Clinical and Biomedical BERT
BioBERTpt is a neural language model for clinical named entity recognition in Portuguese. Built on the BERT architecture, it was initialized from the multilingual BERT model and trained on clinical notes and biomedical literature, making it well suited to Portuguese clinical and biomedical text.
🚀 Quick Start
Model Loading
You can load BioBERTpt with the transformers library:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("pucpr/biobertpt-all")
model = AutoModel.from_pretrained("pucpr/biobertpt-all")
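Once loaded, the model can be used to produce contextual embeddings for Portuguese clinical text. The sketch below extends the loading snippet above; the example sentence is illustrative, and the hidden size of 768 is assumed from the BERT-Multilingual-Cased base the model was initialized from.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("pucpr/biobertpt-all")
model = AutoModel.from_pretrained("pucpr/biobertpt-all")

# Example Portuguese clinical sentence (illustrative only)
text = "O paciente apresenta dor de cabeça e febre."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One contextual embedding per subword token: (batch, seq_len, hidden_size)
emb = outputs.last_hidden_state
print(emb.shape)
```

These token-level embeddings are what a downstream NER head consumes; for sentence-level features, the embedding of the [CLS] token (`emb[:, 0]`) is a common choice.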
📚 Documentation
Model Description
The paper BioBERTpt - A Portuguese Neural Language Model for Clinical Named Entity Recognition presents BERT-based clinical and biomedical models for Portuguese. The models were initialized from the multilingual BERT model (BERT-Multilingual-Cased) and trained on clinical notes and biomedical literature.
This model card describes BioBERTpt(all), the full version trained on both Portuguese clinical narratives and biomedical literature.
More Information
For more details on BioBERTpt and its performance on Portuguese named entity recognition (NER) tasks, see the original paper, BioBERTpt - A Portuguese Neural Language Model for Clinical Named Entity Recognition.
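For NER, the base model is typically fine-tuned with a token-classification head. The sketch below shows the general setup with the transformers library; the label set is hypothetical, and the classification head is randomly initialized here, so real predictions require fine-tuning on an annotated corpus first.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical label scheme for illustration; a real setup would use the
# labels of the annotated clinical corpus used for fine-tuning.
labels = ["O", "B-Disease", "I-Disease"]

tokenizer = AutoTokenizer.from_pretrained("pucpr/biobertpt-all")
# Attaches an untrained token-classification head on top of BioBERTpt.
model = AutoModelForTokenClassification.from_pretrained(
    "pucpr/biobertpt-all", num_labels=len(labels)
)

sentence = "Paciente com suspeita de pneumonia."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (batch, seq_len, num_labels)

# Per-token label predictions (meaningless until the head is fine-tuned)
predictions = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, predictions):
    print(token, labels[pred])
```

Fine-tuning this head on annotated clinical narratives is the setting evaluated in the paper's NER experiments.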
🙏 Acknowledgements
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
📄 License
Citation
If you use BioBERTpt, please cite it as follows:
@inproceedings{schneider-etal-2020-biobertpt,
title = "{B}io{BERT}pt - A {P}ortuguese Neural Language Model for Clinical Named Entity Recognition",
author = "Schneider, Elisa Terumi Rubel and
de Souza, Jo{\~a}o Vitor Andrioli and
Knafou, Julien and
Oliveira, Lucas Emanuel Silva e and
Copara, Jenny and
Gumiel, Yohan Bonescki and
Oliveira, Lucas Ferro Antunes de and
Paraiso, Emerson Cabrera and
Teodoro, Douglas and
Barra, Cl{\'a}udia Maria Cabral Moro",
booktitle = "Proceedings of the 3rd Clinical Natural Language Processing Workshop",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.clinicalnlp-1.7",
pages = "65--72",
abstract = "With the growing number of electronic health record data, clinical NLP tasks have become increasingly relevant to unlock valuable information from unstructured clinical text. Although the performance of downstream NLP tasks, such as named-entity recognition (NER), in English corpus has recently improved by contextualised language models, less research is available for clinical texts in low resource languages. Our goal is to assess a deep contextual embedding model for Portuguese, so called BioBERTpt, to support clinical and biomedical NER. We transfer learned information encoded in a multilingual-BERT model to a corpora of clinical narratives and biomedical-scientific papers in Brazilian Portuguese. To evaluate the performance of BioBERTpt, we ran NER experiments on two annotated corpora containing clinical narratives and compared the results with existing BERT models. Our in-domain model outperformed the baseline model in F1-score by 2.72{\%}, achieving higher performance in 11 out of 13 assessed entities. We demonstrate that enriching contextual embedding models with domain literature can play an important role in improving performance for specific NLP tasks. The transfer learning process enhanced the Portuguese biomedical NER model by reducing the necessity of labeled data and the demand for retraining a whole new model.",
}
❓ FAQ
If you have any questions, please open a GitHub issue in the BioBERTpt repository.