clinicalnerpt-medical open-source model - a free Portuguese clinical entity recognition tool, one of 13 UMLS-compatible entity models

Clinicalnerpt Medical

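Developed by pucpr

A Portuguese clinical named entity recognition model built on BioBERTpt, one of 13 clinical entity models compatible with UMLS

Sequence labeling

Transformers

Other #Portuguese clinical NER #UMLS-compatible entities #BioBERTpt transfer learning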

Downloads: 55

Released: 3/2/2022

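Model Overview

The model is built for named entity recognition in Portuguese clinical text. It was trained on the Brazilian clinical corpus SemClinBr and is suited to natural language processing applications in the medical domain.

Model Features

UMLS-compatible entity recognition

Belongs to the BioBERTpt project's set of 13 clinical entity models compatible with the Unified Medical Language System (UMLS)

Domain-specific training

Trained specifically on the Brazilian clinical corpus SemClinBr and tuned for the medical domain

Transfer learning benefits

Transfer learning strengthened the Portuguese biomedical NER model by reducing the need for labeled data and avoiding the retraining of a whole new model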

Model Capabilities

Entity recognition in clinical text

Medical term extraction

Processing of Portuguese clinical documents

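Use Cases

Medical document processing

Electronic health record analysis

Extracts key medical entities from Portuguese electronic health records

Makes medical information retrieval and structuring more efficient

Clinical research support

Helps researchers extract structured data from clinical narratives

Speeds up clinical data analysis and research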
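As an illustration of turning clinical narratives into structured data, the sketch below flattens per-document entity predictions into CSV rows. It assumes predictions shaped like the aggregated output of a Hugging Face token-classification pipeline (dictionaries with `entity_group`, `word`, `start`, `end`, and `score`). The function name `entities_to_csv`, the `min_score` threshold, the document ID, and the `ENTITY` label are hypothetical and not part of the model card.

```python
import csv
import io
from typing import Dict, List


def entities_to_csv(doc_entities: Dict[str, List[dict]], min_score: float = 0.5) -> str:
    """Flatten {doc_id: [entity, ...]} into CSV text, skipping low-confidence entities."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["doc_id", "label", "text", "start", "end", "score"])
    for doc_id, entities in doc_entities.items():
        for ent in entities:
            # min_score is an illustrative cut-off, not a recommended value.
            if ent["score"] < min_score:
                continue
            writer.writerow([
                doc_id,
                ent["entity_group"],
                ent["word"],
                ent["start"],
                ent["end"],
                f'{ent["score"]:.2f}',
            ])
    return buffer.getvalue()


# Hand-written predictions for demonstration only; they are not real model output.
sample = {
    "note-001": [
        {"entity_group": "ENTITY", "word": "câmara hiperbárica", "start": 23, "end": 41, "score": 0.91},
        {"entity_group": "ENTITY", "word": "tarde", "start": 56, "end": 61, "score": 0.20},
    ]
}
print(entities_to_csv(sample))
```

The resulting text can be written to a file or loaded into a spreadsheet or data-frame tool for downstream analysis.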

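🚀 Portuguese Clinical Named Entity Recognition - Medical

This medical named entity recognition (NER) model is part of the BioBERTpt project, which trained 13 clinical entity models (compatible with UMLS) to unlock valuable information from unstructured clinical text.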

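🚀 Quick Start

The Medical NER model is part of the BioBERTpt project, in which 13 clinical entity models (compatible with UMLS) were trained. All NER models from the "pucpr" user were trained on the Brazilian clinical corpus SemClinBr, starting from the BioBERTpt(all) model, for 10 epochs in IOB2 format.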
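To make the IOB2 format concrete: each token is tagged `B-X` at the start of an entity of type X, `I-X` inside it, and `O` outside any entity. The sketch below decodes such a tag sequence into spans. The helper name `iob2_to_spans`, the `ENTITY` label, and the token list are illustrative assumptions; the label names this model actually emits are not stated here.

```python
from typing import List, Tuple


def iob2_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert IOB2 tags into (label, start, end) token spans; end is exclusive."""
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags):
        # An open span continues only when I-X follows a span with the same label X.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the open span, if there is one.
        if label is not None:
            spans.append((label, start, i))
            label, start = None, None
        # B-X opens a span; a stray I-X is leniently treated as an opening too.
        if tag.startswith(("B-", "I-")):
            label, start = tag[2:], i
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hypothetical tokens and tags, written by hand for illustration.
tokens = ["Paciente", "encaminhado", "a", "câmara", "hiperbárica", "."]
tags = ["O", "O", "O", "B-ENTITY", "I-ENTITY", "O"]
entities = [(label, " ".join(tokens[s:e])) for label, s, e in iob2_to_spans(tags)]
print(entities)
```

Treating a stray `I-X` as the start of a span is one lenient design choice; a stricter decoder could drop such tags or raise an error instead.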

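🔗 Example Text

An mp - cdi evaluation of the atrial and ventricular electrodes was performed today.
The patient was sent to the hyperbaric chamber in the afternoon.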
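A minimal sketch of running the model on a sentence like those above, using the Hugging Face `transformers` token-classification pipeline. Several things here are assumptions: the Hub ID `pucpr/clinicalnerpt-medical` is inferred from the user and model names on this page, the Portuguese sentence is our own rendering of the second example (the model expects Portuguese input), and the helper names `load_ner` and `tag_text` are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Assumed Hub ID, built from the "pucpr" user and the model name on this page.
MODEL_ID = "pucpr/clinicalnerpt-medical"

# Our own Portuguese rendering of the second example sentence (an assumption).
EXAMPLE_PT = "Paciente encaminhado a câmara hiperbárica no período da tarde."


def load_ner(model_id: str = MODEL_ID):
    """Build a token-classification pipeline; weights are downloaded on first use."""
    # Imported lazily so the rest of the sketch works without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def tag_text(text: str, ner: Optional[Callable] = None) -> List[Dict]:
    """Run NER on text and return a list of {label, text, score} dictionaries."""
    if ner is None:
        ner = load_ner()
    return [
        {
            "label": pred["entity_group"],
            # Slice the original string so accents and casing are preserved.
            "text": text[pred["start"]:pred["end"]],
            "score": round(float(pred["score"]), 3),
        }
        for pred in ner(text)
    ]
```

Calling `tag_text(EXAMPLE_PT)` without a `ner` argument downloads the checkpoint on first use and requires `transformers` plus a backend such as PyTorch. Passing any callable that returns pipeline-style dictionaries lets the formatting logic be exercised offline.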

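📚 Detailed Documentation

Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Citation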

@inproceedings{schneider-etal-2020-biobertpt,
    title = "{B}io{BERT}pt - A {P}ortuguese Neural Language Model for Clinical Named Entity Recognition",
    author = "Schneider, Elisa Terumi Rubel  and
      de Souza, Jo{\~a}o Vitor Andrioli  and
      Knafou, Julien  and
      Oliveira, Lucas Emanuel Silva e  and
      Copara, Jenny  and
      Gumiel, Yohan Bonescki  and
      Oliveira, Lucas Ferro Antunes de  and
      Paraiso, Emerson Cabrera  and
      Teodoro, Douglas  and
      Barra, Cl{\'a}udia Maria Cabral Moro",
    booktitle = "Proceedings of the 3rd Clinical Natural Language Processing Workshop",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.clinicalnlp-1.7",
    pages = "65--72",
    abstract = "With the growing number of electronic health record data, clinical NLP tasks have become increasingly relevant to unlock valuable information from unstructured clinical text. Although the performance of downstream NLP tasks, such as named-entity recognition (NER), in English corpus has recently improved by contextualised language models, less research is available for clinical texts in low resource languages. Our goal is to assess a deep contextual embedding model for Portuguese, so called BioBERTpt, to support clinical and biomedical NER. We transfer learned information encoded in a multilingual-BERT model to a corpora of clinical narratives and biomedical-scientific papers in Brazilian Portuguese. To evaluate the performance of BioBERTpt, we ran NER experiments on two annotated corpora containing clinical narratives and compared the results with existing BERT models. Our in-domain model outperformed the baseline model in F1-score by 2.72{\%}, achieving higher performance in 11 out of 13 assessed entities. We demonstrate that enriching contextual embedding models with domain literature can play an important role in improving performance for specific NLP tasks. The transfer learning process enhanced the Portuguese biomedical NER model by reducing the necessity of labeled data and the demand for retraining a whole new model.",
}