clinicalnerpt-healthcare: free, open-source Portuguese clinical NER model, one of 13 UMLS-compatible clinical entity models


Clinicalnerpt Healthcare

Developed by pucpr

A Portuguese clinical named entity recognition model built on BioBERTpt, one of 13 UMLS-compatible clinical entity models from the same project

Sequence labeling

Transformers

Other #Portuguese clinical NER #UMLS-compatible entities #Brazilian clinical corpus

Downloads: 50

Released: 3/2/2022

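Model overview

This model is part of the BioBERTpt project and is designed to recognize medical entities, such as diseases and symptoms, in Portuguese clinical text.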

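Model features

UMLS-compatible entity recognition

Covers one of the 13 clinical entity types handled by the BioBERTpt NER models, all of which are compatible with the Unified Medical Language System (UMLS)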

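Trained on a Brazilian clinical corpus

Trained on the Brazilian clinical corpus SemClinBr and tuned for Portuguese clinical text

Transfer learning

Fine-tuned from the pretrained BioBERTpt model via transfer learning, which reduces the amount of labeled data needed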

Model capabilities

Entity recognition in clinical text

Medical term extraction

Portuguese NLP

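Use cases

Clinical document processing

Diabetes follow-up record analysis

Extract key medical entities from follow-up records of diabetic patients

Identifies entities related to diseases, treatments, and similar concepts

Renal function report processing

Analyze clinical reports that describe changes in renal function

Extracts renal function indicators and diagnostic information

Medical information extraction

Electronic health record analysis

Extract medical information from electronic health records in structured form

Automatically identifies patient conditions and treatment plans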
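As a rough illustration of the structured extraction described above, the sketch below groups NER predictions by entity label into a simple record. It assumes the predictions are dictionaries shaped like the aggregated output of a Hugging Face token-classification pipeline (`entity_group`, `word`, `score`, `start`, `end`). The function name `group_entities`, the `HealthCare` label, the `note_id` field, and the sample values are hypothetical and not taken from the model card.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_entities(predictions: Iterable[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Group entity strings by label, skipping low-confidence and duplicate hits."""
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["score"] < min_score:
            continue  # drop predictions below the confidence threshold
        text = pred["word"].strip()
        if not text:
            continue  # ignore empty spans
        if text not in grouped[pred["entity_group"]]:
            grouped[pred["entity_group"]].append(text)
    return dict(grouped)


# Hypothetical predictions; the label and values are illustrative only.
sample = [
    {"entity_group": "HealthCare", "word": "ambulatório de cardiologia", "score": 0.97, "start": 24, "end": 50},
    {"entity_group": "HealthCare", "word": "ambulatório de cardiologia", "score": 0.91, "start": 80, "end": 106},
    {"entity_group": "HealthCare", "word": "UTI", "score": 0.32, "start": 120, "end": 123},
]

# One structured record per clinical note (note_id is a made-up field).
record = {"note_id": "example-001", "entities": group_entities(sample)}
```

The threshold of 0.5 is an arbitrary default for the sketch; a suitable value would need to be chosen against annotated data.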

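🚀 Portuguese Clinical NER - Healthcare

This healthcare named entity recognition (NER) model is part of the BioBERTpt project, in which 13 clinical entity models (compatible with the Unified Medical Language System, UMLS) were trained. All NER models from the "pucpr" user were trained on the Brazilian clinical corpus SemClinBr for 10 epochs, using the IOB2 format, with BioBERTpt(all) as the base model.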
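For readers unfamiliar with IOB2, each token is tagged `B-<label>` at the start of an entity, `I-<label>` inside it, and `O` outside any entity. The sketch below shows one way to turn such a tag sequence back into entity spans. The helper `iob2_to_spans`, the sample sentence, and the `HealthCare` label name are assumptions made for illustration; the model's actual label set is not stated here.

```python
from typing import List, Tuple


def iob2_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert IOB2 tags to (label, start, end) token spans; end is exclusive."""
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags):
        # The open entity continues only on an "I-" tag with the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((label, start, i))
            label, start = None, None
        if tag.startswith("B-"):
            label, start = tag[2:], i
        # An "I-" tag with no matching open entity is ignored in this sketch.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hypothetical example: sentence and label are illustrative only.
tokens = ["Paciente", "encaminhado", "ao", "ambulatório", "de", "cardiologia", "."]
tags = ["O", "O", "O", "B-HealthCare", "I-HealthCare", "I-HealthCare", "O"]

spans = iob2_to_spans(tags)
entities = [(label, " ".join(tokens[s:e])) for label, s, e in spans]
```

Treating a stray `I-` tag as outside any entity is one strict reading of IOB2; some evaluation tools instead start a new entity at that position.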

🚀 Quick start

Like the other NER models published by the "pucpr" user, this model was fine-tuned from BioBERTpt(all) on the Brazilian clinical corpus SemClinBr for 10 epochs with IOB2-formatted labels.
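A minimal sketch of loading the model with the Hugging Face Transformers `pipeline` API might look like the following. The Hub identifier `pucpr/clinicalnerpt-healthcare` is inferred from the developer and model names on this page and should be checked before use. The helper names `build_ner_pipeline` and `extract_entities` are hypothetical.

```python
MODEL_ID = "pucpr/clinicalnerpt-healthcare"  # assumed Hub identifier, verify before use


def build_ner_pipeline(model_id: str = MODEL_ID):
    """Create a token-classification pipeline; downloads the model on first use."""
    # Imported lazily so this module loads even when transformers is not installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge word pieces into whole entities
    )


def extract_entities(ner, text: str):
    """Run the pipeline and keep (label, text, score) for each entity found."""
    return [
        (ent["entity_group"], ent["word"], round(float(ent["score"]), 3))
        for ent in ner(text)
    ]


# Usage (requires network access and the transformers package):
#   ner = build_ner_pipeline()
#   extract_entities(ner, "Paciente encaminhado ao ambulatório de cardiologia.")
```

With `aggregation_strategy="simple"`, the pipeline returns one dictionary per entity rather than one per word piece, which is why `extract_entities` reads the `entity_group` key.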

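👏 Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.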

📚 Documentation

Citation

@inproceedings{schneider-etal-2020-biobertpt,
    title = "{B}io{BERT}pt - A {P}ortuguese Neural Language Model for Clinical Named Entity Recognition",
    author = "Schneider, Elisa Terumi Rubel  and
      de Souza, Jo{\~a}o Vitor Andrioli  and
      Knafou, Julien  and
      Oliveira, Lucas Emanuel Silva e  and
      Copara, Jenny  and
      Gumiel, Yohan Bonescki  and
      Oliveira, Lucas Ferro Antunes de  and
      Paraiso, Emerson Cabrera  and
      Teodoro, Douglas  and
      Barra, Cl{\'a}udia Maria Cabral Moro",
    booktitle = "Proceedings of the 3rd Clinical Natural Language Processing Workshop",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.clinicalnlp-1.7",
    pages = "65--72",
    abstract = "With the growing number of electronic health record data, clinical NLP tasks have become increasingly relevant to unlock valuable information from unstructured clinical text. Although the performance of downstream NLP tasks, such as named-entity recognition (NER), in English corpus has recently improved by contextualised language models, less research is available for clinical texts in low resource languages. Our goal is to assess a deep contextual embedding model for Portuguese, so called BioBERTpt, to support clinical and biomedical NER. We transfer learned information encoded in a multilingual-BERT model to a corpora of clinical narratives and biomedical-scientific papers in Brazilian Portuguese. To evaluate the performance of BioBERTpt, we ran NER experiments on two annotated corpora containing clinical narratives and compared the results with existing BERT models. Our in-domain model outperformed the baseline model in F1-score by 2.72{\%}, achieving higher performance in 11 out of 13 assessed entities. We demonstrate that enriching contextual embedding models with domain literature can play an important role in improving performance for specific NLP tasks. The transfer learning process enhanced the Portuguese biomedical NER model by reducing the necessity of labeled data and the demand for retraining a whole new model.",
}