clinicalnerpt-medical open-source model: a free Portuguese clinical NER tool from a project covering 13 clinical entity types

Clinicalnerpt Medical

Developed by pucpr

A Portuguese clinical named entity recognition model based on BioBERTpt, part of a project that covers 13 UMLS-compatible clinical entity types

Sequence labeling

Transformers

Other: #Portuguese clinical NER #UMLS-compatible entities #BioBERTpt transfer learning

Release date: 3/2/2022

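Model overview

This model is built specifically for named entity recognition in Portuguese clinical text. It was trained on SemClinBr, a Brazilian clinical corpus, and is intended for natural language processing applications in the medical domain.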

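Model features

UMLS-compatible entity recognition

One of a set of models covering 13 clinical entity types compatible with the Unified Medical Language System (UMLS)

Domain-specific training

Trained on the Brazilian clinical corpus SemClinBr, targeting the medical domain

Transfer learning benefits

Transfer learning improved the Portuguese biomedical NER model while reducing the need for labeled data and avoiding retraining a whole new model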

Model capabilities

Clinical text entity recognition

Medical term extraction

Portuguese clinical document processing

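Use cases

Medical document processing

Electronic health record analysis

Extracts key medical entities from Portuguese electronic health records

Makes medical information easier to retrieve and structure

Clinical research support

Helps researchers extract structured data from clinical narratives

Speeds up clinical data analysis and research workflows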
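To make the "structuring" use case concrete, here is a minimal sketch that groups already-extracted entity spans into one record per clinical note. It assumes the spans come from a tagger such as this one; the `Entity` schema, the `structure_record` helper, the label strings `"MEDICAL"` and `"PROCEDURE"`, and the example spans are all hypothetical and are not real model output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One recognized span (hypothetical schema): surface text and entity label."""
    text: str
    label: str


def structure_record(note_id: str, entities: list[Entity]) -> dict:
    """Group entity surface forms by label, dropping repeats but keeping first-seen order."""
    by_label: dict[str, list[str]] = defaultdict(list)
    for ent in entities:
        if ent.text not in by_label[ent.label]:
            by_label[ent.label].append(ent.text)
    return {"note_id": note_id, "entities": dict(by_label)}


# Illustrative spans only; labels are placeholders, not the model's real label set.
spans = [
    Entity("mp-cdi evaluation", "PROCEDURE"),
    Entity("atrial electrode", "MEDICAL"),
    Entity("ventricular electrode", "MEDICAL"),
    Entity("hyperbaric chamber", "MEDICAL"),
    Entity("atrial electrode", "MEDICAL"),
]
record = structure_record("note-001", spans)
print(record)
```

A plain dictionary keyed by label keeps the record easy to serialize to JSON or load into a table; a real pipeline would probably also keep character offsets so each value can be traced back to the source note.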

🚀 Portuguese Clinical NER - Medical

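This medical NER model is part of the BioBERTpt project, in which models were trained for 13 clinical entity types (compatible with UMLS). All NER models published by the "pucpr" user were fine-tuned from the BioBERTpt(all) model on the Brazilian clinical corpus SemClinBr, for 10 epochs, using the IOB2 tagging format.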
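In the IOB2 format, every entity begins with a `B-` tag, following tokens of the same entity carry `I-`, and tokens outside any entity carry `O`. The sketch below turns word-level IOB2 tags back into entity spans. The `iob2_to_spans` helper and the `MEDICAL` label are hypothetical; this card does not list the model's actual label names.

```python
from typing import List, Optional, Tuple


def iob2_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Convert parallel token / IOB2-tag lists into (label, text) entity spans."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity, closing any entity already open.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # I- of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity (invalid in
            # strict IOB2): close any open entity and ignore this token.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []

    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


tokens = ["The", "patient", "was", "sent", "to", "the", "hyperbaric", "chamber", "."]
tags = ["O", "O", "O", "O", "O", "O", "B-MEDICAL", "I-MEDICAL", "O"]
print(iob2_to_spans(tokens, tags))
```

Dropping a stray `I-` tag is one policy among several; some evaluation tools instead treat it as the start of a new entity, which can change span counts when scoring.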

📚 Documentation

Example inputs

Today, an mp-cdi evaluation was performed using atrial and ventricular electrodes.
The patient was sent to the hyperbaric chamber in the afternoon.
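A BERT-style token classifier like this one predicts a label for each subword token, while IOB2 tags are defined per word, so a term such as "mp-cdi" in the example inputs may be split into several pieces. A common convention, assumed here rather than stated in this card, is to keep the label of each word's first subword. The sketch takes a `word_ids` list shaped like the output of a Hugging Face fast tokenizer's `word_ids()` (with `None` for special tokens). The subword split, labels, and helper name `first_subword_labels` are all made up for illustration.

```python
from typing import List, Optional


def first_subword_labels(word_ids: List[Optional[int]], token_labels: List[str]) -> List[str]:
    """Collapse per-subword labels to one label per word, keeping each word's first subword.

    word_ids[i] is the index of the word that subword token i belongs to, or None for
    special tokens such as [CLS], [SEP], or padding.
    """
    if len(word_ids) != len(token_labels):
        raise ValueError("word_ids and token_labels must have the same length")
    word_labels: List[str] = []
    previous: Optional[int] = None
    for word_id, label in zip(word_ids, token_labels):
        if word_id is None or word_id == previous:
            continue  # skip special tokens and continuation subwords
        word_labels.append(label)
        previous = word_id
    return word_labels


# Hypothetical subword split of the words ["Evaluation", "of", "mp-cdi"]:
# [CLS] Eval ##uation of mp - cdi [SEP]
word_ids = [None, 0, 0, 1, 2, 2, 2, None]
token_labels = ["O", "O", "O", "O", "B-MEDICAL", "I-MEDICAL", "I-MEDICAL", "O"]
print(first_subword_labels(word_ids, token_labels))
```

The resulting word-level tags can then be decoded into entity spans with any IOB2 decoder.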

Dataset

SemClinBr

🎉 Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES), Finance Code 001.

📄 Citation

@inproceedings{schneider-etal-2020-biobertpt,
    title = "{B}io{BERT}pt - A {P}ortuguese Neural Language Model for Clinical Named Entity Recognition",
    author = "Schneider, Elisa Terumi Rubel  and
      de Souza, Jo{\~a}o Vitor Andrioli  and
      Knafou, Julien  and
      Oliveira, Lucas Emanuel Silva e  and
      Copara, Jenny  and
      Gumiel, Yohan Bonescki  and
      Oliveira, Lucas Ferro Antunes de  and
      Paraiso, Emerson Cabrera  and
      Teodoro, Douglas  and
      Barra, Cl{\'a}udia Maria Cabral Moro",
    booktitle = "Proceedings of the 3rd Clinical Natural Language Processing Workshop",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.clinicalnlp-1.7",
    pages = "65--72",
    abstract = "With the growing number of electronic health record data, clinical NLP tasks have become increasingly relevant to unlock valuable information from unstructured clinical text. Although the performance of downstream NLP tasks, such as named-entity recognition (NER), in English corpus has recently improved by contextualised language models, less research is available for clinical texts in low resource languages. Our goal is to assess a deep contextual embedding model for Portuguese, so called BioBERTpt, to support clinical and biomedical NER. We transfer learned information encoded in a multilingual-BERT model to a corpora of clinical narratives and biomedical-scientific papers in Brazilian Portuguese. To evaluate the performance of BioBERTpt, we ran NER experiments on two annotated corpora containing clinical narratives and compared the results with existing BERT models. Our in-domain model outperformed the baseline model in F1-score by 2.72{\%}, achieving higher performance in 11 out of 13 assessed entities. We demonstrate that enriching contextual embedding models with domain literature can play an important role in improving performance for specific NLP tasks. The transfer learning process enhanced the Portuguese biomedical NER model by reducing the necessity of labeled data and the demand for retraining a whole new model.",
}