Open-source clinicalnerpt-finding Model: Portuguese Clinical NER, One of 13 BioBERTpt Clinical Entity Models

Clinicalnerpt Finding

Developed by pucpr

Portuguese clinical named entity recognition model based on BioBERTpt; one of 13 models trained for UMLS-compatible clinical entities

Sequence Labeling

Transformers

Other #Portuguese Clinical NER #UMLS-compatible Entities #BioBERTpt Transfer Learning

Release Time: 3/2/2022

Model Overview

This model is part of the BioBERTpt project and performs named entity recognition on Portuguese clinical text. It was trained on the Brazilian clinical corpus SemClinBr, starting from the BioBERTpt(all) model.

Model Features

UMLS-compatible Entity Recognition

One of 13 models trained in the BioBERTpt project for clinical entities compatible with UMLS

Domain-optimized Model

BioBERTpt model specifically optimized for Portuguese clinical texts

Transfer Learning Application

Transfer learning from BioBERTpt reduces the need for labeled data and avoids retraining a whole new model for Portuguese biomedical named entity recognition

Model Capabilities

Clinical Text Analysis

Named Entity Recognition

Portuguese Language Processing

Use Cases

Clinical Record Processing

Discharge Summary Analysis

Extract key clinical information from patient discharge records

Identify entities such as patient condition and follow-up plans

Treatment Plan Evaluation

Analyze treatment progress records for conditions like heart failure

Identify entities such as treatment plans and clinical progress

🚀 Portuguese Clinical NER - Finding

The Finding NER model is part of the BioBERTpt project, which aims to train models for clinical entity recognition in Portuguese. In this project, 13 models for clinical entities (compatible with UMLS) were trained. All NER models from the "pucpr" user were trained on the Brazilian clinical corpus SemClinBr, for 10 epochs, with labels in the IOB2 format, starting from the BioBERTpt(all) model.
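For readers unfamiliar with the IOB2 format mentioned above: each token gets a `B-` tag when it opens an entity, an `I-` tag when it continues one, and `O` otherwise. The following is a minimal sketch of turning such tags into entity spans. The label name `Finding` and the tag sequence are illustrative assumptions, not output from the model, and `iob2_to_spans` is a hypothetical helper rather than part of the BioBERTpt code.

```python
from typing import List, Tuple


def iob2_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert IOB2 tags to (entity_type, start, end) spans; end is exclusive."""
    spans = []
    current_type = None  # type of the entity currently open, if any
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity, closing any open one first.
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type, start = tag[2:], i
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the open entity: nothing to do.
            continue
        else:
            # "O", or an I- tag that does not continue the open entity:
            # close whatever is open and treat this token as outside.
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type = None
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


# Hand-written tags for illustration only (assumed label name "Finding").
tokens = ["PACIENTE", "APRESENTOU", "BOA", "EVOLUÇÃO", "CLÍNICA"]
tags = ["O", "O", "B-Finding", "I-Finding", "I-Finding"]
for entity_type, s, e in iob2_to_spans(tags):
    print(entity_type, " ".join(tokens[s:e]))
```

In this sketch an `I-` tag with no matching open entity is dropped, which is the strict reading of IOB2; other decoders may choose to treat it as the start of a new entity.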

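Widget

"RECEBE ALTA EM BOM ESTADO GERAL, COM PLANO DE ACOMPANHAR NO AMBULATÓRIO." (Discharged in good general condition, with a plan for outpatient follow-up.)
"PACIENTE APRESENTOU BOA EVOLUÇÃO CLÍNICA APÓS OTIMIZAÇÃO DO TTO DA ICC." (The patient showed good clinical progress after optimization of the treatment for congestive heart failure.)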
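
The two widget sentences above can also be run locally. The sketch below assumes the model is published on the Hugging Face Hub under the id `pucpr/clinicalnerpt-finding` (inferred from the model name and the "pucpr" user, not stated on this page) and uses the Transformers token-classification pipeline. The helper names `load_ner`, `summarize`, and `run_examples` are hypothetical. Nothing is downloaded until `run_examples()` is called.

```python
from typing import Dict, List

# Assumed Hub id, inferred from the model name and the "pucpr" user.
MODEL_ID = "pucpr/clinicalnerpt-finding"

# The widget sentences from the model card, kept in Portuguese as model input.
EXAMPLES = [
    "RECEBE ALTA EM BOM ESTADO GERAL, COM PLANO DE ACOMPANHAR NO AMBULATÓRIO.",
    "PACIENTE APRESENTOU BOA EVOLUÇÃO CLÍNICA APÓS OTIMIZAÇÃO DO TTO DA ICC.",
]


def load_ner(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily so the module loads without it

    # aggregation_strategy="simple" merges word pieces into whole entity groups.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def summarize(entities: List[Dict]) -> List[str]:
    """Format aggregated pipeline output as 'label: text (score)' lines."""
    return [
        f"{e['entity_group']}: {e['word']} ({float(e['score']):.2f})"
        for e in entities
    ]


def run_examples() -> None:
    """Run the model on the widget sentences and print the predicted entities."""
    ner = load_ner()
    for text in EXAMPLES:
        print(text)
        for line in summarize(ner(text)):
            print("  " + line)
```

Calling `run_examples()` requires the `transformers` package, a backend such as PyTorch, and network access for the first download. The labels and scores it prints depend on the published checkpoint, so no expected output is shown here.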

Datasets

SemClinBr

📚 Documentation

Acknowledgements

This study was partially funded by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Citation

@inproceedings{schneider-etal-2020-biobertpt,
    title = "{B}io{BERT}pt - A {P}ortuguese Neural Language Model for Clinical Named Entity Recognition",
    author = "Schneider, Elisa Terumi Rubel  and
      de Souza, Jo{\~a}o Vitor Andrioli  and
      Knafou, Julien  and
      Oliveira, Lucas Emanuel Silva e  and
      Copara, Jenny  and
      Gumiel, Yohan Bonescki  and
      Oliveira, Lucas Ferro Antunes de  and
      Paraiso, Emerson Cabrera  and
      Teodoro, Douglas  and
      Barra, Cl{\'a}udia Maria Cabral Moro",
    booktitle = "Proceedings of the 3rd Clinical Natural Language Processing Workshop",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.clinicalnlp-1.7",
    pages = "65--72",
    abstract = "With the growing number of electronic health record data, clinical NLP tasks have become increasingly relevant to unlock valuable information from unstructured clinical text. Although the performance of downstream NLP tasks, such as named-entity recognition (NER), in English corpus has recently improved by contextualised language models, less research is available for clinical texts in low resource languages. Our goal is to assess a deep contextual embedding model for Portuguese, so called BioBERTpt, to support clinical and biomedical NER. We transfer learned information encoded in a multilingual-BERT model to a corpora of clinical narratives and biomedical-scientific papers in Brazilian Portuguese. To evaluate the performance of BioBERTpt, we ran NER experiments on two annotated corpora containing clinical narratives and compared the results with existing BERT models. Our in-domain model outperformed the baseline model in F1-score by 2.72{\%}, achieving higher performance in 11 out of 13 assessed entities. We demonstrate that enriching contextual embedding models with domain literature can play an important role in improving performance for specific NLP tasks. The transfer learning process enhanced the Portuguese biomedical NER model by reducing the necessity of labeled data and the demand for retraining a whole new model.",
}

Questions?

If you have any questions, please open a GitHub issue on the BioBERTpt repo.
