MediAlbertina
The first publicly available medical language models trained with real European Portuguese data, offering enhanced performance for medical AI in Portugal.
Quick Start
MediAlbertina is a family of BERT-style encoders based on DeBERTaV2. It results from continued pre-training of PORTULAN's Albertina models on Electronic Medical Records (EMRs) shared by Portugal's largest public hospital.
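Continued pre-training of this kind relies on masked language modeling: some tokens are hidden and the model learns to recover them. The sketch below illustrates only that masking step on a whitespace-split sentence. The `mask_tokens` helper, the 15% masking rate, and the whitespace tokenization are assumptions made for illustration, not the settings used to train MediAlbertina, and real pipelines typically also replace some selected tokens with random or unchanged ones.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, rate=0.15, seed=0):
    """Hide a random subset of tokens; return the masked tokens and the hidden targets."""
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Assumed masking rate; always hide at least one token.
    n_mask = max(1, round(len(tokens) * rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]  # label the model would have to predict
        masked[pos] = MASK
    return masked, targets

sentence = "Analgesia com morfina em perfusão".split()
masked, targets = mask_tokens(sentence)
print(" ".join(masked))
print(targets)
```

During training, the loss would be computed only on the positions stored in `targets`.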
Features
- Domain Adaptation: MediAlbertina PT-PT 900M was created through domain adaptation of Albertina PT-PT 900M on real European Portuguese EMRs using masked language modeling.
- Superior Performance: In Information Extraction (IE) tasks such as Named Entity Recognition (NER) and Assertion Status (AStatus), MediAlbertina achieved better F1-scores than its predecessors, as shown in the table below:

| Model | NER single-model (F1-score) | NER multi-models (F1-score) | Assertion Status (F1-score) |
|---|---|---|---|
| albertina-900m-portuguese-ptpt-encoder | 0.813 | 0.811 | 0.687 |
| medialbertina_pt-pt_900m | 0.832 | 0.848 | 0.755 |
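To make the size of the improvement concrete, the following sketch computes the absolute and relative F1 gains of MediAlbertina over the Albertina baseline from the scores reported above. The `f1_gain` helper is a hypothetical name introduced here; only the six scores come from the table.

```python
# F1-scores copied from the table: (Albertina baseline, MediAlbertina).
scores = {
    "NER single-model": (0.813, 0.832),
    "NER multi-models": (0.811, 0.848),
    "Assertion Status": (0.687, 0.755),
}

def f1_gain(baseline, adapted):
    """Return the absolute gain and the gain relative to the baseline (in percent)."""
    absolute = adapted - baseline
    relative = 100 * absolute / baseline
    return round(absolute, 3), round(relative, 1)

gains = {task: f1_gain(base, medi) for task, (base, medi) in scores.items()}
for task, (absolute, relative) in gains.items():
    print(f"{task}: +{absolute:.3f} F1 ({relative:.1f}% relative)")
```

Assertion Status shows the largest gain of the three tasks, followed by the multi-model NER setup.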
Installation
No dedicated installation steps are documented. The usage example below relies on the Hugging Face transformers library.
Usage Examples
Basic Usage
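from transformers import pipeline
# Load the fill-mask pipeline with the MediAlbertina PT-PT 900M checkpoint.
unmasker = pipeline('fill-mask', model='portugueseNLP/medialbertina_pt-pt_900m')
# Predict the token hidden behind [MASK] in a clinical sentence.
unmasker("Analgesia com morfina em perfusão (15 [MASK]/kg/h)")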
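The pipeline call returns a list of candidate fillers. As a minimal sketch, the hypothetical helper below (`top_tokens` is a name introduced here, not part of the model card) reduces that output to token and score pairs. It assumes a single `[MASK]` in the input, in which case the fill-mask pipeline returns a list of dicts that include the keys `token_str` and `score`.

```python
def top_tokens(unmasker, text, k=3):
    """Return the k most likely fillers for a single [MASK] as (token, score) pairs."""
    # `top_k` is a call-time argument of the fill-mask pipeline.
    results = unmasker(text, top_k=k)
    # Keep only the predicted token string and its probability.
    return [(r["token_str"].strip(), round(float(r["score"]), 4)) for r in results]

# Usage, assuming the `unmasker` pipeline created above (this downloads the model):
# for token, score in top_tokens(unmasker, "Diabetes [MASK] tipo II"):
#     print(f"{token}\t{score}")
```

Passing the pipeline in as an argument keeps the helper independent of how the model was loaded.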
Widget Examples
Here are some examples you can try:
| Example Title | Input Text |
|---|---|
| Example 1 | "Febre e tosse são sintomas comuns de [MASK]" |
| Example 2 | "Diabetes [MASK] tipo II" |
| Example 3 | "Utente tolera dieta [MASK] / Nivel de glicémia bom." |
| Example 4 | "Doente com administração de [MASK] com tramal." |
| Example 5 | "Colocada sonda de gases por apresentar [MASK] timpanizado" |
| Example 6 | "Conectada em PRVC com necessidade de aumentar [MASK] para 70%" |
| Example 7 | "Medicado com [MASK] em dias alternados." |
| Example 8 | "Realizado teste de [MASK] ao paciente" |
| Example 9 | "Sintomas apontam para COVID [MASK]." |
| Example 10 | "Durante internamento fez [MASK] fresco congelado 3x dia" |
| Example 11 | "Pupilas iso [MASK]." |
| Example 12 | "Cardiopatia [MASK] - causa provável: HAS" |
| Example 13 | "O paciente encontra-se [MASK] estável." |
| Example 14 | "Traumatismo [MASK] após acidente de viação." |
| Example 15 | "Analgesia com morfina em perfusão (15 [MASK]/kg/h)" |
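Each input above contains exactly one mask token, which is what the fill-mask pipeline expects in order to return a flat list of candidates. When preparing your own inputs, a small check like the one sketched below can catch malformed sentences before they reach the model. The `single_mask_inputs` helper is hypothetical, and the literal `[MASK]` is assumed to match the tokenizer's mask token, as it does in the examples.

```python
MASK_TOKEN = "[MASK]"  # assumed to match the tokenizer's mask token

def single_mask_inputs(texts, mask_token=MASK_TOKEN):
    """Split texts into those with exactly one mask token and all others."""
    valid, invalid = [], []
    for text in texts:
        (valid if text.count(mask_token) == 1 else invalid).append(text)
    return valid, invalid

examples = [
    "Diabetes [MASK] tipo II",
    "Pupilas iso [MASK].",
    "Sintomas apontam para COVID [MASK].",
    "Texto sem máscara",  # hypothetical bad input, not one of the widget examples
]
valid, invalid = single_mask_inputs(examples)
print(len(valid), "usable inputs;", len(invalid), "rejected")
```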
Documentation
Data
MediAlbertina PT-PT 900M was trained on more than 15M sentences and 300M tokens from 2.6M fully anonymized and unique Electronic Medical Records (EMRs) from Portugal's largest public hospital. This data was acquired under the framework of the [FCT project DSAIPA/AI/0122/2020 AIMHealth - Mobile Applications Based on Artificial Intelligence](https://ciencia.iscte-iul.pt/projects/aplicacoes-moveis-baseadas-em-inteligencia-artificial-para-resposta-de-saude-publica/1567).
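For a rough sense of the corpus shape, the sketch below derives per-record and per-sentence averages from the rounded figures quoted above. Because those figures are approximate ("more than"), the results are order-of-magnitude estimates, and the variable names are introduced here for illustration only.

```python
# Rounded figures quoted in the text; treat the results as rough estimates.
N_RECORDS = 2_600_000
N_SENTENCES = 15_000_000
N_TOKENS = 300_000_000

sentences_per_record = N_SENTENCES / N_RECORDS
tokens_per_sentence = N_TOKENS / N_SENTENCES
tokens_per_record = N_TOKENS / N_RECORDS

print(f"~{sentences_per_record:.1f} sentences per EMR")
print(f"~{tokens_per_sentence:.0f} tokens per sentence")
print(f"~{tokens_per_record:.0f} tokens per EMR")
```

These averages suggest short clinical notes made up of a handful of brief sentences, which is consistent with the style of the widget examples.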
License
MediAlbertina models are distributed under the [MIT license](https://huggingface.co/portugueseNLP/medialbertina_pt-pt_900m/blob/main/LICENSE).
Citation
MediAlbertina is developed by a joint team from [ISCTE-IUL](https://www.iscte-iul.pt/), Portugal, and Select Data, CA, USA. For a full description, see the corresponding publication:
@article{MediAlbertina_PT-PT,
  title={MediAlbertina: An European Portuguese medical language model},
  author={Miguel Nunes and João Boné and João Ferreira
          and Pedro Chaves and Luís Elvas},
  year={2024},
  journal={CBM},
  volume={182},
  url={https://doi.org/10.1016/j.compbiomed.2024.109233}
}
Please use the above canonical reference when using or citing this model.
Acknowledgements
This work was financially supported by Project Blockchain.PT – Decentralize Portugal with Blockchain Agenda (Project no. 51), WP2, Call no. 02/C05-i01.01/2022, funded by the Portuguese Recovery and Resilience Program (PRR), the Portuguese Republic, and the European Union (EU) under the framework of the Next Generation EU Program.