mbg-clinicalbert: an open-source model for processing and analyzing Bulgarian medical and clinical text

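MBG-ClinicalBERT

Developed by svassileva

A BERT model based on ClinicalBERT, with additional pretraining on Bulgarian medical and clinical text

Large language model

Transformers

Other #Bulgarian medical text processing #Clinical diagnosis coding #Cross-lingual transfer learning

Downloads: 45

Released: 3/2/2022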

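Model overview

This model builds on ClinicalBERT and is further pretrained on Bulgarian medical and clinical text, which makes it suitable for Bulgarian clinical text processing tasks.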

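Model features

Cross-lingual medical domain adaptation

Transfers a medical model pretrained on English text to Bulgarian clinical text processing

Rare language support

Adapted specifically for Bulgarian, a comparatively rare language, including Cyrillic characters and domain-specific medical terms

Clinical text optimization

Additional pretraining on Bulgarian medical text improves the model's handling of clinical terminology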

Model capabilities

Clinical text classification

Medical terminology understanding

ICD-10 code prediction

Bulgarian text processing

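Use cases

Medical coding

Automatic ICD-10 coding

Automatically encodes Bulgarian clinical diagnoses into the standard ICD-10 classification

Evaluated on a large corpus of Bulgarian diagnoses annotated with ICD-10 codes; the cited paper reports the results

Clinical document processing

Clinical report automation

Supports physicians in their daily work of preparing and reporting clinical documentation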
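The ICD-10 coding use case above amounts to single-label text classification on top of the pretrained encoder. The following is a minimal sketch of how that setup might look with the Transformers library, not the authors' training code. The Hub id `svassileva/mbg-clinicalbert` is an assumption inferred from the author and model names on this page, and the helper names `build_label_maps`, `build_classifier`, and `predict_code` are hypothetical. The classification head created here is randomly initialised, so its predictions are meaningless until the model is fine-tuned on diagnoses annotated with ICD-10 codes.

```python
from typing import Dict, List, Tuple

# Assumption: Hub id inferred from the author and model names; verify before use.
MODEL_ID = "svassileva/mbg-clinicalbert"


def build_label_maps(codes: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map each distinct ICD-10 code to a class index and back (hypothetical helper)."""
    unique = sorted(set(codes))
    label2id = {code: i for i, code in enumerate(unique)}
    id2label = {i: code for code, i in label2id.items()}
    return label2id, id2label


def build_classifier(label2id: Dict[str, int], id2label: Dict[int, str],
                     model_id: str = MODEL_ID):
    """Load the encoder with a new, untrained classification head."""
    # Imported lazily so the label helpers work without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id,
        num_labels=len(label2id),
        label2id=label2id,
        id2label=id2label,
    )
    return tokenizer, model


def predict_code(text: str, tokenizer, model) -> str:
    """Return the highest-scoring ICD-10 code for one diagnosis string."""
    import torch

    batch = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return model.config.id2label[int(logits.argmax(dim=-1))]


# Toy label set for illustration only; a real label set comes from the training corpus.
label2id, id2label = build_label_maps(["J20", "E11", "J20", "I10"])
```

After fine-tuning, a call such as `predict_code("Остър бронхит", tokenizer, model)` would return one of the codes in `id2label`. Sorting the codes before indexing keeps the label mapping stable across runs, which matters when a saved checkpoint is reloaded.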

🚀 MBG-ClinicalBERT

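MBG-ClinicalBERT is a model based on ClinicalBERT with additional pretraining on Bulgarian medical and clinical text. This makes it better suited to Bulgarian medical and clinical text and lets it support natural language processing tasks in that domain.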
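As a rough illustration of how a BERT-style checkpoint like this one is typically used, the sketch below loads the encoder with the Transformers library and extracts one vector per input text. It rests on assumptions: the Hub id `svassileva/mbg-clinicalbert` is inferred from the author and model names on this page and should be verified, and `load_encoder` and `embed` are hypothetical helper names rather than part of the model's own documentation.

```python
# Assumption: Hub id inferred from the author and model names; verify before use.
MODEL_ID = "svassileva/mbg-clinicalbert"


def load_encoder(model_id: str = MODEL_ID):
    """Load the tokenizer and the bare BERT encoder (downloads weights on first call)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model


def embed(texts, tokenizer, model):
    """Return the [CLS] vector of each text, shape (len(texts), hidden_size)."""
    import torch

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)
    # Position 0 of the last hidden state corresponds to the [CLS] token.
    return outputs.last_hidden_state[:, 0, :]


# Example usage (requires network access and the transformers and torch packages):
#   tokenizer, model = load_encoder()
#   vectors = embed(["Захарен диабет тип 2", "Остър бронхит"], tokenizer, model)
```

The two Bulgarian strings in the usage comment ("type 2 diabetes mellitus", "acute bronchitis") are illustrative inputs, not samples from the training corpus.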

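📚 Detailed documentation

Model details

Property	Details
Model type	BERT-based model
Supported language	Bulgarian
Domain	Clinical text
Description	The model is based on ClinicalBERT and additionally pretrained on Bulgarian medical and clinical text
More information	GitHub repository, paper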

Citation

@inproceedings{velichkov-etal-2021-comparative,
    title = "Comparative Analysis of Fine-tuned Deep Learning Language Models for {ICD}-10 Classification Task for {B}ulgarian Language",
    author = "Velichkov, Boris  and
      Vassileva, Sylvia  and
      Gerginov, Simeon  and
      Kraychev, Boris  and
      Ivanov, Ivaylo  and
      Ivanov, Philip  and
      Koychev, Ivan  and
      Boytcheva, Svetla",
    booktitle = "Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)",
    month = sep,
    year = "2021",
    address = "Held Online",
    publisher = "INCOMA Ltd.",
    url = "https://aclanthology.org/2021.ranlp-1.162",
    pages = "1448--1454",
    abstract = "The task of automatic diagnosis encoding into standard medical classifications and ontologies, is of great importance in medicine - both to support the daily tasks of physicians in the preparation and reporting of clinical documentation, and for automatic processing of clinical reports. In this paper we investigate the application and performance of different deep learning transformers for automatic encoding in ICD-10 of clinical texts in Bulgarian. The comparative analysis attempts to find which approach is more efficient to be used for fine-tuning of pretrained BERT family transformer to deal with a specific domain terminology on a rare language as Bulgarian. On the one side are used SlavicBERT and MultiligualBERT, that are pretrained for common vocabulary in Bulgarian, but lack medical terminology. On the other hand in the analysis are used BioBERT, ClinicalBERT, SapBERT, BlueBERT, that are pretrained for medical terminology in English, but lack training for language models in Bulgarian, and more over for vocabulary in Cyrillic. In our research study all BERT models are fine-tuned with additional medical texts in Bulgarian and then applied to the classification task for encoding medical diagnoses in Bulgarian into ICD-10 codes. Big corpora of diagnosis in Bulgarian annotated with ICD-10 codes is used for the classification task. Such an analysis gives a good idea of which of the models would be suitable for tasks of a similar type and domain. The experiments and evaluation results show that both approaches have comparable accuracy.",
}