mbg-clinicalbert: an open-source model for processing and analysing Bulgarian medical and clinical text


Mbg Clinicalbert

Developed by svassileva

A BERT model based on ClinicalBERT, with additional pretraining on Bulgarian medical and clinical text

Large language model

Transformers

Other #Bulgarian medical text processing #clinical diagnosis coding #cross-lingual transfer learning

Downloads: 45

Released: 3/2/2022

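Model overview

The model starts from ClinicalBERT and continues pretraining on Bulgarian medical and clinical text, which makes it suited to Bulgarian clinical text processing tasks.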

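Model features

Cross-lingual medical domain adaptation

Transfers a model pretrained on English medical text to Bulgarian clinical text processing.

Support for a rare language

Adapted specifically to Bulgarian, a comparatively rare language in clinical NLP, including Cyrillic script and Bulgarian medical terminology.

Clinical text optimisation

Additional pretraining on Bulgarian medical text improves the model's handling of clinical terminology.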
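This page does not describe how the additional pretraining was carried out. Assuming it used BERT's standard masked-language-modelling objective (an assumption, not something stated here), the masking step can be sketched as below. The function `mask_tokens`, its parameters, and the sample tokens are illustrative names and data, not part of any released code.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels) using BERT's 80/10/10 masking rule.

    labels[i] holds the original token at selected positions and None
    elsewhere, so only selected positions contribute to the loss.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            labels.append(token)  # the model must recover this token
            roll = rng.random()
            if roll < 0.8:
                masked.append(MASK_TOKEN)         # 80%: replace with [MASK]
            elif roll < 0.9:
                masked.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                masked.append(token)              # 10%: leave unchanged
        else:
            masked.append(token)
            labels.append(None)  # position is not scored
    return masked, labels


# Made-up Bulgarian example ("patient with acute bronchitis").
example_tokens = ["пациент", "с", "остър", "бронхит"]
example_vocab = ["диабет", "хипертония", "пневмония"]
masked, labels = mask_tokens(example_tokens, example_vocab, mask_prob=0.5)
print(masked, labels)
```

In a real pipeline the same rule would be applied to subword IDs rather than whole words; the word-level version here only shows the selection logic.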

Model capabilities

Clinical text classification

Medical terminology understanding

ICD-10 code prediction

Bulgarian text processing

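Use cases

Medical coding

Automatic ICD-10 coding

Automatically assigns standard ICD-10 classification codes to clinical diagnoses written in Bulgarian.

The cited paper evaluates this task on a large corpus of Bulgarian diagnoses annotated with ICD-10 codes and reports comparable accuracy for the two families of models it compares.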
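ICD-10 coding of this kind is usually framed as text classification: each diagnosis string is mapped to one code. A minimal sketch of that setup with the Hugging Face `transformers` library follows. `MODEL_ID` is an assumed Hub identifier built from the developer and model names on this page and should be verified before use; `build_label_maps`, `load_classifier`, and the three sample diagnoses are hypothetical and for illustration only.

```python
MODEL_ID = "svassileva/mbg-clinicalbert"  # assumed identifier, verify before use


def build_label_maps(codes):
    """Map each distinct ICD-10 code to a class index and back."""
    unique = sorted(set(codes))
    label2id = {code: i for i, code in enumerate(unique)}
    id2label = {i: code for code, i in label2id.items()}
    return label2id, id2label


def load_classifier(model_id, label2id, id2label):
    """Load the encoder and attach a freshly initialised classification head."""
    # Imported lazily so the label helpers work without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id,
        num_labels=len(label2id),
        label2id=label2id,
        id2label=id2label,
    )
    return tokenizer, model


# Made-up training pairs: (diagnosis text in Bulgarian, ICD-10 code).
samples = [
    ("Остър бронхит", "J20"),
    ("Есенциална хипертония", "I10"),
    ("Захарен диабет тип 2", "E11"),
]
label2id, id2label = build_label_maps(code for _, code in samples)

# Downloads weights from the Hub, so it is left commented out here:
# tokenizer, model = load_classifier(MODEL_ID, label2id, id2label)
```

The classification head starts untrained, so the model still has to be fine-tuned on annotated diagnoses before its predictions mean anything.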

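Clinical document processing

Clinical report automation

Supports physicians in preparing and reporting routine clinical documentation.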

🚀 MBG-ClinicalBERT

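MBG-ClinicalBERT is a model based on ClinicalBERT with additional pretraining on Bulgarian medical and clinical text. It is intended to handle Bulgarian medical and clinical text better and to support natural language processing tasks in that domain.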
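A quick way to try a BERT-style checkpoint is the `fill-mask` pipeline from the Hugging Face `transformers` library. The sketch below assumes the model is published on the Hub under `svassileva/mbg-clinicalbert` (an identifier inferred from this page, not confirmed) and that the checkpoint includes masked-language-model weights. If it does not, `transformers` initialises that head randomly and the predictions carry no meaning. `load_fill_mask` and `top_tokens` are hypothetical helper names.

```python
MODEL_ID = "svassileva/mbg-clinicalbert"  # assumed identifier, verify before use


def load_fill_mask(model_id=MODEL_ID):
    """Build a fill-mask pipeline for the given checkpoint."""
    # Imported lazily so top_tokens can be used without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def top_tokens(predictions, k=3):
    """Return the k highest-scoring (token, score) pairs from fill-mask output.

    `predictions` is the list of dicts the pipeline returns for an input
    with a single mask; each dict has "token_str" and "score" keys.
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"], round(p["score"], 3)) for p in ranked[:k]]


# Usage (downloads the model, so it is left commented out):
# fill_mask = load_fill_mask()
# print(top_tokens(fill_mask("Пациентът има остър [MASK].")))
```

The sample sentence ("The patient has acute [MASK].") is made up for illustration.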

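📚 Detailed documentation

Model details

Attribute	Details
Model type	BERT-based model
Supported language	Bulgarian
Application domain	Clinical text
Description	Based on ClinicalBERT, with additional pretraining on Bulgarian medical and clinical text
More information	GitHub repository, paper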

Citation

@inproceedings{velichkov-etal-2021-comparative,
    title = "Comparative Analysis of Fine-tuned Deep Learning Language Models for {ICD}-10 Classification Task for {B}ulgarian Language",
    author = "Velichkov, Boris  and
      Vassileva, Sylvia  and
      Gerginov, Simeon  and
      Kraychev, Boris  and
      Ivanov, Ivaylo  and
      Ivanov, Philip  and
      Koychev, Ivan  and
      Boytcheva, Svetla",
    booktitle = "Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)",
    month = sep,
    year = "2021",
    address = "Held Online",
    publisher = "INCOMA Ltd.",
    url = "https://aclanthology.org/2021.ranlp-1.162",
    pages = "1448--1454",
    abstract = "The task of automatic diagnosis encoding into standard medical classifications and ontologies, is of great importance in medicine - both to support the daily tasks of physicians in the preparation and reporting of clinical documentation, and for automatic processing of clinical reports. In this paper we investigate the application and performance of different deep learning transformers for automatic encoding in ICD-10 of clinical texts in Bulgarian. The comparative analysis attempts to find which approach is more efficient to be used for fine-tuning of pretrained BERT family transformer to deal with a specific domain terminology on a rare language as Bulgarian. On the one side are used SlavicBERT and MultiligualBERT, that are pretrained for common vocabulary in Bulgarian, but lack medical terminology. On the other hand in the analysis are used BioBERT, ClinicalBERT, SapBERT, BlueBERT, that are pretrained for medical terminology in English, but lack training for language models in Bulgarian, and more over for vocabulary in Cyrillic. In our research study all BERT models are fine-tuned with additional medical texts in Bulgarian and then applied to the classification task for encoding medical diagnoses in Bulgarian into ICD-10 codes. Big corpora of diagnosis in Bulgarian annotated with ICD-10 codes is used for the classification task. Such an analysis gives a good idea of which of the models would be suitable for tasks of a similar type and domain. The experiments and evaluation results show that both approaches have comparable accuracy.",
}