stanford-deidentifier-only-radiology-reports open-source system - automated deidentification of radiology reports to protect privacy


Stanford Deidentifier Only Radiology Reports

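Developed by StanfordAIMI

An automated deidentification system for radiology reports that combines transformer-based and rule-based methods to detect PHI entities and replace them with realistic surrogates

Sequence labeling

Transformers

English · License: MIT · #RadiologyReportDeidentification #PHIEntityDetection #BiomedicalNLP

Downloads: 26

Release date: 6/9/2022

Model overview

This model deidentifies medical radiology reports. It combines a PubMedBERT transformer model with rule-based methods to automatically detect protected health information (PHI) and replace it safely, in line with HIPAA privacy standards.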

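Model features

Hybrid architecture

Combines a PubMedBERT transformer model with the "hide in plain sight" rule-based method for accurate PHI detection and replacement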
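The sketch below illustrates the "hide in plain sight" idea. It covers only the replacement step and assumes PHI spans have already been detected as character offsets. The `Span` class, the label names (`HCW`, `HOSPITAL`, `PHONE`) and the small `SURROGATES` pools are hypothetical illustrations. This is not the paper's synthetic PHI generation algorithm. Each distinct original string gets exactly one surrogate, so repeated mentions of the same person stay consistent.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # hypothetical PHI label, e.g. "HCW" or "PHONE"


# Hypothetical surrogate pools; a real system would draw from much larger lists.
SURROGATES = {
    "HCW": ["Smith", "Nguyen", "Garcia", "Okafor"],
    "HOSPITAL": ["Riverside Clinic", "Lakeview Hospital"],
    "PHONE": ["555-010-0000", "555-010-1111"],
}


def hide_in_plain_sight(text, spans, seed=0):
    """Replace each (non-overlapping) PHI span with a realistic surrogate.

    The same (label, original string) pair always maps to the same surrogate
    within one call. Labels without a pool fall back to a [LABEL] tag.
    """
    rng = random.Random(seed)  # seeded for reproducible output
    mapping = {}
    out = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        key = (span.label, text[span.start:span.end])
        if key not in mapping:
            pool = SURROGATES.get(span.label)
            mapping[key] = rng.choice(pool) if pool else f"[{span.label}]"
        out.append(text[cursor:span.start])  # untouched text before the span
        out.append(mapping[key])             # surrogate in place of the PHI
        cursor = span.end
    out.append(text[cursor:])
    return "".join(out)
```

Consider `"Seen by Dr. Perez. Call Dr. Perez at 567-493-1234."` with two `HCW` spans and one `PHONE` span. Both mentions of Perez receive the same surrogate name, and the phone number becomes a fake one.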

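Multi-institutional validation

Trained on 6,193 documents from multiple institutions: 999 newly annotated chest X-ray and CT reports, plus 3,001 previously labeled X-ray reports and 2,193 medical notes

Production-grade accuracy

The paper's best PHI detection model reaches an F1 score of 97.9 on radiology reports from a known institution and 99.6 on a test set from a new institution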
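F1 is the harmonic mean of precision P and recall R: F1 = 2PR / (P + R). The reported values are percentages, so 97.9 corresponds to 0.979.

The sketch below computes micro-averaged, token-level precision, recall and F1 over PHI labels. It assumes the gold and predicted labels are aligned lists that use `"O"` for non-PHI tokens. It illustrates the metric only. The paper's exact scoring protocol, for example how partially detected spans are credited, may differ.

```python
def token_prf(gold, pred, outside="O"):
    """Micro precision, recall and F1 over PHI tokens.

    A token is a true positive only if it is PHI in the gold labels and the
    predicted label matches exactly. A wrong PHI label on a PHI token counts
    as both a false positive and a false negative.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g != outside and p == g:
            tp += 1
        else:
            if p != outside:
                fp += 1  # predicted PHI that is wrong or absent in gold
            if g != outside:
                fn += 1  # gold PHI that was missed or mislabeled
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```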

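Model capabilities

Medical entity recognition

Protected health information detection

Realistic surrogate replacement

Radiology report processing

Cross-institution generalization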

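Use cases

Medical data privacy protection

Radiology report deidentification

Automatically identifies and replaces PHI such as patient names, physician names, and contact details

The paper reports that its model outperforms human labelers on i2b2 2014 data

Multi-center research data sharing

Processes medical documents from multiple institutions securely to meet privacy regulations

Works on report text that mentions data transfer systems, such as the MedClinical system in the sample report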

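🚀 Stanford Deidentifier

Stanford Deidentifier was trained on a variety of radiology and biomedical documents. Its goal is to automate the deidentification process while reaching accuracy that is satisfactory for production use. The accompanying paper was published in the Journal of the American Medical Informatics Association in 2022.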

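🚀 Quick start

The tool deidentifies radiology and biomedical documents. A trained model automatically detects protected health information (PHI) entities and replaces them.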
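The following is a minimal usage sketch with the Hugging Face `transformers` token-classification pipeline. It rests on two assumptions. First, the model is published on the Hugging Face Hub as `StanfordAIMI/stanford-deidentifier-only-radiology-reports`, an ID inferred from this page's title and author. Second, `transformers` and `torch` are installed.

For simplicity, the hypothetical helper `mask_entities` replaces each detected span with a `[LABEL]` placeholder rather than a realistic surrogate. It therefore shows detection, not the full hide-in-plain-sight replacement.

```python
# Hypothetical Hub ID, inferred from this page's title and author.
MODEL_ID = "StanfordAIMI/stanford-deidentifier-only-radiology-reports"


def build_tagger(model_id=MODEL_ID):
    """Load the model as a transformers token-classification pipeline.

    Needs `transformers` and `torch` installed plus network access to download
    the weights. The import is deferred so the helper below runs without them.
    """
    from transformers import pipeline

    # aggregation_strategy="simple" merges sub-word tokens into entity spans,
    # each a dict with "entity_group", "score", "word", "start", "end".
    return pipeline("token-classification", model=model_id,
                    aggregation_strategy="simple")


def mask_entities(text, entities):
    """Replace every detected entity span with a [LABEL] placeholder.

    Spans are processed right to left so that earlier character offsets
    remain valid after each substitution.
    """
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        text = text[:ent["start"]] + placeholder + text[ent["end"]:]
    return text
```

Typical use is `tagger = build_tagger()` followed by `mask_entities(report, tagger(report))`. The actual label names in the output depend on the model's configuration.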

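✨ Key features

Broad applicability: trained on a variety of radiology and biomedical documents, so it applies to different types of medical reports.
High accuracy: the paper's best model reaches F1 scores between 97.9 and 99.6 on the four test sets it reports (two radiology test sets, i2b2 2006, and i2b2 2014).
Automated processing: automates the deidentification process, which improves throughput.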

📚 Documentation

Technical tags

The project carries the following technical tags:

token-classification
sequence-tagger-model
pytorch
transformers
pubmedbert
uncased
radiology
biomedical
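The `token-classification` and `sequence-tagger-model` tags mean the model assigns one label per token. Turning those per-token labels into entity spans is a small post-processing step.

The sketch below handles both plain labels (`DATE`) and `B-`/`I-` prefixed labels (`B-DATE`), because this page does not state which label scheme the model uses. The label names are hypothetical. The transformers pipeline's `aggregation_strategy` option performs a similar grouping for you.

```python
def labels_to_spans(labels, outside="O"):
    """Group per-token labels into (label, first_token, last_token) spans.

    Works with plain labels ("DATE") and B-/I- prefixed labels ("B-DATE").
    A "B-" prefix always starts a new span; otherwise adjacent tokens with
    the same label are merged into one span.
    """
    spans = []
    current = None  # mutable [label, start, end] of the span being extended
    for i, raw in enumerate(labels):
        if raw == outside:
            current = None
            continue
        if raw[:2] in ("B-", "I-"):
            prefix, _, label = raw.partition("-")
        else:
            prefix, label = "", raw
        if current is not None and current[0] == label and prefix != "B":
            current[2] = i  # extend the open span
        else:
            current = [label, i, i]
            spans.append(current)
    return [tuple(s) for s in spans]
```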

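Sample report content

The following are sample reports:

PROCEDURE: Chest X-ray. COMPARISON: last exam on January 1, 2020, with an earlier record dated March 1, 2019. FINDINGS: patchy airspace opacities. IMPRESSION: the chest X-ray of January 1, 2020 shows the most concerning findings. The patient was transferred to another department of UH Medical Center, under the care of Dr. Perez. We sent the data on February 1, 2020 using the MedClinical data transmission system, under ID 5874233. We received confirmation from Dr. Perez. He can be reached at 567-493-1234.
Dr. Curt Langlotz chose to schedule a meeting on June 23.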

Related repository

Associated GitHub repository: https://github.com/MIDRC/Stanford_Penn_Deidentifier

📄 License

This project is released under the MIT license.

📚 Citation

If you use this project, please cite the following paper:

@article{10.1093/jamia/ocac219,
    author = {Chambon, Pierre J and Wu, Christopher and Steinkamp, Jackson M and Adleberg, Jason and Cook, Tessa S and Langlotz, Curtis P},
    title = "{Automated deidentification of radiology reports combining transformer and “hide in plain sight” rule-based methods}",
    journal = {Journal of the American Medical Informatics Association},
    year = {2022},
    month = {11},
    abstract = "{To develop an automated deidentification pipeline for radiology reports that detect protected health information (PHI) entities and replaces them with realistic surrogates “hiding in plain sight.” In this retrospective study, 999 chest X-ray and CT reports collected between November 2019 and November 2020 were annotated for PHI at the token level and combined with 3001 X-rays and 2193 medical notes previously labeled, forming a large multi-institutional and cross-domain dataset of 6193 documents. Two radiology test sets, from a known and a new institution, as well as i2b2 2006 and 2014 test sets, served as an evaluation set to estimate model performance and to compare it with previously released deidentification tools. Several PHI detection models were developed based on different training datasets, fine-tuning approaches and data augmentation techniques, and a synthetic PHI generation algorithm. These models were compared using metrics such as precision, recall and F1 score, as well as paired samples Wilcoxon tests. Our best PHI detection model achieves 97.9 F1 score on radiology reports from a known institution, 99.6 from a new institution, 99.5 on i2b2 2006, and 98.9 on i2b2 2014. On reports from a known institution, it achieves 99.1 recall of detecting the core of each PHI span. Our model outperforms all deidentifiers it was compared to on all test sets as well as human labelers on i2b2 2014 data. It enables accurate and automatic deidentification of radiology reports. A transformer-based deidentification pipeline can achieve state-of-the-art performance for deidentifying radiology reports and other medical documents.}",
    issn = {1527-974X},
    doi = {10.1093/jamia/ocac219},
    url = {https://doi.org/10.1093/jamia/ocac219},
    note = {ocac219},
    eprint = {https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocac219/47220191/ocac219.pdf},
}