stanford-deidentifier: an open-source de-identification system for radiology reports

Stanford Deidentifier With Radiology Reports And I2b2

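Developed by StanfordAIMI

A transformer-based system for automated de-identification of radiology reports. It protects privacy by detecting protected health information (PHI) and replacing it with realistic surrogate values.

Sequence labeling

Transformers

Language: English. License: MIT. #radiology-report-deidentification #automatic-PHI-detection #biomedical-text-processing

Downloads: 126

Released: 6/9/2022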

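Model overview

An automated de-identification model designed for radiology and biomedical documents. It combines a PubMedBERT transformer with "hide in plain sight" rule-based methods to detect PHI and replace it with realistic surrogates.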
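As a rough illustration of the "hide in plain sight" idea, the sketch below replaces already-detected PHI spans with realistic surrogates instead of masking them. The span format `(start, end, label)`, the label names, the surrogate pools, and the functions `span_of` and `replace_with_surrogates` are all hypothetical and are not taken from the released pipeline.

```python
import random

# Hypothetical surrogate pools; the released pipeline has its own generation rules.
SURROGATES = {
    "PATIENT": ["Alex Morgan", "Jamie Lee"],
    "HCW": ["Dr. Rivera", "Dr. Chen"],
    "DATE": ["2021-03-03", "2018-07-14"],
    "PHONE": ["555-010-2345", "555-010-6789"],
}


def span_of(text, needle, label):
    """Return a (start, end, label) character span for the first match of `needle`."""
    start = text.index(needle)
    return (start, start + len(needle), label)


def replace_with_surrogates(text, spans, seed=0):
    """Replace each non-overlapping (start, end, label) span with a surrogate."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # copy the text before the span
        pool = SURROGATES.get(label)
        # Fall back to a bracketed tag when no surrogate pool exists for the label.
        pieces.append(rng.choice(pool) if pool else f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])  # copy the remaining tail
    return "".join(pieces)


report = "Patient was transferred to Dr. Perez on 2020-02-01."
spans = [
    span_of(report, "Dr. Perez", "HCW"),
    span_of(report, "2020-02-01", "DATE"),
]
deidentified = replace_with_surrogates(report, spans)
print(deidentified)
```

Because the surrogates look like real values, any PHI that the detector misses is harder to tell apart from the replaced spans, which is the motivation behind the approach.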

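Model features

High performance across institutions

Achieves F1 scores of 97.9 and 99.6 on radiology test sets from a known institution and a new institution, respectively, and outperforms human labelers on i2b2 2014 data.

Hybrid methodology

Combines a PubMedBERT transformer for PHI detection with "hide in plain sight" rule-based methods for replacement, so that detection is accurate and the substituted values remain plausible.

Multi-domain data

Developed on a multi-institutional, cross-domain dataset of 6193 documents (chest X-ray and CT reports plus medical notes) and evaluated on two radiology test sets as well as the i2b2 2006 and 2014 test sets.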
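For reference, an F1 score is the harmonic mean of precision and recall. The sketch below computes all three from token-level labels. The binary label scheme (1 for PHI, 0 otherwise), the toy data, and the function name are assumptions for illustration; the paper's exact evaluation protocol may differ.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 for binary token labels (1 = PHI)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: 3 true positives, 1 false positive, 1 false negative.
gold = [0, 1, 1, 0, 1, 0, 0, 1]
predicted = [0, 1, 1, 0, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For de-identification, recall matters most, since every missed token is a potential privacy leak; this is why the model card reports recall alongside F1.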

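Model capabilities

Protected health information detection

De-identification of medical text

Realistic surrogate generation

Privacy processing of radiology reports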

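Use cases

Medical privacy protection

De-identifying chest X-ray reports

Automatically detects and replaces sensitive information in chest X-ray reports, such as patient, physician, and institution names.

Recall of 99.1 for detecting the core of each PHI span, measured on reports from a known institution.

Cross-institution data sharing

Enables anonymized transfer of medical data while preserving its clinical value.

99.6 F1 on data from a new institution.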

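🚀 Stanford de-identifier

The Stanford de-identifier was trained on a variety of radiology and biomedical documents. Its goal is to automate the de-identification process while reaching accuracy high enough for production use. The associated paper is in press.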

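🚀 Quick start

The Stanford de-identifier can be used to automate the de-identification of radiology and biomedical documents. More details and the code are available in the associated GitHub repository: https://github.com/MIDRC/Stanford_Penn_Deidentifier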
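A minimal sketch of running the model through the Hugging Face `transformers` token-classification pipeline follows. The model identifier is an assumption inferred from the model name and developer listed on this page, and `load_detector` and `mask_entities` are hypothetical helpers. The repository above remains the reference for the full pipeline, including surrogate generation.

```python
# Assumed Hub identifier, inferred from the model name and developer on this page.
MODEL_ID = "StanfordAIMI/stanford-deidentifier-with-radiology-reports-and-i2b2"


def load_detector(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first call)."""
    # Imported lazily so that mask_entities can be used without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge word pieces into entity spans
    )


def mask_entities(text, entities):
    """Replace each detected entity with its label in square brackets.

    `entities` follows the aggregated pipeline output: a list of dicts
    with `start`, `end`, and `entity_group` keys.
    """
    pieces = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        if ent["start"] < cursor:
            continue  # skip spans that overlap one already masked
        pieces.append(text[cursor:ent["start"]])
        pieces.append(f"[{ent['entity_group']}]")
        cursor = ent["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)


# Typical use (requires `transformers`, `torch`, and network access):
#   detector = load_detector()
#   report = "We received the confirmation of Dr. Perez."
#   print(mask_entities(report, detector(report)))
```

Masking with bracketed labels is the simplest way to inspect what the detector found; it is not the "hide in plain sight" replacement step, which substitutes realistic values instead.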

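✨ Key features

Support for multiple document types: trained on a variety of radiology and biomedical documents.
Automated de-identification: completes the de-identification process without manual intervention.
High accuracy: reaches accuracy suitable for production use, with F1 scores of 97.9 on radiology reports from a known institution and 99.6 on reports from a new institution.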

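📚 Documentation

Example report

Examination: Chest X-ray.
Comparison: Compared with the study from January 1, 2020, with reference to the record from March 1, 2019.
Findings: Patchy lung field opacities.
Impression: The chest X-ray findings of January 1, 2020 are the most concerning. The patient has been transferred to another department of UH Medical Center under the care of Dr. Perez. We sent the data on February 1, 2020 using the MedClinical data transmission system, under ID 5874233. We have received confirmation from Dr. Perez, who can be reached at 567-493-1234.

Meeting scheduling

Dr. Curt Langlotz chose to schedule a meeting on June 23.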

Tags

Token classification
Sequence tagging model
PyTorch
Transformers
PubMedBERT
Uncased
Radiology
Biomedical

Datasets

radreports

Language

English

📄 License

This project is released under the MIT License.

🔖 Citation

@article{10.1093/jamia/ocac219,
    author = {Chambon, Pierre J and Wu, Christopher and Steinkamp, Jackson M and Adleberg, Jason and Cook, Tessa S and Langlotz, Curtis P},
    title = "{Automated deidentification of radiology reports combining transformer and “hide in plain sight” rule-based methods}",
    journal = {Journal of the American Medical Informatics Association},
    year = {2022},
    month = {11},
    abstract = "{To develop an automated deidentification pipeline for radiology reports that detect protected health information (PHI) entities and replaces them with realistic surrogates “hiding in plain sight.” In this retrospective study, 999 chest X-ray and CT reports collected between November 2019 and November 2020 were annotated for PHI at the token level and combined with 3001 X-rays and 2193 medical notes previously labeled, forming a large multi-institutional and cross-domain dataset of 6193 documents. Two radiology test sets, from a known and a new institution, as well as i2b2 2006 and 2014 test sets, served as an evaluation set to estimate model performance and to compare it with previously released deidentification tools. Several PHI detection models were developed based on different training datasets, fine-tuning approaches and data augmentation techniques, and a synthetic PHI generation algorithm. These models were compared using metrics such as precision, recall and F1 score, as well as paired samples Wilcoxon tests. Our best PHI detection model achieves 97.9 F1 score on radiology reports from a known institution, 99.6 from a new institution, 99.5 on i2b2 2006, and 98.9 on i2b2 2014. On reports from a known institution, it achieves 99.1 recall of detecting the core of each PHI span. Our model outperforms all deidentifiers it was compared to on all test sets as well as human labelers on i2b2 2014 data. It enables accurate and automatic deidentification of radiology reports. A transformer-based deidentification pipeline can achieve state-of-the-art performance for deidentifying radiology reports and other medical documents.}",
    issn = {1527-974X},
    doi = {10.1093/jamia/ocac219},
    url = {https://doi.org/10.1093/jamia/ocac219},
    note = {ocac219},
    eprint = {https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocac219/47220191/ocac219.pdf},
}