stanford-deidentifier-only-i2b2: an open-source model that accurately and efficiently removes sensitive information from radiology reports

Stanford Deidentifier Only I2b2

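Developed by StanfordAIMI

A transformer-based system for automatically deidentifying radiology reports, combined with rule-based methods to detect and replace PHI with high accuracy

Token classification

Transformers

Language: English. License: MIT. Tags: #radiology-report-deidentification #high-accuracy-PHI-detection #biomedical-text-processing

Downloads: 98

Released: 6/9/2022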

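Model overview

Built specifically for deidentifying biomedical radiology reports: it automatically detects protected health information (PHI) and replaces it with realistic surrogates, to help meet HIPAA privacy requirements.

Model features

High-accuracy PHI detection

Reaches an F1 score of 97.9 on radiology reports from a known institution and 99.6 on reports from a new institution; on i2b2 2014 data it outperforms human labelers.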
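For reference, F1 is the harmonic mean of precision and recall. The sketch below computes token-level scores for a binary PHI versus non-PHI decision. The label names, the `phi_prf` helper, and the example sequences are invented for illustration, and the paper's exact evaluation protocol may differ.

```python
from typing import Sequence, Tuple


def phi_prf(
    gold: Sequence[str], pred: Sequence[str], outside: str = "O"
) -> Tuple[float, float, float]:
    """Token-level precision, recall and F1.

    Any label other than `outside` counts as PHI, so a token tagged with
    the wrong PHI type still counts as detected.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(g != outside and p != outside for g, p in pairs)
    fp = sum(g == outside and p != outside for g, p in pairs)
    fn = sum(g != outside and p == outside for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hand-written toy labels (hypothetical label names).
gold = ["O", "DATE", "DATE", "O", "HCW", "O"]
pred = ["O", "DATE", "O", "O", "HCW", "HCW"]
precision, recall, f1 = phi_prf(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case there are two true positives, one false positive and one false negative, so all three scores come out to 2/3.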

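Cross-institution generalization

Evaluated on several test sets, including i2b2 2006 and i2b2 2014, where it generalizes well (99.5 and 98.9 F1, respectively).

Hybrid design

Combines a PubMedBERT transformer model with the rule-based "hide in plain sight" method to detect PHI precisely and substitute it naturally.

Large-scale training data

The associated paper assembles a multi-institutional, cross-domain dataset of 6,193 documents: 999 newly annotated chest X-ray and CT reports, 3,001 previously labeled X-rays, and 2,193 medical notes.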

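Model capabilities

PHI entity recognition in radiology reports

Automatic replacement of protected health information

Detection of multiple PHI types (dates, physician names, institutions, and more)

Cross-institution document processing

Use cases

Medical privacy protection

Radiology report deidentification

Automatically processes sensitive information in chest X-ray and CT reports

Reaches 99.1 recall on detecting the core of each PHI span (reports from a known institution)

Research data sharing

Provides deidentified data for medical research, in support of HIPAA requirements

Outperforms human labelers on i2b2 2014 data

Healthcare information systems

Electronic health record processing

Can be integrated into healthcare information systems to automate the deidentification workflow

Processes reports that mention data transmission systems, such as MedClinical in the sample report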

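🚀 Stanford de-identifier

The Stanford de-identifier was trained on a variety of radiology and biomedical documents, with the goal of automating the deidentification process while reaching accuracy satisfactory enough for production use. The associated paper was published in the Journal of the American Medical Informatics Association (2022).

🚀 Quick start

The Stanford de-identifier can be used to deidentify radiology and biomedical documents in order to protect patients' health information.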
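This page does not include a usage snippet. The following is a minimal sketch, assuming the checkpoint is published on the Hugging Face Hub under the ID `StanfordAIMI/stanford-deidentifier-only-i2b2` (inferred from the model name and developer shown on this page, not confirmed by it) and that the `transformers` library and PyTorch are installed. The helper names `load_tagger` and `find_phi` are hypothetical.

```python
from typing import Callable, Dict, List

# Assumed Hub ID, inferred from the model name and developer on this page.
MODEL_ID = "StanfordAIMI/stanford-deidentifier-only-i2b2"


def load_tagger():
    """Build a token-classification pipeline (downloads weights on first use)."""
    # Imported lazily so find_phi can be used and tested without transformers.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline(
        "token-classification", model=MODEL_ID, aggregation_strategy="simple"
    )


def find_phi(tagger: Callable[[str], List[Dict]], text: str) -> List[Dict]:
    """Run the tagger and return detected spans sorted by start offset."""
    spans = [
        {
            "label": ent["entity_group"],
            "start": ent["start"],
            "end": ent["end"],
            # Slice the original text so casing and spacing are preserved.
            "text": text[ent["start"]:ent["end"]],
        }
        for ent in tagger(text)
    ]
    return sorted(spans, key=lambda span: span["start"])
```

A typical call would be `find_phi(load_tagger(), "Dr. Curt Langlotz chose to schedule a meeting on 06/23.")`, using one of the sample sentences from the example data. The label names in the result depend on the checkpoint's configuration.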

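✨ Key features

Support for multiple document types: handles a variety of radiology and biomedical documents.
Automated deidentification: automatically detects protected health information (PHI) entities and replaces them with plausible surrogates.
High accuracy: achieves strong F1 scores on multiple test sets (97.9 to 99.6 in the associated paper), identifying and handling PHI effectively.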
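Replacing detected PHI with plausible surrogates can be sketched as follows. This is an illustrative simplification, not the authors' synthetic PHI generation algorithm: the label names (`DATE`, `HCW`, `ID`), the surrogate pools, the hand-written spans, and the `replace_phi` helper are all hypothetical.

```python
import random
from typing import Dict, List

# Hypothetical surrogate pools, keyed by hypothetical label names.
SURROGATES = {
    "DATE": ["3/14/2018", "11/02/2021", "July 9, 2017"],
    "HCW": ["Dr. Alvarez", "Dr. Nakamura", "Dr. Osei"],
}


def replace_phi(text: str, spans: List[Dict], rng: random.Random) -> str:
    """Replace each span with a random surrogate of the same type.

    Spans are applied from right to left so that the character offsets of
    earlier spans stay valid. Labels without a surrogate pool fall back to
    a bracketed tag such as "[ID]".
    """
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        pool = SURROGATES.get(span["label"])
        surrogate = rng.choice(pool) if pool else f"[{span['label']}]"
        text = text[:span["start"]] + surrogate + text[span["end"]:]
    return text


text = "Dr. Curt Langlotz chose to schedule a meeting on 06/23."
# Hand-written spans standing in for detector output.
spans = [
    {"label": "HCW", "start": 0, "end": 17},
    {"label": "DATE", "start": 49, "end": 54},
]
deidentified = replace_phi(text, spans, random.Random(0))
print(deidentified)
```

The idea behind "hide in plain sight" is that any PHI the detector misses blends in among realistic surrogates, which is why the sketch prefers a surrogate over a bracketed tag whenever a pool exists.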

📚 Detailed documentation

Example data

widget:
- text: "PROCEDURE: Chest xray. COMPARISON: last seen on 1/1/2020 and also record dated of March 1st, 2019. FINDINGS: patchy airspace opacities. IMPRESSION: The results of the chest xray of January 1 2020 are the most concerning ones. The patient was transmitted to another service of UH Medical Center under the responsability of Dr. Perez. We used the system MedClinical data transmitter and sent the data on 2/1/2020, under the ID 5874233. We received the confirmation of Dr Perez. He is reachable at 567-493-1234."
- text: "Dr. Curt Langlotz chose to schedule a meeting on 06/23."
tags:
- token-classification
- sequence-tagger-model
- pytorch
- transformers
- pubmedbert
- uncased
- radiology
- biomedical
datasets:
- radreports
language:
  - en
license: mit

Associated repository

Associated GitHub repository: https://github.com/MIDRC/Stanford_Penn_Deidentifier

Citation

@article{10.1093/jamia/ocac219,
    author = {Chambon, Pierre J and Wu, Christopher and Steinkamp, Jackson M and Adleberg, Jason and Cook, Tessa S and Langlotz, Curtis P},
    title = "{Automated deidentification of radiology reports combining transformer and “hide in plain sight” rule-based methods}",
    journal = {Journal of the American Medical Informatics Association},
    year = {2022},
    month = {11},
    abstract = "{To develop an automated deidentification pipeline for radiology reports that detect protected health information (PHI) entities and replaces them with realistic surrogates “hiding in plain sight.” In this retrospective study, 999 chest X-ray and CT reports collected between November 2019 and November 2020 were annotated for PHI at the token level and combined with 3001 X-rays and 2193 medical notes previously labeled, forming a large multi-institutional and cross-domain dataset of 6193 documents. Two radiology test sets, from a known and a new institution, as well as i2b2 2006 and 2014 test sets, served as an evaluation set to estimate model performance and to compare it with previously released deidentification tools. Several PHI detection models were developed based on different training datasets, fine-tuning approaches and data augmentation techniques, and a synthetic PHI generation algorithm. These models were compared using metrics such as precision, recall and F1 score, as well as paired samples Wilcoxon tests. Our best PHI detection model achieves 97.9 F1 score on radiology reports from a known institution, 99.6 from a new institution, 99.5 on i2b2 2006, and 98.9 on i2b2 2014. On reports from a known institution, it achieves 99.1 recall of detecting the core of each PHI span. Our model outperforms all deidentifiers it was compared to on all test sets as well as human labelers on i2b2 2014 data. It enables accurate and automatic deidentification of radiology reports. A transformer-based deidentification pipeline can achieve state-of-the-art performance for deidentifying radiology reports and other medical documents.}",
    issn = {1527-974X},
    doi = {10.1093/jamia/ocac219},
    url = {https://doi.org/10.1093/jamia/ocac219},
    note = {ocac219},
    eprint = {https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocac219/47220191/ocac219.pdf},
}