stanford-deidentifier: open-source deidentification system for radiology reports


Stanford Deidentifier With Radiology Reports And I2b2

Developed by StanfordAIMI

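A transformer-based system for automated deidentification of radiology reports. It protects privacy by detecting protected health information (PHI) and replacing it with realistic surrogate values.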

Sequence labeling

Transformers

Language: English | License: MIT | #RadiologyReportDeidentification #AutomatedPHIDetection #BiomedicalTextProcessing


Release date: 6/9/2022

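Model overview

An automated deidentification model designed for radiology and biomedical documents. It combines a PubMedBERT transformer with the rule-based "hide in plain sight" method to identify PHI and replace it with realistic surrogates.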

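Model features

High performance across institutions

Achieves F1 scores of 97.9 and 99.6 on radiology test sets from a known and a new institution, respectively, and outperforms human labelers on the i2b2 2014 data

Hybrid methodology

Combines a PubMedBERT transformer with the rule-based "hide in plain sight" method, pairing accurate PHI detection with realistic replacement values

Multi-domain validation

Developed on a multi-institutional, cross-domain dataset of 6,193 documents (chest X-ray and CT reports plus medical notes) and evaluated on radiology test sets from two institutions and on the i2b2 2006 and 2014 test sets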
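The replacement half of the hybrid approach can be sketched as follows. This is a minimal illustration, not the authors' surrogate generator: the label names, the surrogate pools, and the choice to map repeated mentions of the same string to the same surrogate are all assumptions. Spans are assumed to be non-overlapping character offsets produced by the PHI detector.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical PHI labels and surrogate pools; the model's real label set and
# surrogate generator may differ.
SURROGATES: Dict[str, List[str]] = {
    "PATIENT": ["John Miller", "Maria Lopez"],
    "HCW": ["Dr. Adams", "Dr. Chen"],
    "HOSPITAL": ["Riverside Clinic", "Lakeview Hospital"],
    "DATE": ["March 14, 2021", "July 2, 2021"],
}

# (start, end, label): character offsets of one detected PHI span, end exclusive.
Span = Tuple[int, int, str]


def hide_in_plain_sight(text: str, spans: List[Span], seed: int = 0) -> str:
    """Replace each PHI span with a surrogate of the same type.

    Assumes spans do not overlap. Within one call, the same (label, original text)
    pair always maps to the same surrogate, so repeated mentions stay consistent.
    Labels without a surrogate pool fall back to a "[LABEL]" placeholder.
    """
    rng = random.Random(seed)
    mapping: Dict[Tuple[str, str], str] = {}
    pieces: List[str] = []
    cursor = 0
    for start, end, label in sorted(spans):
        key = (label, text[start:end])
        if key not in mapping:
            pool = SURROGATES.get(label)
            mapping[key] = rng.choice(pool) if pool else f"[{label}]"
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(mapping[key])        # surrogate in place of the PHI
        cursor = end
    pieces.append(text[cursor:])           # remaining text after the last span
    return "".join(pieces)
```

For example, `hide_in_plain_sight("Seen by Perez. Perez agreed.", [(8, 13, "HCW"), (15, 20, "HCW")])` replaces both mentions with the same surrogate physician name. The intuition behind the "hide in plain sight" name is that any real PHI the detector misses sits among realistic surrogates instead of standing out next to redaction markers, so it is harder to tell which values are genuine.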

Model capabilities

PHI detection

Medical text deidentification

Realistic surrogate generation

Privacy protection for radiology reports

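Use cases

Medical privacy protection

Chest X-ray report deidentification

Automatically identifies and replaces sensitive information such as patient, physician, and institution names in chest X-ray reports

Recall of 99.1% for detecting the core of each PHI span on reports from a known institution

Cross-institution data sharing

Anonymizes medical data for sharing between institutions while preserving its clinical value

F1 score of 99.6 on data from a new institution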
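For reference, precision, recall, and F1 (the harmonic mean 2PR / (P + R)) can be computed from gold and predicted PHI spans as below. This sketch assumes exact matching of (start, end, label) spans; the paper's own matching rules (token-level annotation, recall on the "core" of each PHI span) may differ, so numbers from this function are not directly comparable to the figures above.

```python
from typing import Set, Tuple

# (start, end, label) of one PHI span; the exact-match criterion is an assumption.
Span = Tuple[int, int, str]


def span_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall and F1, each expressed as a percentage."""
    tp = len(gold & pred)                       # spans predicted exactly right
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return 100 * precision, 100 * recall, 100 * f1
```

With four gold spans, four predictions, and three exact matches, precision, recall, and F1 are all 75.0.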

🚀 Stanford Deidentifier

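The Stanford deidentifier was trained on a variety of radiology and biomedical documents to automate the deidentification process while reaching accuracy suitable for production use. The associated paper was published in the Journal of the American Medical Informatics Association in 2022; see the citation below.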

🚀 Quick start

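The Stanford deidentifier can be used to automate the deidentification of radiology and biomedical documents. More details and the code are available in the associated GitHub repository: https://github.com/MIDRC/Stanford_Penn_Deidentifier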
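A minimal quick-start sketch using the Hugging Face `transformers` token-classification pipeline. It assumes the weights are published on the Hugging Face Hub under the id `StanfordAIMI/stanford-deidentifier-with-radiology-reports-and-i2b2` (inferred from the model name and developer above; check the GitHub repository for the supported usage) and that `transformers` and PyTorch are installed. The `redact` helper is hypothetical: it replaces each detected span with a `[LABEL]` placeholder and does not generate realistic surrogates.

```python
from typing import Any, Dict, List

# Assumed Hugging Face Hub id, inferred from the model card title; verify before use.
MODEL_ID = "StanfordAIMI/stanford-deidentifier-with-radiology-reports-and-i2b2"


def load_phi_tagger(model_id: str = MODEL_ID):
    """Build a token-classification pipeline; downloads the weights on first use."""
    # Imported lazily so that redact() below can be used without transformers installed.
    from transformers import pipeline

    # "simple" aggregation merges sub-word tokens into entity spans with
    # "entity_group", "score", "word", "start" and "end" keys.
    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def redact(text: str, entities: List[Dict[str, Any]]) -> str:
    """Hypothetical helper: replace each detected span with a [LABEL] placeholder.

    Spans are processed right to left so earlier character offsets stay valid.
    """
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


# Example usage (requires network access plus transformers and PyTorch):
# tagger = load_phi_tagger()
# report = "Dr. Curt Langlotz chose to schedule a meeting on June 23."
# print(redact(report, tagger(report)))
```

Replacing spans from right to left keeps the character offsets reported by the pipeline valid, since each substitution only changes text after the spans still waiting to be processed.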

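✨ Key features

Support for multiple document types: trained on a variety of radiology and biomedical documents.
Automated deidentification: performs the deidentification process without manual annotation.
High accuracy: suitable for production use, with F1 scores of 97.9 and 99.6 on radiology test sets from a known and a new institution.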

📚 Documentation

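Example report

PROCEDURE: Chest X-ray.
COMPARISON: Compared with the examination of January 1, 2020, with reference also to the record dated March 1, 2019.
FINDINGS: Patchy airspace opacities.
IMPRESSION: The results of the chest X-ray of January 1, 2020 are the most concerning. The patient was transferred to another department of UH Medical Center under the care of Dr. Perez. We sent the data on February 1, 2020 using the MedClinical data transmission system, under ID 5874233. We received confirmation from Dr. Perez, who can be reached at 567-493-1234.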

Meeting scheduling

Dr. Curt Langlotz chose to schedule a meeting on June 23.
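Token classification assigns one label per token, and adjacent tokens are then merged into PHI spans. The sketch below assumes a BIO tagging scheme (`B-` begins a span, `I-` continues it, `O` is outside) and uses hypothetical label names and tags for the meeting sentence above; the model's actual label set may differ. When `aggregation_strategy` is set, the Hugging Face pipeline performs a comparable grouping itself.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Merge token-level BIO tags into (first_token, end_token_exclusive, label) spans.

    An I- tag that does not continue an open span of the same label starts a new
    span (a lenient convention for inconsistent predictions).
    """
    spans: List[Tuple[int, int, str]] = []
    start: int = 0
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and tag[2:] == label:
            continue  # extends the currently open span
        if label is not None:  # any other tag closes the open span
            spans.append((start, i, label))
            label = None
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    if label is not None:  # close a span that runs to the last token
        spans.append((start, len(tags), label))
    return spans


# Hypothetical tokens and tags for the meeting sentence, one tag per token.
tokens = ["Dr.", "Curt", "Langlotz", "chose", "to", "schedule", "a", "meeting", "on", "June", "23", "."]
tags = ["O", "B-HCW", "I-HCW", "O", "O", "O", "O", "O", "O", "B-DATE", "I-DATE", "O"]
for s, e, lab in bio_to_spans(tags):
    print(lab, " ".join(tokens[s:e]))  # prints "HCW Curt Langlotz", then "DATE June 23"
```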

Tags

Token classification
Sequence labeling model
PyTorch
Transformers
PubMedBERT
Uncased
Radiology
Biomedical

Dataset

radreports

Language

English

📄 License

This project is released under the MIT license.

🔖 Citation

@article{10.1093/jamia/ocac219,
    author = {Chambon, Pierre J and Wu, Christopher and Steinkamp, Jackson M and Adleberg, Jason and Cook, Tessa S and Langlotz, Curtis P},
    title = "{Automated deidentification of radiology reports combining transformer and “hide in plain sight” rule-based methods}",
    journal = {Journal of the American Medical Informatics Association},
    year = {2022},
    month = {11},
    abstract = "{To develop an automated deidentification pipeline for radiology reports that detect protected health information (PHI) entities and replaces them with realistic surrogates “hiding in plain sight.” In this retrospective study, 999 chest X-ray and CT reports collected between November 2019 and November 2020 were annotated for PHI at the token level and combined with 3001 X-rays and 2193 medical notes previously labeled, forming a large multi-institutional and cross-domain dataset of 6193 documents. Two radiology test sets, from a known and a new institution, as well as i2b2 2006 and 2014 test sets, served as an evaluation set to estimate model performance and to compare it with previously released deidentification tools. Several PHI detection models were developed based on different training datasets, fine-tuning approaches and data augmentation techniques, and a synthetic PHI generation algorithm. These models were compared using metrics such as precision, recall and F1 score, as well as paired samples Wilcoxon tests. Our best PHI detection model achieves 97.9 F1 score on radiology reports from a known institution, 99.6 from a new institution, 99.5 on i2b2 2006, and 98.9 on i2b2 2014. On reports from a known institution, it achieves 99.1 recall of detecting the core of each PHI span. Our model outperforms all deidentifiers it was compared to on all test sets as well as human labelers on i2b2 2014 data. It enables accurate and automatic deidentification of radiology reports. A transformer-based deidentification pipeline can achieve state-of-the-art performance for deidentifying radiology reports and other medical documents.}",
    issn = {1527-974X},
    doi = {10.1093/jamia/ocac219},
    url = {https://doi.org/10.1093/jamia/ocac219},
    note = {ocac219},
    eprint = {https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocac219/47220191/ocac219.pdf},
}