Stanford Deidentifier: an open-source deidentification system for radiology reports

Stanford Deidentifier With Radiology Reports And I2b2

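Developed by StanfordAIMI

An automated deidentification system for radiology reports, based on a transformer model. It detects protected health information (PHI) and replaces it with surrogate values to protect privacy.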

Sequence labeling

Transformers

English · Open-source license: MIT · #Radiology report deidentification #Automatic PHI detection #Biomedical text processing

Downloads: 126

Release date: 6/9/2022

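Model overview

An automated deidentification model designed for radiology and biomedical documents. It combines a PubMedBERT transformer with a rule-based "hide in plain sight" method to detect PHI and replace it efficiently.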
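To make the sequence-labeling step concrete, the sketch below groups per-token labels into contiguous PHI spans. The label names (`O`, `HCW`, `DATE`) and the `group_spans` helper are illustrative assumptions, not the model's documented label set or decoding logic.

```python
from typing import List, Tuple

def group_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge consecutive tokens that share the same non-"O" label into one span."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = "O"
    for token, label in zip(tokens, labels):
        if label == current_label and label != "O":
            current_tokens.append(token)  # extend the span in progress
            continue
        if current_tokens:
            spans.append((current_label, " ".join(current_tokens)))  # close the span
        current_tokens = [token] if label != "O" else []
        current_label = label
    if current_tokens:
        spans.append((current_label, " ".join(current_tokens)))
    return spans

# Hand-written labels for illustration; a real run would take them from the model.
tokens = ["Dr.", "Curt", "Langlotz", "chose", "to", "schedule", "a", "meeting", "on", "06/23."]
labels = ["O", "HCW", "HCW", "O", "O", "O", "O", "O", "O", "DATE"]
spans = group_spans(tokens, labels)
print(spans)
```

Grouping by label equality is the simplest possible scheme; it cannot separate two adjacent entities of the same type, which is why many taggers use B-/I- prefixes instead.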

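Model features

High performance across institutions

Achieves F1 scores of 97.9 and 99.6 on radiology test sets from a known and a new institution, respectively, and outperforms human labelers on i2b2 2014 data.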
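F1 is the harmonic mean of precision and recall. As a minimal illustration on toy labels (not the evaluation script behind the reported numbers), the following computes token-level metrics while treating every non-`O` label as PHI; the `phi_prf` helper and the label names are assumptions made for this example.

```python
from typing import List, Tuple

def phi_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Token-level precision, recall, and F1 for PHI vs. non-PHI ("O") tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(g != "O" and p != "O" for g, p in pairs)  # PHI found
    fp = sum(g == "O" and p != "O" for g, p in pairs)  # false alarm
    fn = sum(g != "O" and p == "O" for g, p in pairs)  # PHI missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

gold = ["O", "HCW", "HCW", "O", "DATE", "O"]
pred = ["O", "HCW", "O", "O", "DATE", "DATE"]
precision, recall, f1 = phi_prf(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

For deidentification, recall is usually the more safety-critical number, since a missed token (a false negative) leaks PHI while a false alarm only removes harmless text.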

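Hybrid approach

Combines a PubMedBERT transformer for PHI detection with a rule-based "hide in plain sight" method for replacement, balancing detection accuracy with plausible surrogates.

Cross-domain validation

Developed on a multi-institutional, cross-domain dataset of 6193 documents (chest X-ray and CT reports and medical notes) and evaluated on radiology and i2b2 test sets.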

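Model capabilities

Protected health information (PHI) detection

Medical text deidentification

Surrogate value generation

Privacy processing of radiology reports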

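Use cases

Medical privacy protection

Deidentification of chest X-ray reports

Automatically detects and replaces sensitive information such as patient, physician, and institution names in chest X-ray reports.

99.1% recall in detecting the core of each PHI span (reports from a known institution)

Cross-institution data sharing

Enables deidentified medical data to be shared between institutions while preserving its clinical value.

99.6 F1 on data from a new institution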

🚀 Stanford De-identifier

A tool that automates the deidentification of radiology and biomedical documents, developed to reach accuracy sufficient for production use.

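🚀 Quick start

Stanford de-identifier was trained on a variety of radiology and biomedical documents to automate the deidentification process, with the goal of reaching accuracy sufficient for production use.

Associated GitHub repository: https://github.com/MIDRC/Stanford_Penn_Deidentifier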
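One plausible way to try the model is through the Hugging Face `transformers` token-classification pipeline. The sketch below rests on several assumptions: the Hub ID in `MODEL_ID` is inferred from the model name and developer shown on this page, the helper names are made up for illustration, and the `HCW` and `DATE` labels in the hand-written entities are hypothetical stand-ins for real model output.

```python
from typing import Any, Dict, List

# Assumed Hub ID, inferred from the model name and developer listed on this page.
MODEL_ID = "StanfordAIMI/stanford-deidentifier-with-radiology-reports-and-i2b2"

def load_phi_detector(model_id: str = MODEL_ID):
    """Build a token-classification pipeline; downloads weights on first call."""
    from transformers import pipeline  # lazy import: needs transformers and torch
    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")

def mask_phi(text: str, entities: List[Dict[str, Any]]) -> str:
    """Replace each detected span with its bracketed label, right to left."""
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text

report = "Dr. Curt Langlotz chose to schedule a meeting on 06/23."

def fake_entity(value: str, label: str) -> Dict[str, Any]:
    """Hand-written stand-in for one pipeline result (labels are hypothetical)."""
    start = report.index(value)
    return {"entity_group": label, "start": start, "end": start + len(value)}

# Offline demonstration using the stand-in entities instead of a model call.
entities = [fake_entity("Curt Langlotz", "HCW"), fake_entity("06/23", "DATE")]
masked = mask_phi(report, entities)
print(masked)
```

With network access, `mask_phi(report, load_phi_detector()(report))` would apply the same masking to real predictions, since the pipeline's aggregated output carries the `entity_group`, `start`, and `end` keys used here. The repository linked above remains the authoritative source for the full pipeline, including surrogate generation.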

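✨ Key features

Automates the deidentification of radiology and biomedical documents
Detects protected health information (PHI) entities and replaces them with realistic surrogates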
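The replacement step can be sketched as follows. The idea behind "hide in plain sight" is that realistic surrogates make any PHI the detector missed hard to tell apart from the substituted values. The surrogate pools, label names, and the `replace_with_surrogates` helper below are hypothetical; the project's actual rules are more elaborate and live in its repository.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical surrogate pools; a real system would use far larger, realistic lists.
SURROGATES: Dict[str, List[str]] = {
    "HCW": ["Smith", "Nguyen", "Garcia"],
    "DATE": ["3/14/2018", "11/2/2017", "7/30/2019"],
    "PHONE": ["555-010-0199", "555-010-0142"],
}

def replace_with_surrogates(text: str, spans: List[Tuple[int, int, str]], seed: int = 0) -> str:
    """Swap each non-overlapping (start, end, label) span for a same-type surrogate."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    cache: Dict[Tuple[str, str], str] = {}  # repeated values get the same surrogate
    out = text
    # Work right to left so earlier offsets stay valid after each substitution.
    for start, end, label in sorted(spans, reverse=True):
        key = (label, text[start:end])
        if key not in cache:
            cache[key] = rng.choice(SURROGATES.get(label, ["[REDACTED]"]))
        out = out[:start] + cache[key] + out[end:]
    return out

report = "We received the confirmation of Dr Perez. He is reachable at 567-493-1234."

def span(value: str, label: str) -> Tuple[int, int, str]:
    """Locate a value in the report; stands in for detector output."""
    start = report.index(value)
    return (start, start + len(value), label)

deidentified = replace_with_surrogates(report, [span("Perez", "HCW"), span("567-493-1234", "PHONE")])
print(deidentified)
```

Caching one surrogate per original value is a design choice made for this sketch so that a name repeated within a report stays consistent after replacement.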

📄 License

This project is released under the MIT license.

📚 Documentation

Sample data

widget:
- text: "PROCEDURE: Chest xray. COMPARISON: last seen on 1/1/2020 and also record dated of March 1st, 2019. FINDINGS: patchy airspace opacities. IMPRESSION: The results of the chest xray of January 1 2020 are the most concerning ones. The patient was transmitted to another service of UH Medical Center under the responsability of Dr. Perez. We used the system MedClinical data transmitter and sent the data on 2/1/2020, under the ID 5874233. We received the confirmation of Dr Perez. He is reachable at 567-493-1234."
- text: "Dr. Curt Langlotz chose to schedule a meeting on 06/23."
tags:
- token-classification
- sequence-tagger-model
- pytorch
- transformers
- pubmedbert
- uncased
- radiology
- biomedical
datasets:
- radreports
language:
  - en
license: mit

Citation

@article{10.1093/jamia/ocac219,
    author = {Chambon, Pierre J and Wu, Christopher and Steinkamp, Jackson M and Adleberg, Jason and Cook, Tessa S and Langlotz, Curtis P},
    title = "{Automated deidentification of radiology reports combining transformer and “hide in plain sight” rule-based methods}",
    journal = {Journal of the American Medical Informatics Association},
    year = {2022},
    month = {11},
    abstract = "{To develop an automated deidentification pipeline for radiology reports that detect protected health information (PHI) entities and replaces them with realistic surrogates “hiding in plain sight.” In this retrospective study, 999 chest X-ray and CT reports collected between November 2019 and November 2020 were annotated for PHI at the token level and combined with 3001 X-rays and 2193 medical notes previously labeled, forming a large multi-institutional and cross-domain dataset of 6193 documents. Two radiology test sets, from a known and a new institution, as well as i2b2 2006 and 2014 test sets, served as an evaluation set to estimate model performance and to compare it with previously released deidentification tools. Several PHI detection models were developed based on different training datasets, fine-tuning approaches and data augmentation techniques, and a synthetic PHI generation algorithm. These models were compared using metrics such as precision, recall and F1 score, as well as paired samples Wilcoxon tests. Our best PHI detection model achieves 97.9 F1 score on radiology reports from a known institution, 99.6 from a new institution, 99.5 on i2b2 2006, and 98.9 on i2b2 2014. On reports from a known institution, it achieves 99.1 recall of detecting the core of each PHI span. Our model outperforms all deidentifiers it was compared to on all test sets as well as human labelers on i2b2 2014 data. It enables accurate and automatic deidentification of radiology reports. A transformer-based deidentification pipeline can achieve state-of-the-art performance for deidentifying radiology reports and other medical documents.}",
    issn = {1527-974X},
    doi = {10.1093/jamia/ocac219},
    url = {https://doi.org/10.1093/jamia/ocac219},
    note = {ocac219},
    eprint = {https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocac219/47220191/ocac219.pdf},
}