Stanford Deidentifier: a free, open-source system for automated, high-accuracy de-identification of radiology reports

Stanford Deidentifier Only Radiology Reports Augmented

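Developed by StanfordAIMI

An automated de-identification system for radiology reports that combines a transformer model with rule-based methods to detect and replace PHI with high accuracy

Sequence labeling

Transformers

Language: English. Open-source license: MIT. Tags: #RadiologyReportDeidentification #AutomaticPHIDetection #BiomedicalNLP

Downloads: 30

Release date: 6/9/2022

Model overview

An automated de-identification model designed for radiology and biomedical documents. It detects protected health information (PHI) entities and replaces them with safe surrogate values in order to meet HIPAA privacy requirements.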

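Model features

High performance across institutions

Achieves an F1 score of 97.9 on radiology reports from a known institution and 99.6 on reports from a new institution, and outperforms human labelers on the i2b2 2014 data

Cross-domain coverage

Trained on a multi-institutional, cross-domain dataset of 6193 documents that includes chest X-ray reports, CT reports, and medical notes

Hybrid design

Combines a PubMedBERT transformer model with a "hide in plain sight" rule-based method for precise PHI detection and replacement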

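Model capabilities

PHI detection in radiology reports

Biomedical text de-identification

Automatic replacement of sensitive information

Cross-institution document processing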

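Use cases

Medical privacy protection

Chest X-ray report de-identification

Automatically detects and replaces patient information, physician names, and institution information in chest X-ray reports

Achieves 99.1 recall for detecting the core of each PHI span on test reports from a known institution

Cross-institution data sharing

Processes radiology reports from different medical institutions and produces standardized de-identified output

Achieves an F1 score of 99.6 on data from a new institution

Research data preparation

Clinical research data anonymization

Prepares radiology datasets that meet privacy requirements for medical research

Supports the creation of research datasets that comply with HIPAA standards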
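
The F1 and recall figures quoted in the use cases above are standard classification metrics. As a rough illustration, the sketch below computes precision, recall, and F1 from per-token labels. It assumes token-level, micro-averaged scoring with "O" marking non-PHI tokens; the label names and the function `precision_recall_f1` are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import List, Tuple


def precision_recall_f1(
    gold: List[str], pred: List[str], outside: str = "O"
) -> Tuple[float, float, float]:
    """Token-level precision, recall, and F1 over all non-outside labels.

    Assumption: one label per token and "O" for tokens that are not PHI.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # A true positive is a PHI token whose predicted label matches the gold label.
    true_pos = sum(1 for g, p in zip(gold, pred) if g != outside and g == p)
    pred_pos = sum(1 for p in pred if p != outside)  # tokens predicted as PHI
    gold_pos = sum(1 for g in gold if g != outside)  # tokens that really are PHI

    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Recall is usually the metric to watch in de-identification, because a missed PHI token leaks information while a false positive only removes harmless text.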

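🚀 Stanford de-identifier

The Stanford de-identifier was trained on a variety of radiology and biomedical documents with the goal of automating the de-identification process while reaching accuracy sufficient for production use.

🚀 Quick start

The Stanford de-identifier was developed to automate the de-identification of radiology and biomedical documents. The associated GitHub repository is: https://github.com/MIDRC/Stanford_Penn_Deidentifier

✨ Key features

Automatically detects personal information in radiology and biomedical documents and de-identifies it.
Combines a transformer model with rule-based methods to achieve high-accuracy de-identification.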
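
One plausible way to try the model is through the Hugging Face `transformers` token-classification pipeline. The sketch below is not taken from the README: the model id is inferred from the model name and developer shown on this page, and the helper names are hypothetical. It masks each detected span with its label rather than applying the "hide in plain sight" surrogate replacement.

```python
from typing import Dict, List

# Assumption: model id inferred from the model name and developer on this page;
# verify it before use.
MODEL_ID = "StanfordAIMI/stanford-deidentifier-only-radiology-reports-augmented"


def build_deidentifier(model_id: str = MODEL_ID):
    """Load a token-classification pipeline (downloads weights on first use)."""
    # Imported lazily so the pure helpers below work without `transformers`.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans
    # with "entity_group", "start", and "end" keys.
    return pipeline(
        "token-classification", model=model_id, aggregation_strategy="simple"
    )


def mask_entities(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a bracketed label such as [DATE]."""
    # Splice from right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


def deidentify(text: str, tagger) -> str:
    """Run the tagger on `text` and mask every span it returns."""
    return mask_entities(text, tagger(text))
```

With `transformers` and `torch` installed, `deidentify(report_text, build_deidentifier())` would return the masked report; the labels that appear depend on the model's own tag set.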

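📦 Installation

The README does not give specific installation steps, so this section is omitted.

💻 Usage examples

The README does not include specific code examples, so this section is omitted.

📚 Documentation

The Stanford de-identifier is used to de-identify radiology and biomedical documents. The model was trained on a variety of radiology and biomedical documents and reaches accuracy sufficient for use in a production environment.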

Model information

Property	Details
Model type	token-classification, sequence-tagger-model
Training data	radreports
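
A token-classification model assigns one label to each token, so a multi-token entity such as a full name has to be reassembled from consecutive tokens. The sketch below shows one simple way to do that. It assumes plain labels with "O" for non-PHI tokens and no B-/I- prefixes; the label names, the input tuple format, and `merge_token_labels` are hypothetical and not part of the model's documented interface.

```python
from typing import Dict, List, Tuple


def merge_token_labels(
    tokens: List[Tuple[int, int, str]], outside: str = "O"
) -> List[Dict]:
    """Merge consecutive tokens that share a label into character spans.

    `tokens` holds (start, end, label) for each token in reading order.
    """
    spans: List[Dict] = []
    prev_label = outside
    for start, end, label in tokens:
        if label != outside:
            if label == prev_label:
                # Same label as the previous token: extend the open span.
                spans[-1]["end"] = end
            else:
                # A new entity starts here.
                spans.append({"start": start, "end": end, "label": label})
        prev_label = label
    return spans
```

Under this plain-label assumption, two adjacent entities of the same type are merged into one span; tag schemes with B-/I- prefixes exist to avoid that ambiguity.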

Widget information

{
    "widget": [
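        {
            "text": "Procedure: Chest X-ray. Comparison: last seen on January 1, 2020, and also a record dated March 1, 2019. Findings: patchy airspace opacities. Impression: The results of the chest X-ray of January 1, 2020 are the most concerning. The patient was transferred to another department of UH Medical Center under the responsibility of Dr. Perez. We used the MedClinical data transmission system and sent the data on February 1, 2020 under ID 5874233. We received confirmation from Dr. Perez. He can be reached at 567-493-1234."
        },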
        {
            "text": "Dr. Curt Langlotz chose to hold a meeting on June 23."
        }
    ],
    "tags": [
        "token-classification",
        "sequence-tagger-model",
        "pytorch",
        "transformers",
        "pubmedbert",
        "uncased",
        "radiology",
        "biomedical"
    ],
    "datasets": [
        "radreports"
    ],
    "language": [
        "en"
    ],
    "license": "mit"
}

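🔧 Technical details

The model combines a transformer model with a "hide in plain sight" rule-based method to form an automated de-identification pipeline for radiology reports. The pipeline detects protected health information (PHI) entities and replaces them with realistic surrogates that "hide in plain sight".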
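
The replacement step can be illustrated with a short sketch. Given detected spans, each PHI value is swapped for a realistic surrogate of the same type, so any PHI the detector missed is hard to tell apart from the surrogates around it. Everything here is an assumption for illustration: the surrogate pools, the label names, the span format, and `replace_with_surrogates` are hypothetical and do not reproduce the pipeline's actual rules.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical surrogate pools; a real system would use far larger lists.
SURROGATES: Dict[str, List[str]] = {
    "PERSON": ["Morgan", "Rivera", "Chen"],
    "HOSPITAL": ["Lakeside Clinic", "Northgate Hospital"],
    "PHONE": ["555-010-0199", "555-010-0142"],
}


def replace_with_surrogates(text: str, entities: List[Dict], seed: int = 0) -> str:
    """Replace each span with a surrogate drawn from the pool for its label.

    Each entity is a dict with "start", "end", and "label" keys (assumed format).
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    mapping: Dict[Tuple[str, str], str] = {}

    # Pick surrogates in reading order; repeated mentions of the same value
    # reuse the same surrogate so the report stays internally consistent.
    for ent in sorted(entities, key=lambda e: e["start"]):
        key = (ent["label"], text[ent["start"]:ent["end"]])
        if key not in mapping:
            pool = SURROGATES.get(ent["label"])
            # Fall back to a bracketed label when no pool exists for the type.
            mapping[key] = rng.choice(pool) if pool else f"[{ent['label']}]"

    # Splice from right to left so earlier character offsets stay valid.
    out = text
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        key = (ent["label"], text[ent["start"]:ent["end"]])
        out = out[: ent["start"]] + mapping[key] + out[ent["end"]:]
    return out
```

This sketch does not prevent a surrogate from coinciding with the original value, or two different originals from receiving the same surrogate; a production rule set would need to handle both cases.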

📄 License

This project is released under the MIT license.

Citation

@article{10.1093/jamia/ocac219,
    author = {Chambon, Pierre J and Wu, Christopher and Steinkamp, Jackson M and Adleberg, Jason and Cook, Tessa S and Langlotz, Curtis P},
    title = "{Automated deidentification of radiology reports combining transformer and “hide in plain sight” rule-based methods}",
    journal = {Journal of the American Medical Informatics Association},
    year = {2022},
    month = {11},
    abstract = "{To develop an automated deidentification pipeline for radiology reports that detect protected health information (PHI) entities and replaces them with realistic surrogates “hiding in plain sight.” In this retrospective study, 999 chest X-ray and CT reports collected between November 2019 and November 2020 were annotated for PHI at the token level and combined with 3001 X-rays and 2193 medical notes previously labeled, forming a large multi-institutional and cross-domain dataset of 6193 documents. Two radiology test sets, from a known and a new institution, as well as i2b2 2006 and 2014 test sets, served as an evaluation set to estimate model performance and to compare it with previously released deidentification tools. Several PHI detection models were developed based on different training datasets, fine-tuning approaches and data augmentation techniques, and a synthetic PHI generation algorithm. These models were compared using metrics such as precision, recall and F1 score, as well as paired samples Wilcoxon tests. Our best PHI detection model achieves 97.9 F1 score on radiology reports from a known institution, 99.6 from a new institution, 99.5 on i2b2 2006, and 98.9 on i2b2 2014. On reports from a known institution, it achieves 99.1 recall of detecting the core of each PHI span. Our model outperforms all deidentifiers it was compared to on all test sets as well as human labelers on i2b2 2014 data. It enables accurate and automatic deidentification of radiology reports. A transformer-based deidentification pipeline can achieve state-of-the-art performance for deidentifying radiology reports and other medical documents.}",
    issn = {1527-974X},
    doi = {10.1093/jamia/ocac219},
    url = {https://doi.org/10.1093/jamia/ocac219},
    note = {ocac219},
    eprint = {https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocac219/47220191/ocac219.pdf},
}