# TITAN-preview Model Card
TITAN-preview is a multimodal whole-slide foundation model. It combines visual self-supervised learning and vision-language alignment, achieving excellent performance on various downstream tasks.
## Quick Start
### Requesting Access
As stated in the gated prompt, you must agree to the terms of use and ensure that the primary email on your Hugging Face account matches your institutional email. If your primary email is a personal one (@gmail/@hotmail/@qq), your request will be denied. To resolve this: (1) add your official institutional email to your Hugging Face account and confirm it; (2) set your institutional email as the primary email on your account. Requests are also denied for mistakes in the submitted form, such as abbreviated full names, unclear affiliations, insufficient descriptions of the intended research use, or unrecognized email domains.
### Model Usage
TITAN-preview is a vision-language model trained on CONCH v1.5 patch features with a patch size of 512x512 pixels at 20x magnification.

After authentication (using `huggingface_hub`), you can load both TITAN-preview (slide and language encoders) and CONCH v1.5 (patch encoder) with the following commands:
```python
from huggingface_hub import login
from transformers import AutoModel

login()  # authenticate with Hugging Face (gated model access required)
titan = AutoModel.from_pretrained('MahmoodLab/TITAN', trust_remote_code=True)
conch, eval_transform = titan.return_conch()  # CONCH v1.5 patch encoder and its eval transform
```
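If you need to compute your own patch features before slide encoding, the returned CONCH v1.5 encoder can be applied to individual patches. The sketch below is illustrative only: the patch filename is a placeholder, and it assumes the encoder accepts a batch of transformed image tensors through a standard forward call and returns one embedding per patch; consult our GitHub repo for the exact interface.

```python
import torch
from PIL import Image

# Hypothetical patch image; any 512x512 tile at 20x magnification would be used here.
patch = Image.open('example_patch.png').convert('RGB')

conch = conch.eval().cuda()
batch = eval_transform(patch).unsqueeze(0).cuda()  # shape: (1, 3, H, W)

with torch.autocast('cuda', torch.float16), torch.inference_mode():
    # Assumption: a plain forward pass returns the patch embedding.
    patch_feature = conch(batch)
```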
You can directly use TITAN-preview for slide-level feature extraction. TITAN builds a feature grid from CONCH v1.5 patch features using the patch coordinates and the distance between patches. Since patch coordinates are saved at the slide's level-0 magnification, TITAN takes `patch_size_lv0`, which represents the distance between two adjacent patches at level-0 magnification (1024 for 40x slides or 512 for 20x slides). This information is saved in our demo TCGA features.

Slide-level feature extraction can be done as follows:
```python
import h5py
import torch
from transformers import AutoModel

titan = AutoModel.from_pretrained('MahmoodLab/TITAN', trust_remote_code=True)

# Demo file containing CONCH v1.5 patch features, patch coordinates,
# and the patch size at level-0 magnification.
h5_path = 'TCGA_demo_features/TCGA-RM-A68W-01Z-00-DX1.4E62E4F4-415C-46EB-A6C8-45BA14E82708.h5'
with h5py.File(h5_path, 'r') as file:
    features = torch.from_numpy(file['features'][:])
    coords = torch.from_numpy(file['coords'][:])
    patch_size_lv0 = file['coords'].attrs['patch_size_level0']

with torch.autocast('cuda', torch.float16), torch.inference_mode():
    slide_embedding = titan.encode_slide_from_patch_features(features, coords, patch_size_lv0)
```
These pre-extracted features can be used for slide-level classification (via linear probing), retrieval (via L2 distance), and other machine learning tasks without task-specific finetuning.
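As a rough illustration of these two use cases, the sketch below fits a linear probe with scikit-learn and ranks slides by L2 distance with `torch.cdist`. The slide embeddings and labels are random placeholders, and the embedding dimension used here is an assumption; in practice you would substitute TITAN-preview slide embeddings such as those in the demo TCGA features.

```python
import torch
from sklearn.linear_model import LogisticRegression

# Placeholder slide embeddings and labels; replace with real TITAN-preview outputs.
# The embedding dimension (768) is assumed for illustration only.
train_emb, test_emb = torch.randn(100, 768), torch.randn(20, 768)
train_labels = torch.randint(0, 2, (100,))

# Linear probing: a logistic-regression classifier on frozen slide embeddings.
clf = LogisticRegression(max_iter=1000).fit(train_emb.numpy(), train_labels.numpy())
pred = clf.predict(test_emb.numpy())

# Retrieval: rank reference slides by L2 distance to each query slide.
dists = torch.cdist(test_emb, train_emb)   # (20, 100) pairwise distances
nearest = dists.argsort(dim=1)[:, :5]      # indices of the 5 closest slides per query
```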
We also released all TCGA TITAN-preview features in `TCGA_TITAN_features.pkl`. More detailed linear-probe and zero-shot evaluations are demonstrated in our GitHub repo.
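If you want to work with the released TCGA features directly, the snippet below shows one way to load the pickle file. Keying the entries by TCGA slide ID is an assumption about the file layout; inspect the loaded object to confirm the actual structure.

```python
import pickle

with open('TCGA_TITAN_features.pkl', 'rb') as f:
    tcga_features = pickle.load(f)

# Inspect the layout before relying on any particular structure.
print(type(tcga_features))
if isinstance(tcga_features, dict):
    print(list(tcga_features.keys())[:5])  # e.g. TCGA slide IDs (assumed, not guaranteed)
```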
## Features
TITAN (Transformer-based pathology Image and Text Alignment Network) is a multimodal whole-slide foundation model. It is pre-trained using visual self-supervised learning and vision-language alignment. It leverages 335,645 whole-slide images (WSIs) from diverse cases at Mass General Brigham, including neoplastic, infectious, and inflammatory cases. Additionally, TITAN uses over 182,000 pathology reports and more than 423,000 synthetic captions generated by PathChat. TITAN's slide embeddings achieve state-of-the-art performance on various downstream tasks, such as linear probing, few-shot and zero-shot classification, rare cancer retrieval, cross-modal retrieval, and pathology report generation.
## Installation
### Requirements
```
torch==2.0.1
timm==1.0.3
einops==0.6.1
einops-exts==0.0.4
transformers==4.46.0
```
## Documentation
### What is TITAN?
This is a preview version, and we will provide further updates and improvements.
[Preprint](https://arxiv.org/abs/2411.19666) | [GitHub Repo](https://github.com/mahmoodlab/TITAN) | [Cite]
### Model Description
| Property | Details |
|---|---|
| Developed by | Mahmood Lab AI for Pathology @ Harvard/BWH |
| Model Type | Pretrained vision-language encoders |
| Pretraining dataset | Mass-340K, sourced from private histology collections (BWH/MGH), in addition to slides from the public GTEx consortium |
| Repository | https://github.com/mahmoodlab/TITAN |
| Preprint | https://arxiv.org/abs/2411.19666 |
| License | CC-BY-NC-ND-4.0 |
## License
This model and associated code are released under the CC-BY-NC-ND 4.0 license and can only be used for non-commercial, academic research purposes with proper attribution. Any commercial use, sale, or other monetization of the TITAN model and its derivatives (including models trained on outputs from the TITAN model or datasets created from the TITAN model) is prohibited and requires prior approval. Downloading the model requires prior registration on Hugging Face and agreeing to the terms of use. By downloading this model, you agree not to distribute, publish or reproduce a copy of the model. If another user within your organization wishes to use the TITAN model, they must register as an individual user and agree to comply with the terms of use. Users may not attempt to re-identify the de-identified data used to develop the underlying model. If you are a commercial entity, please contact the corresponding author.
## Technical Details
The project was built on top of amazing repositories such as [ViT](https://github.com/google-research/big_vision), iBOT, OpenClip, LGSSL, and [Timm](https://github.com/huggingface/pytorch-image-models/) (ViT model implementation).
## Contact
For any additional questions or comments, contact Faisal Mahmood (faisalmahmood@bwh.harvard.edu), Tong Ding (tong_ding@g.harvard.edu), Sophia J. Wagner (sophia.wagner@helmholtz-munich.de), Andrew H. Song (asong@bwh.harvard.edu), or Richard J. Chen (richardchen@g.harvard.edu).
## Acknowledgements
We thank the authors and developers of [ViT](https://github.com/google-research/big_vision), iBOT, OpenClip, LGSSL, and [Timm](https://github.com/huggingface/pytorch-image-models/) for their contributions.
## BibTeX
If you find our work useful in your research, please consider citing:

Ding, T.\*, Wagner, S.J.\*, Song, A.H.\*, Chen, R.J.\*, et al. Multimodal Whole Slide Foundation Model for Pathology, arXiv, 2024.
```bibtex
@misc{ding2024multimodalslidefoundationmodel,
      title={Multimodal Whole Slide Foundation Model for Pathology},
      author={Tong Ding and Sophia J. Wagner and Andrew H. Song and Richard J. Chen and Ming Y. Lu and Andrew Zhang and Anurag J. Vaidya and Guillaume Jaume and Muhammad Shaban and Ahrong Kim and Drew F. K. Williamson and Bowen Chen and Cristina Almagro-Perez and Paul Doucet and Sharifa Sahai and Chengkuan Chen and Daisuke Komura and Akihiro Kawabe and Shumpei Ishikawa and Georg Gerber and Tingying Peng and Long Phi Le and Faisal Mahmood},
      year={2024},
      eprint={2411.19666},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2411.19666},
}
```