# TITAN-preview Model Card
TITAN-preview is a multimodal whole-slide foundation model. It combines visual self-supervised learning and vision-language alignment, achieving excellent performance on various downstream tasks.
## Quick Start
### Requesting Access
As stated in the gated prompt, you must agree to the terms of use and ensure that the primary email on your Hugging Face account matches your institutional email. If your primary email is a personal one (@gmail/@hotmail/@qq), your request will be denied. To resolve this: (1) add your official institutional email to your Hugging Face account and confirm it; (2) set your institutional email as the primary email on your account. Requests are also denied for mistakes in the submitted form, such as abbreviated full names, unclear affiliations, insufficient descriptions of the intended research use, or unrecognized email domains.
### Model Usage
TITAN-preview is a vision-language model trained on CONCH v1.5 patch features with a patch size of 512x512 pixels at 20x magnification.

After authentication (using `huggingface_hub`), you can load both TITAN-preview (slide and language encoders) and CONCH v1.5 (patch encoder) with the following commands:
```python
from huggingface_hub import login
from transformers import AutoModel

login()  # authenticate with Hugging Face (gated model access required)
titan = AutoModel.from_pretrained('MahmoodLab/TITAN', trust_remote_code=True)
conch, eval_transform = titan.return_conch()  # CONCH v1.5 patch encoder and its eval transform
```
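If you need to compute your own patch features before slide encoding, the returned CONCH v1.5 encoder can be applied to individual patches. The sketch below is illustrative only: the patch filename is a placeholder, and it assumes the encoder accepts a batch of transformed image tensors through a standard forward call and returns one embedding per patch; consult our GitHub repo for the exact interface.

```python
import torch
from PIL import Image

# Hypothetical patch image; any 512x512 tile at 20x magnification would be used here.
patch = Image.open('example_patch.png').convert('RGB')

conch = conch.eval().cuda()
batch = eval_transform(patch).unsqueeze(0).cuda()  # shape: (1, 3, H, W)

with torch.autocast('cuda', torch.float16), torch.inference_mode():
    # Assumption: a plain forward pass returns the patch embedding.
    patch_feature = conch(batch)
```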
You can directly use TITAN-preview for slide-level feature extraction. TITAN builds a feature grid from CONCH v1.5 patch features using the patch coordinates and the distance between patches. Since patch coordinates are saved at the slide's level-0 magnification, TITAN takes `patch_size_lv0`, which represents the distance between two adjacent patches at level-0 magnification (1024 for 40x slides or 512 for 20x slides). This information is saved in our demo TCGA features.

Slide-level feature extraction can be done as follows:
```python
import h5py
import torch
from transformers import AutoModel

titan = AutoModel.from_pretrained('MahmoodLab/TITAN', trust_remote_code=True)

# Demo file containing CONCH v1.5 patch features, patch coordinates,
# and the patch size at level-0 magnification.
h5_path = 'TCGA_demo_features/TCGA-RM-A68W-01Z-00-DX1.4E62E4F4-415C-46EB-A6C8-45BA14E82708.h5'
with h5py.File(h5_path, 'r') as file:
    features = torch.from_numpy(file['features'][:])
    coords = torch.from_numpy(file['coords'][:])
    patch_size_lv0 = file['coords'].attrs['patch_size_level0']

with torch.autocast('cuda', torch.float16), torch.inference_mode():
    slide_embedding = titan.encode_slide_from_patch_features(features, coords, patch_size_lv0)
```
These pre-extracted features can be used for slide-level classification (via linear probing), retrieval (via L2 distance), and other machine learning tasks without task-specific finetuning.
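As a rough illustration of these two use cases, the sketch below fits a linear probe with scikit-learn and ranks slides by L2 distance with `torch.cdist`. The slide embeddings and labels are random placeholders, and the embedding dimension used here is an assumption; in practice you would substitute TITAN-preview slide embeddings such as those in the demo TCGA features.

```python
import torch
from sklearn.linear_model import LogisticRegression

# Placeholder slide embeddings and labels; replace with real TITAN-preview outputs.
# The embedding dimension (768) is assumed for illustration only.
train_emb, test_emb = torch.randn(100, 768), torch.randn(20, 768)
train_labels = torch.randint(0, 2, (100,))

# Linear probing: a logistic-regression classifier on frozen slide embeddings.
clf = LogisticRegression(max_iter=1000).fit(train_emb.numpy(), train_labels.numpy())
pred = clf.predict(test_emb.numpy())

# Retrieval: rank reference slides by L2 distance to each query slide.
dists = torch.cdist(test_emb, train_emb)   # (20, 100) pairwise distances
nearest = dists.argsort(dim=1)[:, :5]      # indices of the 5 closest slides per query
```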
We also released all TCGA TITAN-preview features in `TCGA_TITAN_features.pkl`. More detailed linear-probe and zero-shot evaluations are demonstrated in our GitHub repo.
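If you want to work with the released TCGA features directly, the snippet below shows one way to load the pickle file. Keying the entries by TCGA slide ID is an assumption about the file layout; inspect the loaded object to confirm the actual structure.

```python
import pickle

with open('TCGA_TITAN_features.pkl', 'rb') as f:
    tcga_features = pickle.load(f)

# Inspect the layout before relying on any particular structure.
print(type(tcga_features))
if isinstance(tcga_features, dict):
    print(list(tcga_features.keys())[:5])  # e.g. TCGA slide IDs (assumed, not guaranteed)
```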
## Features
TITAN (Transformer-based pathology Image and Text Alignment Network) is a multimodal whole-slide foundation model. It is pre-trained using visual self-supervised learning and vision-language alignment. It leverages 335,645 whole-slide images (WSIs) from diverse cases at Mass General Brigham, including neoplastic, infectious, and inflammatory cases. Additionally, TITAN uses over 182,000 pathology reports and more than 423,000 synthetic captions generated by PathChat. TITAN's slide embeddings achieve state-of-the-art performance on various downstream tasks, such as linear probing, few-shot and zero-shot classification, rare cancer retrieval, cross-modal retrieval, and pathology report generation.
## Installation
### Requirements
```
torch==2.0.1
timm==1.0.3
einops==0.6.1
einops-exts==0.0.4
transformers==4.46.0
```
## Documentation
### What is TITAN?
This is a preview version, and we will provide further updates and improvements.
[Preprint](https://arxiv.org/abs/2411.19666) | [GitHub Repo](https://github.com/mahmoodlab/TITAN) | [Cite]
### Model Description
| Property | Details |
|---|---|
| Developed by | Mahmood Lab AI for Pathology @ Harvard/BWH |
| Model Type | Pretrained vision-language encoders |
| Pretraining dataset | Mass-340K, sourced from private histology collections (BWH/MGH), in addition to slides from the public GTEx consortium |
| Repository | https://github.com/mahmoodlab/TITAN |
| Preprint | https://arxiv.org/abs/2411.19666 |
| License | CC-BY-NC-ND-4.0 |
## License
This model and associated code are released under the CC-BY-NC-ND 4.0 license and can only be used for non-commercial, academic research purposes with proper attribution. Any commercial use, sale, or other monetization of the TITAN model and its derivatives (including models trained on outputs from the TITAN model or datasets created from the TITAN model) is prohibited and requires prior approval. Downloading the model requires prior registration on Hugging Face and agreeing to the terms of use. By downloading this model, you agree not to distribute, publish or reproduce a copy of the model. If another user within your organization wishes to use the TITAN model, they must register as an individual user and agree to comply with the terms of use. Users may not attempt to re-identify the de-identified data used to develop the underlying model. If you are a commercial entity, please contact the corresponding author.
## Technical Details
The project was built on top of amazing repositories such as [ViT](https://github.com/google-research/big_vision), iBOT, OpenClip, LGSSL, and [Timm](https://github.com/huggingface/pytorch-image-models/) (ViT model implementation).
## Contact
For any additional questions or comments, contact Faisal Mahmood (faisalmahmood@bwh.harvard.edu), Tong Ding (tong_ding@g.harvard.edu), Sophia J. Wagner (sophia.wagner@helmholtz-munich.de), Andrew H. Song (asong@bwh.harvard.edu), or Richard J. Chen (richardchen@g.harvard.edu).
## Acknowledgements
We thank the authors and developers of [ViT](https://github.com/google-research/big_vision), iBOT, OpenClip, LGSSL, and [Timm](https://github.com/huggingface/pytorch-image-models/) for their contributions.
## BibTeX
If you find our work useful in your research, please consider citing:

Ding, T.\*, Wagner, S.J.\*, Song, A.H.\*, Chen, R.J.\*, et al. Multimodal Whole Slide Foundation Model for Pathology, arXiv, 2024.
```bibtex
@misc{ding2024multimodalslidefoundationmodel,
      title={Multimodal Whole Slide Foundation Model for Pathology},
      author={Tong Ding and Sophia J. Wagner and Andrew H. Song and Richard J. Chen and Ming Y. Lu and Andrew Zhang and Anurag J. Vaidya and Guillaume Jaume and Muhammad Shaban and Ahrong Kim and Drew F. K. Williamson and Bowen Chen and Cristina Almagro-Perez and Paul Doucet and Sharifa Sahai and Chengkuan Chen and Daisuke Komura and Akihiro Kawabe and Shumpei Ishikawa and Georg Gerber and Tingying Peng and Long Phi Le and Faisal Mahmood},
      year={2024},
      eprint={2411.19666},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2411.19666},
}
```