🚀 4M: Massively Multimodal Masked Modeling
A scalable, open-source framework for training any-to-any multimodal foundation models across tens of modalities and tasks.
[Website](https://4m.epfl.ch) | [GitHub](https://github.com/apple/ml-4m) | BibTeX
🚀 Quick Start
4M is a framework for training "any-to-any" foundation models, using tokenization and masking to scale to many diverse modalities. Models trained using 4M can perform a wide range of vision tasks, transfer well to unseen tasks and modalities, and are flexible and steerable multimodal generative models. We are releasing code and models for "4M: Massively Multimodal Masked Modeling" (here denoted 4M-7), as well as "4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities" (here denoted 4M-21).
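To make the training objective concrete, here is a toy illustration (not the repository's code) of the multimodal masked modeling idea: every modality is first tokenized into discrete tokens, then a random subset of tokens across modalities is given to the encoder as input while the decoder is trained to predict another random subset as targets. The function and sample data below are hypothetical, and the uniform sampling is a simplification of 4M's per-modality budget sampling (the paper draws modality proportions from a Dirichlet distribution):

```python
# Conceptual sketch of 4M-style input/target token sampling.
# This is an illustration of the masking idea, not the official implementation.
import random

def sample_input_and_target_tokens(modality_tokens, num_inputs, num_targets):
    """modality_tokens: dict mapping modality name -> list of token ids."""
    # Flatten all (modality, position, token) triples across modalities.
    all_tokens = [
        (modality, position, token)
        for modality, tokens in modality_tokens.items()
        for position, token in enumerate(tokens)
    ]
    # Uniform sampling here for simplicity; 4M samples per-modality budgets.
    random.shuffle(all_tokens)
    inputs = all_tokens[:num_inputs]                            # visible to the encoder
    targets = all_tokens[num_inputs:num_inputs + num_targets]   # predicted by the decoder
    return inputs, targets

# Example: three tokenized modalities of a single training sample (token ids are made up).
sample = {
    'rgb': [101, 7, 42, 13],
    'caption': [5, 99, 23],
    'depth': [88, 4, 61, 2],
}
inputs, targets = sample_input_and_target_tokens(sample, num_inputs=4, num_targets=3)
```

Because inputs and targets are resampled across all modalities at every step, a single trained model can map any subset of modalities to any other, which is what enables the any-to-any behavior described above.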
✨ Features
- Official implementation and pre-trained models for:
- 4M: Massively Multimodal Masked Modeling, NeurIPS 2023 (Spotlight)
David Mizrahi*, [Roman Bachmann](https://roman-bachmann.github.io/)*, Oğuzhan Fatih Kar, Teresa Yeo, Mingfei Gao, Afshin Dehghan, Amir Zamir
- 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities, arXiv 2024
[Roman Bachmann](https://roman-bachmann.github.io/)*, Oğuzhan Fatih Kar*, David Mizrahi*, Ali Garjani, Mingfei Gao, David Griffiths, [Jiaming Hu](https://scholar.google.com/citations?user=vm3imKsAAAAJ&hl=en), Afshin Dehghan, Amir Zamir
📦 Installation
For install instructions, please see https://github.com/apple/ml-4m.
💻 Usage Examples
Basic Usage
```python
from fourm.models.fm import FM

# Load a pre-trained 4M model from the Hugging Face Hub.
fm = FM.from_pretrained('EPFL-VILAB/4M-7_B_CC12M')
```
Please see https://github.com/apple/ml-4m/blob/main/README_GENERATION.md for more detailed instructions and https://github.com/apple/ml-4m for other 4M model and tokenizer checkpoints.
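The tokenizer checkpoints follow the same `from_pretrained` pattern. Below is a minimal sketch, assuming the `DiVAE` entry point and checkpoint name listed in the GitHub README; verify the exact identifiers at https://github.com/apple/ml-4m before use:

```python
# Sketch only: the entry point and checkpoint name are assumptions based on
# the GitHub README; check https://github.com/apple/ml-4m for the current list.
from fourm.vq.vqvae import DiVAE

# RGB tokenizer with a diffusion decoder, for encoding images to discrete
# tokens and decoding tokens back to pixels.
tok_rgb = DiVAE.from_pretrained('EPFL-VILAB/4M_tokenizers_rgb_16k_224-448')
```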
📚 Citation
If you find this repository helpful, please consider citing our work:
```bibtex
@inproceedings{4m,
    title={{4M}: Massively Multimodal Masked Modeling},
    author={David Mizrahi and Roman Bachmann and O{\u{g}}uzhan Fatih Kar and Teresa Yeo and Mingfei Gao and Afshin Dehghan and Amir Zamir},
    booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
    year={2023},
}

@article{4m21,
    title={{4M-21}: An Any-to-Any Vision Model for Tens of Tasks and Modalities},
    author={Roman Bachmann and O{\u{g}}uzhan Fatih Kar and David Mizrahi and Ali Garjani and Mingfei Gao and David Griffiths and Jiaming Hu and Afshin Dehghan and Amir Zamir},
    journal={arXiv 2024},
    year={2024},
}
```
📄 License
The model weights in this repository are released under the Sample Code license as found in the LICENSE file.
| Property | Details |
|---|---|
| Pipeline Tag | any-to-any |
| Model Type | Not specified |
| Training Data | Not specified |
| License | Other |
| License Name | sample-code-license |
| License Link | LICENSE |
| Library Name | ml-4m |