4M-7-SR_L_CC12M
4M is a scalable multimodal masked modeling framework that supports any-to-any modality conversion, covering dozens of modalities and tasks.
Release Date: March 25, 2024
Model Overview
4M is an 'any-to-any' foundation model training framework that scales to many modalities through tokenization and masking. Models trained with 4M can perform a wide range of vision tasks, transfer to unseen tasks and modalities, and offer flexible, controllable multimodal generation.
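To make the training idea concrete, the toy PyTorch sketch below illustrates the general recipe of multimodal masked modeling: tokenize each modality into discrete tokens, mask a random subset across modalities, and train a transformer to reconstruct the masked tokens. This is an illustrative sketch only, not the actual 4M architecture or code, and all class and variable names are hypothetical.

```python
import torch
import torch.nn as nn

class MaskedMultimodalModel(nn.Module):
    """Toy transformer over discrete tokens pooled from several modalities."""

    def __init__(self, vocab_size=4096, dim=256, num_modalities=3, max_len=256):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, dim)
        self.modality_emb = nn.Embedding(num_modalities, dim)   # marks which modality a token came from
        self.pos_emb = nn.Embedding(max_len, dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens, modality_ids):
        # tokens, modality_ids: (batch, seq_len) integer tensors
        positions = torch.arange(tokens.size(1), device=tokens.device)
        x = self.token_emb(tokens) + self.modality_emb(modality_ids) + self.pos_emb(positions)
        return self.head(self.encoder(x))


# Toy training step: mask a random subset of tokens across all modalities and
# train the model to reconstruct them (cross-entropy on masked positions only).
model = MaskedMultimodalModel()
tokens = torch.randint(1, 4096, (2, 128))        # tokenized multimodal sequence
modality_ids = torch.randint(0, 3, (2, 128))     # modality tag for each token
mask = torch.rand(tokens.shape) < 0.5            # 50% of tokens become prediction targets

inputs = tokens.masked_fill(mask, 0)             # token id 0 reserved as [MASK]
logits = model(inputs, modality_ids)
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
loss.backward()
```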
Model Features
Any-to-Any Modality Conversion
Supports flexible conversion among dozens of modalities, including vision and language.
Scalability
The framework design allows for easy extension to new modalities and tasks.
Transfer Learning Capability
Capable of transferring to unseen tasks and modalities.
Controllable Generation
Supports flexible and controllable multimodal content generation.
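For orientation, the snippet below sketches how a released checkpoint might be loaded for inference. It assumes the fourm package from the official ml-4m repository and a Hugging Face Hub id of EPFL-VILAB/4M-7-SR_L_CC12M; both are assumptions to verify against the upstream release.

```python
# Assumes the `fourm` package from the official ml-4m repository is installed
# and that the checkpoint is published on the Hugging Face Hub under this id;
# verify both against the upstream release before use.
from fourm.models.fm import FM

model = FM.from_pretrained("EPFL-VILAB/4M-7-SR_L_CC12M").eval()

# The loaded model predicts masked target-modality tokens from whatever input
# tokens are provided, so inference pairs it with the matching per-modality
# tokenizers (e.g., an image tokenizer and a text tokenizer).
```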
Model Capabilities
Multimodal conversion
Visual task processing
Controllable content generation
Cross-modal transfer learning
Use Cases
Computer Vision
Image Generation and Editing
Generate or edit images based on inputs from different modalities.
Cross-Modal Applications
Text-to-Image Generation
Generate corresponding images from text descriptions.
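As a rough illustration of how text-conditioned generation can proceed, the sketch below runs an iterative masked-decoding loop: the text tokens stay fixed as conditioning, all image-token slots start masked, and the most confident predictions are committed at each step. The predictor here is a random stand-in rather than a 4M model, so the loop is runnable but purely illustrative.

```python
import torch

# Stand-in for a trained any-to-any model: a real run would call a 4M
# checkpoint here; random logits keep the sketch self-contained and runnable.
def predict_logits(tokens, modality_ids, vocab_size=4096):
    return torch.randn(*tokens.shape, vocab_size)

caption_tokens = torch.randint(1, 4096, (1, 32))       # tokenized text condition
image_tokens = torch.zeros(1, 256, dtype=torch.long)   # image slots, id 0 = [MASK]
modality_ids = torch.cat([torch.zeros(1, 32, dtype=torch.long),
                          torch.ones(1, 256, dtype=torch.long)], dim=1)
unknown = torch.ones(1, 256, dtype=torch.bool)         # image tokens still masked

steps = 8
for step in range(steps):
    tokens = torch.cat([caption_tokens, image_tokens], dim=1)
    logits = predict_logits(tokens, modality_ids)[:, 32:]   # logits for image positions
    probs, preds = logits.softmax(-1).max(-1)
    probs = probs.masked_fill(~unknown, -1.0)                # never revisit decided tokens
    k = max(int(unknown.sum()) // (steps - step), 1)         # unmask a fraction per step
    idx = probs.topk(k, dim=-1).indices[0]
    image_tokens[0, idx] = preds[0, idx]
    unknown[0, idx] = False

# image_tokens would then be decoded back to pixels by the image tokenizer's decoder.
```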