4M-7 B CC12M

Developed by EPFL-VILAB
4M is a framework for training 'any-to-any' multimodal foundation models that scales to many diverse modalities through tokenization and masking.
Downloads: 209
Release Time: 3/25/2024

Model Overview

Models trained with the 4M framework can perform a wide range of visual tasks, transfer well to unseen tasks and modalities, and serve as flexible and controllable multimodal generative models.
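As a rough sketch, a released checkpoint of this kind can be loaded for inference along the following lines (assuming the fourm Python package from the open-source 4M code release is installed; the module path and checkpoint ID follow the release's naming and are stated here as assumptions, not verified API):

    # Minimal loading sketch. Assumes the `fourm` package from the
    # open-source 4M code release is installed; the module path and
    # checkpoint ID are assumptions based on the release's naming.
    from fourm.models.fm import FM

    # Pull the 4M-7 base model trained on CC12M from the Hugging Face
    # Hub and put it in evaluation mode for inference.
    model = FM.from_pretrained('EPFL-VILAB/4M-7_B_CC12M').eval()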

Model Features

Any-to-Any Modality Transformation
Converts between arbitrary combinations of input and output modalities
Large-scale Multimodal Support
Scales to dozens of modalities and tasks
Strong Transfer Capability
Transfers well to unseen tasks and modalities
Flexible and Controllable Generation
Offers highly flexible and controllable multimodal generation
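The tokenization and masking at the core of these features can be illustrated with a toy sketch: each modality is mapped to discrete tokens, a random subset of tokens is masked, and a transformer is trained to predict the masked tokens from the visible ones. The sketch below is a minimal PyTorch illustration of that general technique; the vocabulary size, model width, and two-layer encoder are illustrative assumptions, not 4M's actual architecture.

    # Toy masked multimodal modeling sketch (illustrative only; all
    # sizes and the architecture are assumptions, not 4M's design).
    import torch
    import torch.nn as nn

    VOCAB = 1024  # shared discrete-token vocabulary (assumed size)
    D = 256       # embedding width (illustrative)

    class ToyMaskedMultimodal(nn.Module):
        def __init__(self):
            super().__init__()
            # Index VOCAB is reserved as the [MASK] token.
            self.embed = nn.Embedding(VOCAB + 1, D)
            layer = nn.TransformerEncoderLayer(D, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(D, VOCAB)

        def forward(self, tokens, mask):
            # Replace masked positions with the [MASK] id, then predict
            # the original token at every position.
            x = tokens.masked_fill(mask, VOCAB)
            return self.head(self.encoder(self.embed(x)))

    # Tokens from different modalities (e.g. image patches and text)
    # would be concatenated into one sequence; here we use random ids.
    tokens = torch.randint(0, VOCAB, (2, 32))  # batch of 2 sequences
    mask = torch.rand(2, 32) < 0.5             # mask ~50% of positions
    model = ToyMaskedMultimodal()
    logits = model(tokens, mask)
    # Cross-entropy on the masked positions only, as in masked modeling.
    loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
    loss.backward()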

Model Capabilities

Multimodal data transformation
Visual task processing
Multimodal generation
Cross-modal transfer learning

Use Cases

Computer Vision
Image Generation
Generate images conditioned on data from other modalities
Visual Question Answering
Answer questions by combining visual and language modalities
Multimodal Applications
Cross-modal Retrieval
Retrieve data across different modalities