4M-7 B CC12M

Developed by EPFL-VILAB
4M is a framework for training 'any-to-any' multimodal foundation models that scales to many diverse modalities through tokenization and masking.
Downloads: 209
Release Time: 3/25/2024

Model Overview

Models trained with the 4M framework can perform a wide range of visual tasks, transfer well to unseen tasks and modalities, and serve as flexible and controllable multimodal generative models.
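As a rough sketch, a released checkpoint of this kind can be loaded for inference along the following lines (assuming the fourm Python package from the open-source 4M code release is installed; the module path and checkpoint ID follow the release's naming and are stated here as assumptions, not verified API):

    # Minimal loading sketch. Assumes the `fourm` package from the
    # open-source 4M code release is installed; the module path and
    # checkpoint ID are assumptions based on the release's naming.
    from fourm.models.fm import FM

    # Pull the 4M-7 base model trained on CC12M from the Hugging Face
    # Hub and put it in evaluation mode for inference.
    model = FM.from_pretrained('EPFL-VILAB/4M-7_B_CC12M').eval()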

Model Features

Any-to-Any Modality Transformation
Converts between arbitrary combinations of input and output modalities
Large-scale Multimodal Support
Scales to dozens of modalities and tasks
Strong Transfer Capability
Transfers well to unseen tasks and modalities
Flexible and Controllable Generation
Offers highly flexible and controllable multimodal generation
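The tokenization and masking at the core of these features can be illustrated with a toy sketch: each modality is mapped to discrete tokens, a random subset of tokens is masked, and a transformer is trained to predict the masked tokens from the visible ones. The sketch below is a minimal PyTorch illustration of that general technique; the vocabulary size, model width, and two-layer encoder are illustrative assumptions, not 4M's actual architecture.

    # Toy masked multimodal modeling sketch (illustrative only; all
    # sizes and the architecture are assumptions, not 4M's design).
    import torch
    import torch.nn as nn

    VOCAB = 1024  # shared discrete-token vocabulary (assumed size)
    D = 256       # embedding width (illustrative)

    class ToyMaskedMultimodal(nn.Module):
        def __init__(self):
            super().__init__()
            # Index VOCAB is reserved as the [MASK] token.
            self.embed = nn.Embedding(VOCAB + 1, D)
            layer = nn.TransformerEncoderLayer(D, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(D, VOCAB)

        def forward(self, tokens, mask):
            # Replace masked positions with the [MASK] id, then predict
            # the original token at every position.
            x = tokens.masked_fill(mask, VOCAB)
            return self.head(self.encoder(self.embed(x)))

    # Tokens from different modalities (e.g. image patches and text)
    # would be concatenated into one sequence; here we use random ids.
    tokens = torch.randint(0, VOCAB, (2, 32))  # batch of 2 sequences
    mask = torch.rand(2, 32) < 0.5             # mask ~50% of positions
    model = ToyMaskedMultimodal()
    logits = model(tokens, mask)
    # Cross-entropy on the masked positions only, as in masked modeling.
    loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
    loss.backward()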

Model Capabilities

Multimodal data transformation
Visual task processing
Multimodal generation
Cross-modal transfer learning

Use Cases

Computer Vision
Image Generation
Generate images conditioned on data from other modalities
Visual Question Answering
Answer questions by combining visual and language modalities
Multimodal Applications
Cross-modal Retrieval
Retrieve data across different modalities