
4M-21 XL

Developed by EPFL-VILAB
4M is a framework for training "any-to-any" multimodal foundation models: each modality is mapped to discrete tokens, and a single transformer is trained with a multimodal masked modeling objective, which lets the framework scale to many diverse modalities.
Downloads: 57
Release Date: 6/12/2024

Model Overview

The foundation model trained by the 4M framework can perform a wide range of visual tasks, exhibits strong transfer capabilities, and serves as a flexible and controllable multimodal generation model.

Model Features

Any-to-Any Multimodal Conversion
Supports conversion among the 21 diverse modalities the model is trained on, including images, text, and geometric and semantic maps.
Strong Transfer Capability
Can effectively transfer to unseen tasks and modalities.
Flexible and Controllable Generation
Highly flexible and controllable as a multimodal generation model.

Model Capabilities

Multimodal Masked Modeling
Visual Task Processing
Multimodal Generation
Cross-Modal Conversion
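The multimodal masked modeling objective listed above can be illustrated with a toy sketch: tokens from every modality are flattened into one sequence, a random subset is kept as model input, and all masked-out positions become prediction targets. The function name, token ids, and modality names below are hypothetical choices for illustration, not the 4M codebase's API.

```python
import random

def multimodal_mask(tokens_by_modality, num_input, seed=0):
    """Toy version of a multimodal masked-modeling objective: flatten
    tokens from all modalities into one pool, keep a random subset as
    model input, and treat every masked position as a prediction target."""
    rng = random.Random(seed)
    # Flatten to (modality, position, token_id) triples.
    flat = [(m, i, t)
            for m, toks in tokens_by_modality.items()
            for i, t in enumerate(toks)]
    keep = set(rng.sample(range(len(flat)), num_input))
    inputs = [triple for k, triple in enumerate(flat) if k in keep]
    targets = [triple for k, triple in enumerate(flat) if k not in keep]
    return inputs, targets

# Example: RGB patch tokens and caption tokens (ids are arbitrary).
tokens = {"rgb": [101, 102, 103, 104], "caption": [7, 8, 9]}
inp, tgt = multimodal_mask(tokens, num_input=3)
```

Because inputs and targets are sampled across modalities jointly, the same training setup covers captioning, depth prediction, and every other pairing without task-specific heads.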

Use Cases

Computer Vision
Image Generation
Generate images from other modalities, such as text captions or depth maps
Multimodal Processing
Cross-Modal Conversion
Convert between different visual and language modalities
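Cross-modal conversion in masked generative models is typically done by iterative unmasking: the target modality starts fully masked, and over a few steps the most confident predictions are committed until every token is filled. The sketch below shows the scheduling logic only; `predict_fn` is a hypothetical stand-in for the trained transformer, not part of any real API.

```python
def iterative_unmask(length, predict_fn, steps=4):
    """Sketch of any-to-any generation by iterative unmasking: begin with a
    fully masked target-modality sequence and, at each step, commit the
    predictions the (hypothetical) model is most confident about."""
    tokens = [None] * length          # None marks a still-masked position
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t is None]
        if not masked:
            break
        # Ask the model for a (token, confidence) pair at each masked slot.
        preds = {i: predict_fn(tokens, i) for i in masked}
        # Commit roughly an equal share of the remaining positions per step.
        n_commit = max(1, len(masked) // (steps - step))
        for i in sorted(masked, key=lambda i: -preds[i][1])[:n_commit]:
            tokens[i] = preds[i][0]
    return tokens

# Dummy predictor standing in for the transformer: always token 5, conf 1.0.
out = iterative_unmask(6, lambda toks, i: (5, 1.0))
```

Conditioning tokens from the source modality (e.g. a caption) would be fed to the predictor unchanged; only the target modality's tokens pass through this unmasking loop.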