MMaDA 8B MixCoT
MMaDA is a novel class of multimodal diffusion foundation models designed to excel across text reasoning, multimodal understanding, and text-to-image generation.
Release Time: 6/1/2025
Model Overview
MMaDA adopts a unified diffusion architecture that combines a mixed-length chain-of-thought fine-tuning strategy with a unified reinforcement learning algorithm to improve performance on multimodal tasks.
Model Features
Unified Diffusion Architecture
Employs shared probabilistic formulations and modality-agnostic designs, eliminating the need for modality-specific components.
Mixed-Length Chain-of-Thought Fine-Tuning Strategy
Curates a unified chain-of-thought format across modalities to enhance instruction-following capabilities and chain-of-thought generation performance.
Unified Reinforcement Learning Algorithm
Uses the UniGRPO algorithm to unify post-training for reasoning and generation tasks within a single reinforcement learning framework, enabling consistent gains across both.
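The group-relative idea behind GRPO-style algorithms such as UniGRPO can be sketched in a few lines: each prompt gets a group of sampled rollouts, and each rollout's reward is normalized against its group's statistics to form an advantage. This is a generic, minimal illustration of that normalization step only, not MMaDA's actual UniGRPO implementation (which adapts the objective to diffusion-model likelihoods); all names here are illustrative.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Group-normalized advantages, GRPO-style: each sampled
    response's reward is centered and scaled by the mean and
    standard deviation of its group of rollouts."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four rollouts for one prompt, scored by a reward model.
# Advantages are zero-mean; the best rollout gets the largest
# positive advantage (here ~1.414).
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Because advantages are computed relative to the group rather than a learned value function, no separate critic network is needed, which is part of what makes GRPO-style post-training comparatively lightweight.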
Model Capabilities
Text reasoning
Multimodal understanding
Text-to-image generation
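All three capabilities above are served by the same mask-based discrete diffusion process: generation starts from a fully masked token sequence and iteratively reveals tokens over several denoising steps. The following toy sketch shows only that iterative-unmasking control flow; the "denoiser" here cheats by copying from a known target, whereas a real model predicts token distributions at the masked positions, and the function and variable names are illustrative assumptions, not MMaDA's API.

```python
import random

MASK = "<mask>"

def toy_denoise(target, steps=4, seed=0):
    """Toy mask-based discrete diffusion sampler: begin with an
    all-mask sequence and reveal a batch of positions per step,
    mimicking the iterative unmasking of masked diffusion LMs.
    Yields the intermediate sequence after each step."""
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    masked = list(range(len(target)))
    per_step = max(1, len(target) // steps)
    while masked:
        # Pick some still-masked positions and "denoise" them.
        reveal = rng.sample(masked, min(per_step, len(masked)))
        for i in reveal:
            seq[i] = target[i]   # a real model would sample a token here
            masked.remove(i)
        yield list(seq)

# Each step reveals one more position; the final state is fully denoised.
states = list(toy_denoise(["a", "red", "cat"], steps=3))
```

Because the same unmasking loop applies whether the tokens are text or discretized image codes, a single sampler of this shape can serve reasoning, understanding, and image generation alike.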
Use Cases
Text Processing
Complex Text Reasoning
Handles complex text tasks requiring multi-step reasoning
More stable chain-of-thought generation performance
Multimodal Tasks
Cross-Modal Understanding
Simultaneously processes and understands text and image information
Better multimodal understanding capabilities
Content Generation
Text-to-Image Generation
Generates high-quality images based on text descriptions
High-quality image generation results