AIbase

# Multimodal Diffusion Model

Cosmos 1.0 Diffusion 7B Text2World
Other
A diffusion-based multimodal world foundation model developed by NVIDIA that generates high-quality, physics-aware videos from text inputs.
Text-to-Video
nvidia
5,011
220
Cogact Small
MIT
CogACT is an advanced Vision-Language-Action (VLA) architecture derived from Vision-Language Models (VLMs) and designed specifically for robot manipulation.
Multimodal Fusion, Transformers, English
CogACT
405
4
Cogact Large
MIT
CogACT is an advanced Vision-Language-Action (VLA) architecture derived from Vision-Language Models (VLMs) and designed specifically for robot manipulation.
Multimodal Fusion, Transformers, English
CogACT
122
3
Rdt 1b Test
MIT
A Robotics Diffusion Transformer (RDT) model derived from robotics-diffusion-transformer/rdt-1b, intended for robotics applications.
Text-to-Image, Transformers, English
Ethan-pooh
0
0
Ldm3d 4c
OpenRAIL
LDM3D is a latent diffusion model that generates both an image and a depth map from a text prompt, supporting 3D content creation.
Text-to-Image, English
Intel
1,086
39
© 2025 AIbase