# Autoregressive Multimodal
Vila U 7b 256
MIT
VILA-U is a foundation model that unifies vision-language understanding and generation within a single autoregressive framework.
Text-to-Image
mit-han-lab
127
21
Janus 1.3B
MIT
Janus is an autoregressive framework that unifies multimodal understanding and generation. It decouples visual encoding into separate pathways for the two tasks, which relieves the conflict a single shared encoder creates in earlier methods and makes the framework more flexible.
Text-to-Image
Transformers

deepseek-ai
12.44k
588
Anole 7b V0.1 Hf
Apache-2.0
Anole is an open-source autoregressive multimodal model that generates interleaved image-text sequences without relying on Stable Diffusion.
Text-to-Image
Transformers, English

leloy
22.83k
8