# Unified Transformer Architecture
Emu3 Stage1
Apache-2.0
Emu3 is a multimodal model developed by the Beijing Academy of Artificial Intelligence (BAAI). It is trained solely with next-token prediction and supports image, text, and video processing.
Text-to-Image
Transformers

BAAI
1,359
26
Emu3 VisionTokenizer
Apache-2.0
Emu3 is a suite of multimodal models trained solely with next-token prediction. It surpasses multiple specialized models on both generation and perception tasks.
Text-to-Image
Transformers

BAAI
19.82k
58
Oneformer Coco Dinat Large
MIT
A single unified Transformer architecture for image segmentation that supports three tasks: semantic, instance, and panoptic segmentation.
Image Segmentation
Transformers

shi-labs
38
7
Oneformer Cityscapes Swin Large
MIT
The first multi-task universal image segmentation framework, supporting semantic, instance, and panoptic segmentation with a single model.
Image Segmentation
Transformers

shi-labs
1,784
2