# Unified Transformer Architecture

Emu3 is a multimodal model developed by the Beijing Academy of Artificial Intelligence (BAAI). It is trained solely with next-token prediction and handles images, text, and video.
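
"Trained solely with next-token prediction" means one objective covers every modality. Once images and video are turned into discrete tokens that share a vocabulary with text, every position is scored with the same cross-entropy loss on the next token. The following is a minimal sketch of that loss under stated assumptions: the vocabulary sizes and the `image_token` offset scheme are hypothetical toy choices, not Emu3's actual configuration.

```python
import math

# Hypothetical toy vocabulary (not Emu3's real sizes):
# ids 0..9 are text tokens, ids 10..19 are discrete image tokens.
TEXT_VOCAB = 10
IMAGE_VOCAB = 10
VOCAB = TEXT_VOCAB + IMAGE_VOCAB


def image_token(code: int) -> int:
    """Shift an image codebook index into the shared vocabulary (assumed scheme)."""
    return TEXT_VOCAB + code


def next_token_loss(logits: list[list[float]], tokens: list[int]) -> float:
    """Mean cross-entropy of predicting tokens[t + 1] from position t.

    logits[t] holds unnormalized scores over the whole VOCAB at position t.
    The loss is identical whether the target is a text or an image token.
    """
    total = 0.0
    for t in range(len(tokens) - 1):
        row = logits[t]
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))  # stable log-sum-exp
        total += log_z - row[tokens[t + 1]]  # equals -log softmax(row)[target]
    return total / (len(tokens) - 1)


# A mixed text + image sequence scored by a model that outputs uniform logits.
sequence = [3, 7, image_token(2), image_token(5)]
uniform = [[0.0] * VOCAB for _ in sequence]
print(next_token_loss(uniform, sequence))  # log(20), about 2.996
```

With uniform logits, every prediction costs log(VOCAB), which is a quick sanity check that the loss treats text and image targets identically.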

Emu3 VisionTokenizer

Emu3 is a novel multimodal model suite trained solely through next-token prediction. It outperforms several specialized models on both generation and perception tasks.
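
As its name suggests, a vision tokenizer turns images into the discrete tokens that a next-token model predicts. Discrete image tokenizers are commonly built on vector quantization: an encoder produces one feature vector per spatial patch, and each vector is replaced by the index of its nearest entry in a learned codebook. The sketch below shows only that lookup step with a hand-made toy codebook. It assumes a VQ-style design and is not the Emu3 VisionTokenizer's code; `quantize` and `dequantize` are hypothetical helper names.

```python
def quantize(vectors: list[list[float]], codebook: list[list[float]]) -> list[int]:
    """Map each feature vector to the index of its nearest codebook entry
    (squared Euclidean distance), turning continuous features into tokens."""
    tokens = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        tokens.append(min(range(len(codebook)), key=dists.__getitem__))
    return tokens


def dequantize(tokens: list[int], codebook: list[list[float]]) -> list[list[float]]:
    """Look tokens back up in the codebook (a decoder would then rebuild pixels)."""
    return [codebook[i] for i in tokens]


# Toy 2-D codebook and three patch features (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
features = [[0.1, -0.2], [0.9, 1.2], [0.2, 0.8]]
tokens = quantize(features, codebook)
print(tokens)  # [0, 1, 2]
```

In a real tokenizer, the codebook is learned jointly with the encoder and decoder, and the feature grid is much larger. The nearest-neighbor lookup is the part that makes the output a sequence of integers.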

Oneformer Coco Dinat Large

A unified, single-Transformer architecture for image segmentation that supports three major tasks: semantic, instance, and panoptic segmentation.
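
A single architecture can serve several segmentation tasks if its output format is task-agnostic. MaskFormer-style universal segmenters are usually described as predicting a fixed set of queries, where each query carries a class distribution and a soft mask. The sketch below assumes that mask-classification output and shows one way to turn it into a semantic label map. Each pixel takes the class that maximizes the sum over queries of class probability times mask probability. The function and inputs are illustrative assumptions, not the OneFormer API.

```python
def semantic_from_queries(class_probs: list[list[float]],
                          masks: list[list[float]]) -> list[int]:
    """Combine per-query predictions into a per-pixel semantic label map.

    class_probs[q][c]: probability that query q belongs to class c.
    masks[q][i]: probability (0..1) that pixel i belongs to query q's segment.
    Returns, per pixel i, argmax_c sum_q class_probs[q][c] * masks[q][i].
    """
    num_classes = len(class_probs[0])
    num_pixels = len(masks[0])
    labels = []
    for i in range(num_pixels):
        scores = [sum(p[c] * m[i] for p, m in zip(class_probs, masks))
                  for c in range(num_classes)]
        labels.append(max(range(num_classes), key=scores.__getitem__))
    return labels


# Two queries, two classes, a flattened 4-pixel image (toy values).
class_probs = [[0.9, 0.1], [0.2, 0.8]]
masks = [[1.0, 0.9, 0.1, 0.0], [0.0, 0.2, 0.8, 1.0]]
print(semantic_from_queries(class_probs, masks))  # [0, 0, 1, 1]
```

Instance and panoptic outputs would read the same queries differently, for example by keeping each confident query as its own segment instead of summing them per class.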

Image Segmentation

Oneformer Cityscapes Swin Large

The first multi-task universal image segmentation framework, handling semantic, instance, and panoptic segmentation with a single model.
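
One reason a single model can cover all three tasks is that a panoptic result already contains the other two. Every pixel gets a category, which yields semantic segmentation. Each countable "thing" object also gets its own segment id, which yields instance segmentation. The sketch below derives both from one toy panoptic map. The class names and the `THING_CLASSES` split are hypothetical Cityscapes-like examples, and `split_panoptic` is an illustrative helper, not part of any OneFormer release.

```python
# Hypothetical toy label set: which categories are countable "things".
THING_CLASSES = {"car", "person"}


def split_panoptic(segment_map: list[int], segments: dict[int, str]):
    """Derive semantic and instance outputs from one panoptic result.

    segment_map: per-pixel segment ids for a flattened image.
    segments: segment id -> category name.
    """
    # Semantic: every pixel simply takes its segment's category.
    semantic = [segments[s] for s in segment_map]
    # Instance: keep only "thing" segments, one binary mask per object.
    instances = {
        sid: (cat, [1 if s == sid else 0 for s in segment_map])
        for sid, cat in segments.items()
        if cat in THING_CLASSES
    }
    return semantic, instances


# Six pixels: road, two separate cars, and sky (toy data).
segment_map = [1, 1, 2, 3, 3, 4]
segments = {1: "road", 2: "car", 3: "car", 4: "sky"}
semantic, instances = split_panoptic(segment_map, segments)
print(semantic)   # ['road', 'road', 'car', 'car', 'car', 'sky']
print(instances)  # {2: ('car', [0, 0, 1, 0, 0, 0]), 3: ('car', [0, 0, 0, 1, 1, 0])}
```

The reverse direction does not work: a semantic map alone cannot tell the two cars apart. This is why a model trained toward panoptic-style output is a natural fit for a multi-task framework.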

Image Segmentation


© 2025 AIbase