Nexus-Gen Open-Source AI Model - A Practical Tool Combining Language Reasoning and Image Generation Functions

Nexus-Gen

Developed by modelscope

Nexus-Gen is a unified model that combines the linguistic reasoning capabilities of large language models with the image generation capabilities of diffusion models.

Text-to-Image

Transformers

Open Source License: Apache-2.0

#Multimodal Image Generation #Language-Driven Editing #Autoregressive Embedding Prediction

Downloads 129

Release Time: 4/30/2025

Model Overview

Nexus-Gen aligns the embedding spaces of a large language model and a diffusion model through a dual-stage training process. The result is a single model that handles image understanding, generation, and editing tasks.

Model Features

Dual-Stage Alignment Training

An autoregressive large language model learns to predict image embeddings, and a visual decoder then reconstructs high-fidelity images from these embeddings.

Prefilled Autoregressive Strategy

Prefills the input sequence with position-embedded special tokens instead of continuous embeddings, which avoids the error accumulation that otherwise degrades autoregressive generation.

Multi-Task Integration

A unified model with simultaneous capabilities for image understanding, generation, and editing
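The two stages listed above can be pictured as a simple data flow: the LLM turns a multimodal input into a sequence of image embeddings, and the decoder turns those embeddings into an image. The sketch below only illustrates that hand-off. The class and method names (`ToyLLM`, `ToyVisionDecoder`, `predict_image_embeddings`, `decode`) are hypothetical stand-ins and are not part of the Nexus-Gen or DiffSynth-Studio APIs.

```python
from dataclasses import dataclass
from typing import List, Optional

Embedding = List[float]  # stand-in for a real embedding tensor


@dataclass
class MultimodalInput:
    text: str
    image: Optional[bytes] = None  # supplied for understanding or editing


class ToyLLM:
    """Hypothetical stand-in for the autoregressive LLM (stage 1)."""

    def predict_image_embeddings(
        self, inp: MultimodalInput, num_tokens: int
    ) -> List[Embedding]:
        # A real model would run a transformer here; we derive dummy values
        # from the input sizes so the example stays deterministic.
        seed = len(inp.text) + (len(inp.image) if inp.image else 0)
        return [[float(seed + i)] for i in range(num_tokens)]


class ToyVisionDecoder:
    """Hypothetical stand-in for the vision decoder (stage 2)."""

    def decode(self, embeddings: List[Embedding]) -> str:
        # A real decoder would run a diffusion model and return pixels.
        return f"<image reconstructed from {len(embeddings)} embeddings>"


def generate(llm: ToyLLM, decoder: ToyVisionDecoder,
             inp: MultimodalInput, num_tokens: int = 4) -> str:
    """Stage 1 predicts embeddings; stage 2 decodes them into an image."""
    embeddings = llm.predict_image_embeddings(inp, num_tokens)
    return decoder.decode(embeddings)


result = generate(ToyLLM(), ToyVisionDecoder(),
                  MultimodalInput(text="a red fox in snow"))
print(result)
```

Because the only interface between the two stages is the embedding sequence, the same hand-off can serve generation (text only) and editing (text plus an input image).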

Model Capabilities

Image Understanding

Image Generation

Image Editing

Multimodal Input Processing

Use Cases

Creative Design

Text-to-Image Generation

Generates high-quality images based on detailed prompts

Produces high-fidelity images that match textual descriptions

Image Processing

Image Editing

Modifies and optimizes existing images

Achieves precise editing while maintaining image quality

🚀 Nexus-Gen

Nexus-Gen is a unified model that combines the language reasoning capabilities of LLMs with the image synthesis power of diffusion models, enabling comprehensive handling of image understanding, generation, and editing tasks.

🚀 Quick Start

📦 Installation

Install DiffSynth-Studio from source:

git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .

Install the requirements:

pip install -r requirements.txt

Install ms-swift if you want to fine-tune Nexus-Gen:

pip install ms-swift -U

Prepare the models:

python download_models.py
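After the installation steps, a quick check that the packages are importable can save a failed run later. The following is a minimal sketch using only the standard library. The import names `diffsynth` (for DiffSynth-Studio) and `swift` (for ms-swift) are assumptions and should be adjusted if the packages install under different names.

```python
import importlib.util

# Assumed import names; these are not stated in this document.
REQUIRED = {"DiffSynth-Studio": "diffsynth"}
OPTIONAL = {"ms-swift (finetuning only)": "swift"}


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def check_environment(required=REQUIRED, optional=OPTIONAL) -> dict:
    """Map each package label to whether its module is importable."""
    return {label: is_installed(module)
            for label, module in {**required, **optional}.items()}


for label, ok in check_environment().items():
    print(f"{label}: {'found' if ok else 'missing'}")
```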

💻 Usage Examples

Basic Usage

Image Understanding

python image_understanding.py

Image Generation

Generate an image from a detailed prompt:

python image_generation.py

Polish the prompt and generate images with Nexus-Gen:

python image_generation_with_selfpolish.py

Image Editing

python image_editing.py
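To try all of the example scripts in one pass, a small runner such as the sketch below may be convenient. It assumes the scripts sit in the repository root and take no arguments, as the commands above suggest. The `run_examples` helper is hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Script names taken from the usage commands above.
EXAMPLE_SCRIPTS = [
    "image_understanding.py",
    "image_generation.py",
    "image_generation_with_selfpolish.py",
    "image_editing.py",
]


def run_examples(repo_dir=".", scripts=EXAMPLE_SCRIPTS, dry_run=True):
    """Collect a command per script that exists; execute them unless dry_run."""
    commands = []
    for name in scripts:
        path = Path(repo_dir) / name
        if not path.is_file():
            print(f"skipping {name}: not found in {repo_dir}")
            continue
        cmd = [sys.executable, str(path)]
        commands.append(cmd)
        if not dry_run:
            # check=True stops at the first script that fails.
            subprocess.run(cmd, check=True)
    return commands


planned = run_examples(dry_run=True)
print(f"{len(planned)} script(s) ready to run")
```

Calling `run_examples(dry_run=False)` from the repository root would execute each script in turn with the current Python interpreter.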

Advanced Usage

Nexus-Gen is trained with ms-swift and DiffSynth-Studio. You can find the training scripts in train/scripts/train_decoder.sh and train_llm.sh.

📚 Documentation

What is Nexus-Gen

Nexus-Gen is a unified model that synergizes the language reasoning capabilities of LLMs with the image synthesis power of diffusion models. To align the embedding spaces of the LLM and the diffusion model, we conduct a dual-phase alignment training process: (1) the autoregressive LLM learns to predict image embeddings conditioned on multimodal inputs, and (2) the vision decoder is trained to reconstruct high-fidelity images from these embeddings. While training the LLM, we identified a critical discrepancy between the training and inference phases of the autoregressive paradigm: error accumulation in the continuous embedding space severely degrades generation quality. To avoid this issue, we introduce a prefilled autoregression strategy that prefills the input sequence with position-embedded special tokens instead of continuous embeddings. Through dual-phase training, Nexus-Gen has developed the integrated capability to address image understanding, generation, and editing tasks.
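The error-accumulation argument can be illustrated with a toy simulation. The sketch below is not Nexus-Gen's implementation: it uses a made-up one-dimensional "embedding" and a noisy predictor (all function names and the noise level are assumptions chosen for illustration). In the autoregressive rollout each prediction is fed back as the next input, so noise compounds. In the prefilled rollout each position's input is a fixed placeholder plus its position, so a prediction never depends on earlier predictions.

```python
import random


def true_embedding(pos: int) -> float:
    """Toy ground-truth 1-D image embedding at sequence position pos."""
    return 0.1 * pos


def noisy(value: float, rng: random.Random, sigma: float = 0.05) -> float:
    """Model an imperfect predictor by adding Gaussian noise."""
    return value + rng.gauss(0.0, sigma)


def autoregressive_rollout(length: int, rng: random.Random) -> list:
    """Feed each predicted embedding back in as the next input."""
    preds = []
    prev = true_embedding(0)
    for _ in range(length):
        prev = noisy(prev + 0.1, rng)  # next = previous prediction + step
        preds.append(prev)
    return preds


def prefilled_rollout(length: int, rng: random.Random) -> list:
    """Predict every position from a fixed special token and its position."""
    return [noisy(true_embedding(pos), rng) for pos in range(1, length + 1)]


def mean_abs_error_at_end(rollout, length=64, trials=500, seed=0) -> float:
    """Average absolute error of the last embedding over many rollouts."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        preds = rollout(length, rng)
        total += abs(preds[-1] - true_embedding(length))
    return total / trials


ar_err = mean_abs_error_at_end(autoregressive_rollout)
pf_err = mean_abs_error_at_end(prefilled_rollout)
print(f"autoregressive final-position error: {ar_err:.3f}")
print(f"prefilled final-position error:      {pf_err:.3f}")
```

In this toy setting the autoregressive error at the last position grows with sequence length, while the prefilled error stays at the single-step noise level. This mirrors the motivation given above, not the model's measured behaviour.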

For more information, please refer to our repo: https://github.com/modelscope/Nexus-Gen.git

[Figures: cover image; architecture diagram]

📄 License

This project is under the Apache-2.0 license.

Property	Details
Library Name	transformers
Pipeline Tag	image-to-image
Frameworks	PyTorch
Tasks	any-to-any
