Lumina-Next-SFT-diffusers: Open-Source Text-to-Image Model

Lumina Next SFT Diffusers

Developed by Alpha-VLLM

Lumina-Next-SFT is a 2-billion-parameter Next-DiT model that uses Gemma-2B as the text encoder and is enhanced through high-quality supervised fine-tuning (SFT) for text-to-image generation.

Text-to-Image | Open Source | License: Apache-2.0 | Tags: text-to-image diffusion model, Gemma-2B text encoder, 2-billion-parameter scale

Downloads: 8,442

Release date: 6/20/2024

Model Overview

Lumina-Next-SFT is a text-to-image diffusion model based on the Next-DiT architecture, utilizing Gemma-2B as the text encoder to generate high-quality images from text descriptions.

Model Features

High-quality supervised fine-tuning

High-quality supervised fine-tuning (SFT) improves the quality of the generated images.

Efficient architecture

Uses the Next-DiT backbone, which is designed for faster image generation and lower memory consumption than the original Lumina-T2I.

Powerful text understanding

Employs Gemma-2B as the text encoder for stronger understanding of text prompts.

High-resolution support

Supports image generation up to 2K resolution.
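As a rough illustration of what 2K resolution implies for the backbone, the sketch below converts an output image size into a latent shape and a transformer token count. It assumes the SDXL VAE shrinks each side by 8 into 4 latent channels and that Next-DiT splits the latent into 2×2 patches; these constants are assumptions for illustration, not values stated on this page, and `plan_latents` is a hypothetical helper.

```python
from dataclasses import dataclass

# Assumed constants (not stated in the model card):
VAE_DOWNSAMPLE = 8    # SDXL VAE reduces each spatial side by 8x
LATENT_CHANNELS = 4   # SDXL VAE latent channels
PATCH_SIZE = 2        # assumed Next-DiT patch size


@dataclass(frozen=True)
class LatentPlan:
    latent_height: int
    latent_width: int
    latent_channels: int
    num_tokens: int


def plan_latents(height: int, width: int) -> LatentPlan:
    """Map an output image size to a latent shape and a transformer token count."""
    step = VAE_DOWNSAMPLE * PATCH_SIZE  # each side must split evenly into patches
    if height % step or width % step:
        raise ValueError(f"height and width must be multiples of {step}")
    lat_h, lat_w = height // VAE_DOWNSAMPLE, width // VAE_DOWNSAMPLE
    # One token per PATCH_SIZE x PATCH_SIZE block of the latent
    tokens = (lat_h // PATCH_SIZE) * (lat_w // PATCH_SIZE)
    return LatentPlan(lat_h, lat_w, LATENT_CHANNELS, tokens)


for size in (1024, 2048):
    print(size, plan_latents(size, size))
```

Under these assumptions a 2048×2048 image becomes 16,384 tokens, four times as many as at 1024×1024, which is why 2K generation costs considerably more compute. The divisibility check is this helper's own simplification; the actual pipeline may resize or pad differently.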

Model Capabilities

Text-to-image generation

High-resolution image generation

Complex scene understanding

Use Cases

Creative design

Concept art creation

Generate concept art for games or movies based on text descriptions.

Produces concept artwork with specific styles and details.

Content creation

Social media content generation

Generate accompanying images for social media posts.

Quickly generates images that match the text content.

🚀 Lumina-Next-SFT

Lumina-Next-SFT is a Next-DiT model with 2B parameters. It uses Gemma-2B as the text encoder and is enhanced through high-quality supervised fine-tuning (SFT) for text-to-image generation.
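For a rough sense of the hardware this needs, the back-of-envelope calculation below estimates weight memory from parameter counts. It is a sketch that uses the nominal 2B figures from the model names (exact counts, especially Gemma-2B's, may be somewhat higher) and counts weights only, not activations or the VAE; `weight_memory_gib` is a hypothetical helper.

```python
# Bytes per parameter for common PyTorch dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Memory for the weights alone, in GiB (excludes activations and the VAE)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Nominal sizes taken from the model names; actual parameter counts may differ.
components = {"Next-DiT backbone": 2e9, "Gemma-2B text encoder": 2e9}
for name, n in components.items():
    print(f"{name}: ~{weight_memory_gib(n):.1f} GiB in bfloat16")
```

With both components in bfloat16 this comes to roughly 7.5 GiB of weights under those assumptions, about half of what float32 would need.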

🚀 Quick Start

Installation

1. Create a conda environment and install PyTorch

Note: You may want to adjust the CUDA version according to your driver version.

conda create -n Lumina_T2X -y
conda activate Lumina_T2X
conda install python=3.11 pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y

2. Install dependencies

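pip install -U diffusers transformers huggingface_hub

Note: LuminaText2ImgPipeline requires a recent diffusers version that includes Lumina-Next support, and the Gemma-2B text encoder is loaded through transformers.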

3. Install `flash-attn`

pip install flash-attn --no-build-isolation
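Before moving on, a quick check can confirm that the packages from the steps above are importable. This is a minimal sketch (`check_environment` is a hypothetical helper) that only looks each module up with `importlib.util.find_spec`; it does not verify that CUDA or the flash-attn kernels actually load.

```python
import importlib.util

# Import names (not pip names) of the packages installed in the steps above.
REQUIRED = {
    "torch": "PyTorch (step 1)",
    "diffusers": "diffusers (step 2)",
    "huggingface_hub": "huggingface_hub (step 2)",
    "flash_attn": "flash-attn (step 3)",
}


def check_environment(modules=REQUIRED):
    """Return {import_name: True/False} without importing the heavy packages."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_environment().items():
        status = "ok" if ok else "MISSING"
        print(f"{status:8} {name:16} {REQUIRED[name]}")
```

Using `find_spec` instead of a real import keeps the check fast, since importing torch or flash-attn can take several seconds.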

Inference

1. Prepare the pre-trained model

⭐⭐ (Recommended) Use huggingface-cli to download the model:

huggingface-cli download --resume-download Alpha-VLLM/Lumina-Next-SFT-diffusers --local-dir /path/to/ckpt/Lumina-Next-SFT-diffusers

2. Run the demo code:

import torch
from diffusers import LuminaText2ImgPipeline

# Load the pipeline from the local checkpoint directory in bfloat16 and move it to the GPU
pipeline = LuminaText2ImgPipeline.from_pretrained("/path/to/ckpt/Lumina-Next-SFT-diffusers", torch_dtype=torch.bfloat16).to("cuda")

# Alternatively, load the model directly from the Hugging Face Hub:
# pipeline = LuminaText2ImgPipeline.from_pretrained("Alpha-VLLM/Lumina-Next-SFT-diffusers", torch_dtype=torch.bfloat16).to("cuda")

# Generate one image for the prompt and save it to disk
image = pipeline(prompt="Upper body of a young woman in a Victorian-era outfit with brass goggles and leather straps. "
                        "Background shows an industrial revolution cityscape with smoky skies and tall, metal structures").images[0]
image.save("lumina_next_sft_demo.png")

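The demo can load either a local folder or the Hub repository id. A small helper like the hypothetical `resolve_checkpoint` below picks between them, assuming a downloaded pipeline folder can be recognized by the `model_index.json` file that diffusers pipelines keep at their root.

```python
from pathlib import Path

HUB_REPO_ID = "Alpha-VLLM/Lumina-Next-SFT-diffusers"


def resolve_checkpoint(local_dir: str, repo_id: str = HUB_REPO_ID) -> str:
    """Choose the argument for LuminaText2ImgPipeline.from_pretrained.

    If local_dir looks like a downloaded diffusers pipeline (it contains
    model_index.json), load from disk; otherwise fall back to the Hub id.
    """
    path = Path(local_dir)
    if (path / "model_index.json").is_file():
        return str(path)
    return repo_id
```

Its result can be passed straight to the loader, e.g. `LuminaText2ImgPipeline.from_pretrained(resolve_checkpoint("/path/to/ckpt/Lumina-Next-SFT-diffusers"), torch_dtype=torch.bfloat16)`. Falling back to the repo id lets `from_pretrained` download the weights when no local copy is present.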
✨ Features

The Lumina-Next-SFT model has the following features:

Powerful Backbone: It uses Next-DiT as the backbone for efficient image generation.
High-quality Text Encoder: Gemma-2B serves as the text encoder, enabling better understanding of text prompts.
Fine-tuned VAE: The VAE is Stability AI's fine-tuned SDXL VAE (stabilityai/sdxl-vae), which improves the quality of generated images.

📦 Model Information

Property	Details
Model Type	Text-to-Image
Model Architecture	Next-DiT
Text Encoder	Gemma-2B
VAE	stabilityai/sdxl-vae
Training Data	JourneyDB/JourneyDB
Library Name	diffusers

📰 News

[2024-07-08] 🎉🎉🎉 Lumina-Next is now supported in diffusers! Thanks to @yiyixuxu and @sayakpaul!
[2024-06-08] 🎉🎉🎉 We have released the Lumina-Next-SFT model.
[2024-05-28] We updated the Lumina-Next-T2I model to support 2K-resolution image generation.
[2024-05-16] We have converted the .pth weights to .safetensors weights. Please pull the latest code to use demo.py for inference.
[2024-05-12] We released the next version of Lumina-T2I, called Lumina-Next-T2I, an image generation model with faster inference and lower memory usage.