# 🚀 FFusion/FFusionXL-BASE

FFusion/FFusionXL-BASE is a diffusion-based text-to-image generative model, optimized for high-speed training and suitable for a wide range of applications. It can generate and modify images based on text prompts.
## ✨ Features
- 🚀 Fast Training: Optimized for high-speed training, allowing rapid experimentation.
- 🧩 Versatility: Suitable for various applications and standards, from NLP to Computer Vision.
- 🎓 Train Your Way: A base for training your own models, tailored to your needs.
- 🌐 Multilingual Support: Train models in multiple languages.
- 🛡️ Robust Architecture: Built on proven technologies to ensure stability and reliability.
## 📦 Installation
### Installing Dependencies for ONNX Inference

```bash
pip install optimum["onnxruntime"]
```
### Installing Dependencies for OpenVINO Inference

```bash
pip install optimum["openvino"]
```
### Installing Dependencies for Diffusers

```bash
pip install diffusers --upgrade
pip install invisible_watermark transformers accelerate safetensors
```
### Installing Dependencies for Training

```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
cd examples/text_to_image
pip install -r requirements_sdxl.txt
accelerate config
```
## 💻 Usage Examples
### Basic Usage - ONNX Text-to-Image

```python
from optimum.onnxruntime import ORTStableDiffusionXLPipeline

model_id = "FFusion/FFusionXL-BASE"
pipeline = ORTStableDiffusionXLPipeline.from_pretrained(model_id)
prompt = "sailing ship in storm by Leonardo da Vinci"
images = pipeline(prompt).images
```
### Basic Usage - OpenVINO Text-to-Image

```python
from optimum.intel import OVStableDiffusionXLPipeline

model_id = "FFusion/FFusionXL-BASE"
base = OVStableDiffusionXLPipeline.from_pretrained(model_id)
prompt = "train station by Caspar David Friedrich"
image = base(prompt).images[0]
image.save("train_station.png")
```
### Basic Usage - Diffusers

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "FFusion/FFusionXL-BASE",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
pipe.to("cuda")

prompt = "An astronaut riding a green horse"
image = pipe(prompt=prompt).images[0]
```
### Advanced Usage - Fine-Tuning

```bash
export MODEL_NAME="FFusion/FFusionXL-BASE"
export VAE="madebyollin/sdxl-vae-fp16-fix"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"

accelerate launch train_text_to_image_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_model_name_or_path=$VAE \
  --dataset_name=$DATASET_NAME \
  --enable_xformers_memory_efficient_attention \
  --resolution=512 --center_crop --random_flip \
  --proportion_empty_prompts=0.2 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 --gradient_checkpointing \
  --max_train_steps=10000 \
  --use_8bit_adam \
  --learning_rate=1e-06 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --mixed_precision="fp16" \
  --report_to="wandb" \
  --validation_prompt="a cute Sundar Pichai creature" --validation_epochs 5 \
  --checkpointing_steps=5000 \
  --output_dir="sdxl-pokemon-model" \
  --push_to_hub
```
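A note on the flags above: with `--train_batch_size=1` and `--gradient_accumulation_steps=4`, gradients from four micro-batches are combined before each optimizer step, giving an effective batch size of 4. A minimal numpy sketch of the idea (a toy linear model, purely illustrative, not part of the training script):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)
y = 3.0 * x          # targets for a toy linear model y_hat = w * x
w = 0.0

def grad(w, xb, yb):
    # gradient of the mean squared error over a (micro-)batch
    return np.mean(2.0 * (w * xb - yb) * xb)

full_batch = grad(w, x, y)

# four micro-batches of two samples each; their gradients are averaged
# before the update, just as gradient accumulation does
micro = [grad(w, x[i:i + 2], y[i:i + 2]) for i in range(0, 8, 2)]
accumulated = np.mean(micro)

print(np.allclose(full_batch, accumulated))  # True: same update as the full batch
```

This is why accumulation lets a memory-constrained GPU emulate a larger batch at the cost of more forward/backward passes per step.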
## 📚 Documentation

### Model Information

#### ONNX Version
We are proud to announce a fully optimized Microsoft ONNX version, exclusively compatible with the latest DirectML Execution Provider. All ONNX files are quantized to fp16 for fast inference and training across all devices.

The VAE decoder is kept at fp32 with the following settings:

```json
{
  "float16": false,
  "use_gpu": true,
  "keep_io_types": true,
  "force_fp32_ops": ["RandomNormalLike"]
}
```

This avoids black images and broken renders. As soon as a proper solution for a full fp16 VAE decoder arrives, we will update it. The VAE encoder and everything else is fully optimized 🤟.
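Why keeping the decoder at fp32 matters: fp16 saturates at about 65504, so large intermediate activations in the decoder can overflow to inf and then propagate NaNs, which render as black pixels. A small numpy illustration of that overflow (purely illustrative, not part of the model code):

```python
import numpy as np

# fp16 cannot represent magnitudes above ~65504
print(np.finfo(np.float16).max)   # 65504.0

big = np.float16(1e5)             # exceeds the fp16 range
print(big)                        # inf

print(big - big)                  # nan (inf - inf); NaNs downstream decode to black images
```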
The ONNX files were optimized using ONNX v8:

- producer: onnxruntime.transformers 1.15.1
- imports: ai.onnx v18, com.microsoft.nchwc v1, ai.onnx.ml v3, com.ms.internal.nhwc v19, ai.onnx.training v1, ai.onnx.preview.training v1, com.microsoft v1, com.microsoft.experimental v1, org.pytorch.aten v1, com.microsoft.dml v1
- graph: torch_jit
#### Intel® OpenVINO™ Version

A converted Intel® OpenVINO™ model is also included for inference testing and training. No quantization or optimization has been applied yet.
## 🔧 Technical Details

### ONNX Details
The ONNX model graphs can be inspected with NETRON.
### AMD Support for Microsoft® DirectML Optimization of Stable Diffusion

AMD has released support for Microsoft DirectML optimizations of Stable Diffusion, working closely with Microsoft to deliver optimal performance on AMD devices.
## 📄 License
This model is released under the FFXL Research License.