FLUX Aesthetics Enhancement LoRA
This is a LoRA model designed for FLUX.1-dev, which significantly enhances the aesthetic quality of images generated by the base model. Key improvements include rich details, beautiful lighting and shadows, aesthetic composition, and clear visuals. Notably, this model operates without the need for any trigger words.
Quick Start
This model was trained with DiffSynth-Studio, and we recommend using DiffSynth-Studio for inference.
Installation
```bash
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```
Usage Examples
Basic Usage
```python
import torch
from diffsynth import ModelManager, FluxImagePipeline, download_customized_models

# Download the ArtAug LoRA weights
lora_path = download_customized_models(
    model_id="DiffSynth-Studio/ArtAug-lora-FLUX.1dev-v1",
    origin_file_path="merged_lora.safetensors",
    local_dir="models/lora"
)[0]

# Load FLUX.1-dev and apply the LoRA; lora_alpha controls its strength
model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda", model_id_list=["FLUX.1-dev"])
model_manager.load_lora(lora_path, lora_alpha=1.0)
pipe = FluxImagePipeline.from_model_manager(model_manager)

# Generate and save an image; no trigger words are needed in the prompt
image = pipe(prompt="a house", seed=0)
image.save("image_artaug.jpg")
```
Since this model is stored in the standard FLUX LoRA format, it can be loaded by most LoRA loaders, so you can integrate it into your own workflow.
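For example, here is a minimal sketch using the Hugging Face diffusers library (an assumption based on the `diffusers` library name listed in the model information below, not an official snippet from this card; the repository id and `merged_lora.safetensors` filename are copied from the DiffSynth-Studio example above, and you may need to point `load_lora_weights` at a local copy of the file instead):

```python
import torch
from diffusers import FluxPipeline

# Load the base FLUX.1-dev pipeline
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Load the ArtAug LoRA; replace the repo id with a local path if you downloaded the weights manually
pipe.load_lora_weights("DiffSynth-Studio/ArtAug-lora-FLUX.1dev-v1", weight_name="merged_lora.safetensors")

image = pipe(
    prompt="a house",
    num_inference_steps=30,
    guidance_scale=3.5,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("image_artaug_diffusers.jpg")
```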
Features
- Enhanced Aesthetics: Significantly improves the aesthetic quality of generated images, including rich details, beautiful lighting and shadows, aesthetic composition, and clear visuals.
- No Trigger Words: This model does not require any trigger words for image generation.
Documentation
Methodology

The ArtAug project is inspired by reasoning approaches such as GPT-o1, which rely on model interaction and self-correction. We developed a framework that enhances the capabilities of image generation models through interaction with image understanding models. The training process of ArtAug consists of the following steps:
- Synthesis-Understanding Interaction: After generating an image with the image generation model, we use a multimodal large language model (Qwen2-VL-72B) to analyze the image content and suggest modifications, which guide the regeneration of a higher-quality image (see the sketch after this list).
- Data Generation and Filtering: Interactive generation involves long inference times and sometimes produces poor image content. We therefore generate a large batch of image pairs offline, filter them, and use them for subsequent training.
- Differential Training: We apply differential training techniques to train a LoRA model, enabling it to learn the differences between images before and after enhancement rather than training directly on the dataset of enhanced images.
- Iterative Enhancement: The trained LoRA model is fused into the base model, and the entire process is repeated with the fused model until the interaction algorithm no longer provides significant improvements. The LoRA models produced in each iteration are combined into this final model.
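The synthesis-understanding interaction in the first step can be pictured with a short sketch. This is an illustrative outline only, not the actual ArtAug code: `generate_image`, `critique_image`, and `refine_prompt` are hypothetical helpers standing in for the FLUX.1-dev pipeline and the Qwen2-VL-72B analysis step.

```python
# Illustrative sketch of the synthesis-understanding interaction loop (not the official code).
# generate_image, critique_image, and refine_prompt are hypothetical helpers; in ArtAug,
# synthesis uses FLUX.1-dev and understanding uses Qwen2-VL-72B.

def interactive_enhancement(prompt, num_rounds=2):
    image = generate_image(prompt)                    # synthesis: text-to-image generation
    pairs = []
    for _ in range(num_rounds):
        suggestions = critique_image(image)           # understanding: analyze content, suggest edits
        prompt = refine_prompt(prompt, suggestions)   # fold the suggestions back into the prompt
        enhanced = generate_image(prompt)             # regenerate a higher-quality image
        pairs.append((image, enhanced))               # keep (before, after) pairs for filtering
        image = enhanced                              # and for differential LoRA training
    return pairs
```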
This model integrates the aesthetic understanding of Qwen2-VL-72B into FLUX.1[dev], leading to an improvement in the quality of generated images.
Usage Examples
Examples
Side-by-side comparison images: FLUX.1-dev (base model) vs. FLUX.1-dev + ArtAug LoRA.
License
This model is released under the Apache-2.0 license.
Model Information
| Property | Details |
|----------|---------|
| Model Type | LoRA |
| Base Model | black-forest-labs/FLUX.1-dev |
| Pipeline Tag | text-to-image |
| Library Name | diffusers |