🚀 Furception v1.0
A VAE decoder finetune by Project RedRocket, leveraging e621 images for enhanced image reconstruction.
🚀 Quick Start
Furception v1.0 is a VAE decoder finetuned from stabilityai/sd-vae-ft-mse using images from e621. It employs a combination of MAE and MSE loss, calculated in the Oklab color space, to balance sharpness and smoothness in its outputs while prioritizing the more perceptually significant color channels.
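If the weights are distributed in the usual Diffusers layout, swapping the VAE into an existing Stable Diffusion pipeline only takes a few lines. This is a minimal sketch: the repository id and base model shown are placeholders, not the actual locations of the weights.

```python
# Minimal sketch of loading a custom VAE into a Stable Diffusion pipeline.
# Both repository ids below are placeholders -- substitute the real ones.
import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

vae = AutoencoderKL.from_pretrained(
    "your-namespace/furception-vae-v1",  # placeholder repo id
    torch_dtype=torch.float16,
)

pipe = StableDiffusionPipeline.from_pretrained(
    "your-namespace/your-base-model",  # placeholder base model
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a red fox in a forest, digital painting").images[0]
image.save("sample.png")
```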
✨ Features
- Noise Reduction: Effectively eliminates unwanted high-frequency noise in models trained on similar data.
- Broad Applicability: Shows improvements across all image resolutions and various art styles.
- Balanced Output: Maintains a good balance between sharpness and smoothness in image reconstruction.
| Default VAE (kl-f8) | Furception v1.0 |
| --- | --- |
| *(example reconstruction)* | *(example reconstruction)* |
Note that the output is overall smoother and has significantly less artifacting around edges in high-detail regions.
📚 Documentation
Licensing
This VAE is available under the terms of the [CC BY-NC-SA 4.0 Deed](https://creativecommons.org/licenses/by-nc-sa/4.0/). This applies to the use of the model, deployment, and distribution of the model weights only. The license does not apply to images decoded by this VAE. You may release them under any license, even public domain, as long as you are not creating them for commercial purposes. You are free and encouraged to distribute this VAE with models as long as you give credit and the VAE carries this license (the rest of the model does not need to share this license, although its distribution must be non-commercial). Please include the version number so people can know if they need to get an updated version in the future.
Training details
Overall, training is fundamentally similar to LDM. We used the same relative base weights for MAE, MSE, and LPIPS as in LDM, and as in sd-vae-ft-mse in the case of LPIPS. The discriminator's weight in the loss objective is dynamically set so that the gradient norm for the discriminator loss is half that of the reconstruction loss, as in LDM. We used a discriminator similar to the one LDM uses, except reparameterized to Wasserstein loss with a gradient penalty and with its group norm layers replaced with layer norms.
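As a rough illustration of that adaptive weighting (following the LDM / taming-transformers scheme, with the 0.5 gradient-norm ratio described above), a PyTorch-style sketch might look like the following; the clamp range and epsilon are assumptions, not the exact training configuration.

```python
# Sketch: choose the GAN-loss weight so that its gradient norm at the
# decoder's last layer is a fixed fraction (here 0.5) of the
# reconstruction-loss gradient norm.
import torch

def adaptive_disc_weight(rec_loss, gan_loss, last_layer_weight, ratio=0.5, eps=1e-4):
    rec_grad = torch.autograd.grad(rec_loss, last_layer_weight, retain_graph=True)[0]
    gan_grad = torch.autograd.grad(gan_loss, last_layer_weight, retain_graph=True)[0]
    weight = ratio * rec_grad.norm() / (gan_grad.norm() + eps)
    return weight.clamp(0.0, 1e4).detach()

# Usage (illustrative):
# total_loss = rec_loss + adaptive_disc_weight(rec_loss, g_loss, decoder.conv_out.weight) * g_loss
```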
Training for version 1.0 used random square crops at various downscale levels (Lanczos resampling with antialiasing), randomly rotated and flipped. Training ran for 150,000 steps at a batch size of 32. EMA weights were accumulated using a decay similar to sd-vae-ft-mse, scaled for our batch size, and are the released version of the model.
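An illustrative sketch of that crop/augmentation pipeline is shown below; the exact downscale range, crop size, and rotation angles are assumptions rather than the actual training settings.

```python
# Illustrative augmentation sketch: random Lanczos downscale, random square
# crop, random 90-degree rotation, and random horizontal flip.
import random
from PIL import Image

def random_training_crop(img: Image.Image, crop_size: int = 256) -> Image.Image:
    # Randomly downscale with antialiased Lanczos resampling (assumed range).
    scale = random.uniform(0.25, 1.0)
    w, h = img.size
    new_w = max(crop_size, int(w * scale))
    new_h = max(crop_size, int(h * scale))
    img = img.resize((new_w, new_h), Image.LANCZOS)

    # Take a random square crop.
    left = random.randint(0, new_w - crop_size)
    top = random.randint(0, new_h - crop_size)
    img = img.crop((left, top, left + crop_size, top + crop_size))

    # Random rotation (90-degree multiples assumed) and horizontal flip.
    img = img.rotate(90 * random.randint(0, 3))
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    return img
```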
🔧 Technical Details
The model is trained with a mixture of MAE and MSE loss, calculated in the Oklab color space. This approach prioritizes image reconstruction based on the perceptual significance of color channels, leading to better-balanced outputs. The discriminator's weight in the loss objective is dynamically adjusted to ensure a proper balance with the reconstruction loss.
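For illustration, a mixed MAE/MSE reconstruction loss in Oklab might be sketched as follows, using Björn Ottosson's reference RGB-to-Oklab conversion. The 50/50 mix and the assumption of linear-sRGB, channels-last inputs are placeholders rather than the actual training configuration.

```python
# Sketch: mixed MAE/MSE reconstruction loss computed in Oklab space.
import torch

def linear_srgb_to_oklab(rgb: torch.Tensor) -> torch.Tensor:
    # rgb: (..., 3) tensor of linear-sRGB values in [0, 1] (assumed).
    r, g, b = rgb.unbind(-1)
    # Linear sRGB -> LMS (coefficients from Ottosson's Oklab post).
    l = 0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b
    m = 0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b
    s = 0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b
    # Nonlinearity, then LMS' -> Oklab.
    l_, m_, s_ = l.pow(1 / 3), m.pow(1 / 3), s.pow(1 / 3)
    L = 0.2104542553 * l_ + 0.7936177850 * m_ - 0.0040720468 * s_
    a = 1.9779984951 * l_ - 2.4285922050 * m_ + 0.4505937099 * s_
    b_lab = 0.0259040371 * l_ + 0.7827717662 * m_ - 0.8086757660 * s_
    return torch.stack([L, a, b_lab], dim=-1)

def oklab_reconstruction_loss(pred_rgb, target_rgb, mae_weight=0.5, mse_weight=0.5):
    # Illustrative weights; the real relative weights follow LDM (see above).
    pred, target = linear_srgb_to_oklab(pred_rgb), linear_srgb_to_oklab(target_rgb)
    diff = pred - target
    return mae_weight * diff.abs().mean() + mse_weight * diff.pow(2).mean()
```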
📄 License
This VAE is licensed under the [CC BY-NC-SA 4.0 Deed](https://creativecommons.org/licenses/by-nc-sa/4.0/).
Credits
- Development and research led by @drhead.
- With research and development assistance by @RedHotTensors.
- And additional research assistance by @lodestones and Thessalo.
- Dataset curation by @lodestones and Bannanapuncakes, with additional curation by @RedHotTensors.
- And thanks to dogarrowtype for system administration assistance.
Based on
- CompVis Latent Diffusion: https://github.com/CompVis/latent-diffusion/
- StabilityAI sd-vae-ft-mse: https://huggingface.co/stabilityai/sd-vae-ft-mse
- LPIPS by Richard Zhang, et al.: https://github.com/richzhang/PerceptualSimilarity
- Oklab by Björn Ottosson: https://bottosson.github.io/posts/oklab/
- fine-tune-models by Jonathan Chang: https://github.com/cccntu/fine-tune-models/
Built on
- Flax by Google Brain: https://github.com/google/flax
- And Huggingface Diffusers: https://github.com/huggingface/diffusers
With deep thanks to the innumerable artists who released their works to the public for fair use in this non-commercial research project.