🚀 Furception v1.0
A VAE decoder finetune by Project RedRocket, leveraging e621 images for enhanced image reconstruction.
🚀 Quick Start
Furception v1.0 is a VAE decoder finetuned from stabilityai/sd-vae-ft-mse using images from e621. It employs a combination of MAE and MSE loss, calculated in the Oklab color space, to balance sharpness and smoothness in its outputs while prioritizing the more perceptually significant color channels.
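If the weights are distributed in the usual Diffusers layout, swapping the VAE into an existing Stable Diffusion pipeline only takes a few lines. This is a minimal sketch: the repository id and base model shown are placeholders, not the actual locations of the weights.

```python
# Minimal sketch of loading a custom VAE into a Stable Diffusion pipeline.
# Both repository ids below are placeholders -- substitute the real ones.
import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

vae = AutoencoderKL.from_pretrained(
    "your-namespace/furception-vae-v1",  # placeholder repo id
    torch_dtype=torch.float16,
)

pipe = StableDiffusionPipeline.from_pretrained(
    "your-namespace/your-base-model",  # placeholder base model
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a red fox in a forest, digital painting").images[0]
image.save("sample.png")
```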
✨ Features
- Noise Reduction: Effectively eliminates unwanted high-frequency noise in models trained on similar data.
- Broad Applicability: Shows improvements across all image resolutions and various art styles.
- Balanced Output: Maintains a good balance between sharpness and smoothness in image reconstruction.
| Default VAE (kl-f8) | Furception v1.0 |
| --- | --- |
| *(example reconstruction)* | *(example reconstruction)* |
Note that the output is overall smoother and has significantly less artifacting around edges in high-detail regions.
📚 Documentation
Licensing
This VAE is available under the terms of the [CC BY-NC-SA 4.0 Deed](https://creativecommons.org/licenses/by-nc-sa/4.0/). This applies to the use of the model, deployment, and distribution of the model weights only. The license does not apply to images decoded by this VAE. You may release them under any license, even public domain, as long as you are not creating them for commercial purposes. You are free and encouraged to distribute this VAE with models as long as you give credit and the VAE carries this license (the rest of the model does not need to share this license, although its distribution must be non-commercial). Please include the version number so people can know if they need to get an updated version in the future.
Training details
Overall, training is fundamentally similar to LDM. We used the same relative base weights for MAE, MSE, and LPIPS as in LDM, and as in sd-vae-ft-mse in the case of LPIPS. The discriminator's weight in the loss objective is dynamically set so that the gradient norm for the discriminator loss is half that of the reconstruction loss, as in LDM. We used a discriminator similar to the one LDM uses, except reparameterized to Wasserstein loss with a gradient penalty and with its group norm layers replaced with layer norms.
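As a rough illustration of that adaptive weighting (following the LDM / taming-transformers scheme, with the 0.5 gradient-norm ratio described above), a PyTorch-style sketch might look like the following; the clamp range and epsilon are assumptions, not the exact training configuration.

```python
# Sketch: choose the GAN-loss weight so that its gradient norm at the
# decoder's last layer is a fixed fraction (here 0.5) of the
# reconstruction-loss gradient norm.
import torch

def adaptive_disc_weight(rec_loss, gan_loss, last_layer_weight, ratio=0.5, eps=1e-4):
    rec_grad = torch.autograd.grad(rec_loss, last_layer_weight, retain_graph=True)[0]
    gan_grad = torch.autograd.grad(gan_loss, last_layer_weight, retain_graph=True)[0]
    weight = ratio * rec_grad.norm() / (gan_grad.norm() + eps)
    return weight.clamp(0.0, 1e4).detach()

# Usage (illustrative):
# total_loss = rec_loss + adaptive_disc_weight(rec_loss, g_loss, decoder.conv_out.weight) * g_loss
```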
Training for version 1.0 used random square crops at various downscale levels (Lanczos resampling with antialiasing), randomly rotated and flipped. Training ran for 150,000 steps at a batch size of 32. EMA weights were accumulated using a decay similar to sd-vae-ft-mse, scaled for our batch size, and are the released version of the model.
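An illustrative sketch of that crop/augmentation pipeline is shown below; the exact downscale range, crop size, and rotation angles are assumptions rather than the actual training settings.

```python
# Illustrative augmentation sketch: random Lanczos downscale, random square
# crop, random 90-degree rotation, and random horizontal flip.
import random
from PIL import Image

def random_training_crop(img: Image.Image, crop_size: int = 256) -> Image.Image:
    # Randomly downscale with antialiased Lanczos resampling (assumed range).
    scale = random.uniform(0.25, 1.0)
    w, h = img.size
    new_w = max(crop_size, int(w * scale))
    new_h = max(crop_size, int(h * scale))
    img = img.resize((new_w, new_h), Image.LANCZOS)

    # Take a random square crop.
    left = random.randint(0, new_w - crop_size)
    top = random.randint(0, new_h - crop_size)
    img = img.crop((left, top, left + crop_size, top + crop_size))

    # Random rotation (90-degree multiples assumed) and horizontal flip.
    img = img.rotate(90 * random.randint(0, 3))
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    return img
```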
🔧 Technical Details
The model is trained with a mixture of MAE and MSE loss, calculated in the Oklab color space. This approach prioritizes image reconstruction based on the perceptual significance of color channels, leading to better-balanced outputs. The discriminator's weight in the loss objective is dynamically adjusted to ensure a proper balance with the reconstruction loss.
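For illustration, a mixed MAE/MSE reconstruction loss in Oklab might be sketched as follows, using Björn Ottosson's reference RGB-to-Oklab conversion. The 50/50 mix and the assumption of linear-sRGB, channels-last inputs are placeholders rather than the actual training configuration.

```python
# Sketch: mixed MAE/MSE reconstruction loss computed in Oklab space.
import torch

def linear_srgb_to_oklab(rgb: torch.Tensor) -> torch.Tensor:
    # rgb: (..., 3) tensor of linear-sRGB values in [0, 1] (assumed).
    r, g, b = rgb.unbind(-1)
    # Linear sRGB -> LMS (coefficients from Ottosson's Oklab post).
    l = 0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b
    m = 0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b
    s = 0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b
    # Nonlinearity, then LMS' -> Oklab.
    l_, m_, s_ = l.pow(1 / 3), m.pow(1 / 3), s.pow(1 / 3)
    L = 0.2104542553 * l_ + 0.7936177850 * m_ - 0.0040720468 * s_
    a = 1.9779984951 * l_ - 2.4285922050 * m_ + 0.4505937099 * s_
    b_lab = 0.0259040371 * l_ + 0.7827717662 * m_ - 0.8086757660 * s_
    return torch.stack([L, a, b_lab], dim=-1)

def oklab_reconstruction_loss(pred_rgb, target_rgb, mae_weight=0.5, mse_weight=0.5):
    # Illustrative weights; the real relative weights follow LDM (see above).
    pred, target = linear_srgb_to_oklab(pred_rgb), linear_srgb_to_oklab(target_rgb)
    diff = pred - target
    return mae_weight * diff.abs().mean() + mse_weight * diff.pow(2).mean()
```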
📄 License
This VAE is licensed under the [CC BY-NC-SA 4.0 Deed](https://creativecommons.org/licenses/by-nc-sa/4.0/).
Credits
- Development and research led by @drhead.
- With research and development assistance by @RedHotTensors.
- And additional research assistance by @lodestones and Thessalo.
- Dataset curation by @lodestones and Bannanapuncakes, with additional curation by @RedHotTensors.
- And thanks to dogarrowtype for system administration assistance.
Based on
- CompVis Latent Diffusion: https://github.com/CompVis/latent-diffusion/
- StabilityAI sd-vae-ft-mse: https://huggingface.co/stabilityai/sd-vae-ft-mse
- LPIPS by Richard Zhang, et al.: https://github.com/richzhang/PerceptualSimilarity
- Oklab by Björn Ottosson: https://bottosson.github.io/posts/oklab/
- fine-tune-models by Jonathan Chang: https://github.com/cccntu/fine-tune-models/
Built on
- Flax by Google Brain: https://github.com/google/flax
- And Huggingface Diffusers: https://github.com/huggingface/diffusers
With deep thanks to the innumerable artists who released their works to the public for fair use in this non-commercial research project.