Doohickey-Mega Open-Source Image Synthesis Model - High-Resolution Image Synthesis at Multiple Aspect Ratios

Doohickey Mega

Developed by doohickey

A series of Stable Diffusion models optimized for high-resolution image synthesis, fine-tuned from Stable Diffusion v1-5, with support for multiple aspect ratios

Image Generation #High-resolution image synthesis #Multi-aspect ratio support #Hand detail optimization

Downloads 186

Release date: 11/12/2022

Model Overview

A text-to-image generation model fine-tuned from runwayml/stable-diffusion-v1-5, optimized for image quality at resolutions near 768x768, with support for multiple output aspect ratios

Model Features

High-resolution optimization

Specially fine-tuned for resolutions around 768x768 to generate high-quality images

Multi-aspect ratio support

From v3 onward, supports output sizes between 640x640 and 768x768, including non-square ones (e.g., 768x640 or 704x768)

Hand detail optimization

Checkpoints from v3-6000 onward perform better on hands

Improved CLIP model

Version v3 uses the laion/CLIP-ViT-L-14-laion2B-s32B-b82K CLIP model, which was itself fine-tuned for 3k steps
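
The size range described under multi-aspect ratio support can be made concrete with a short sketch. It assumes, based only on the example sizes quoted (768x640 and 704x768), that each side moves in steps of 64 pixels between 640 and 768. The step size and the function name `candidate_sizes` are illustrative assumptions, not something the model author documents.

```python
from itertools import product


def candidate_sizes(min_side=640, max_side=768, step=64):
    """Return every (width, height) pair whose sides lie in [min_side, max_side].

    The 64-pixel step is an assumption inferred from the sizes quoted in the
    model description; it is not documented by the model author.
    """
    sides = range(min_side, max_side + 1, step)
    return list(product(sides, repeat=2))


sizes = candidate_sizes()
for width, height in sizes:
    # Print each candidate size together with its aspect ratio.
    print(f"{width}x{height}  (aspect ratio {width / height:.2f})")
```

Under these assumptions the sides are 640, 704, and 768, which gives nine width and height combinations, including both examples mentioned in the description.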

Model Capabilities

Text-to-image generation

High-resolution image synthesis

Multi-aspect ratio image generation

Use Cases

Creative design

Concept art creation

Generate high-resolution concept art images based on text descriptions

High-quality artwork at 768x768 resolution

Digital illustration generation

Quickly generate illustrations in various styles

Professional-grade illustrations supporting multiple aspect ratios

Commercial applications

Advertising material generation

Quickly generate high-quality visual materials for marketing campaigns

High-resolution images; commercial use is subject to the license terms

🚀 High-Resolution Image Synthesis Models

These models are better suited to high-resolution image synthesis. The main model (doohickey/doohickey-mega) was fine-tuned from runwayml/stable-diffusion-v1-5 at resolutions close to 768x768. The suggested way to generate images with the model is to use Doohickey.

✨ Features

Current Models

Name	Description	Datasets Used
`doohickey/doohickey-mega/v1-3000steps.ckpt`	First attempt, excellent for high-definition, but poor results with aspect ratios other than 1:1. Trained at 704x704.	A-1k
`doohickey/doohickey-mega/v2-3000steps.ckpt`	Similar to the previous one but performs worse.	A-1k + ~1k samples from LAION-2b-En-Aesthetic >=768x768
`doohickey/doohickey-mega/v3-3000.ckpt`	Utilizes a new CLIP model (laion/CLIP-ViT-L-14-laion2B-s32B-b82K) (the CLIP model was also fine-tuned for 3k steps). Models from this point onwards were trained with various aspect ratios, with a minimum resolution of 640x640 and a maximum of 768x768 (e.g., 768x640 or 704x768).	A-1k + E-10k
`doohickey/doohickey-mega/v3-6000.ckpt`	3k additional steps on top of `v3-3000.ckpt`, performs better on hands! (Only the UNet was fine-tuned, with a 50% RandomHorizontalFlip operation added).	A-1k
`doohickey/doohickey-mega/v3-7000.ckpt`	Continuation of the previous model. I thought Colab would crash after 3k steps, but it continued and saved checkpoints every 1k steps.	A-1k
`doohickey/doohickey-mega/v3-8000.ckpt`	Continuation of the same run: `v3-6000` + 2k steps.	A-1k
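
The "50% RandomHorizontalFlip" mentioned for `v3-6000.ckpt` presumably refers to the torchvision transform of that name, which mirrors each training image left to right with a given probability. The dependency-free sketch below only illustrates the idea on a nested list of pixel values. The function `random_horizontal_flip` and its `rng` parameter are hypothetical and are not taken from the author's training code.

```python
import random


def random_horizontal_flip(image, p=0.5, rng=random):
    """Mirror `image` left to right with probability `p`.

    `image` is a list of rows, each row a list of pixel values.
    A new list is returned in both cases; the input is not modified.
    """
    if rng.random() < p:
        return [row[::-1] for row in image]
    return [row[:] for row in image]


image = [[1, 2, 3],
         [4, 5, 6]]

# p=1.0 always flips and p=0.0 never flips, which makes the behavior easy to check.
always = random_horizontal_flip(image, p=1.0)
never = random_horizontal_flip(image, p=0.0)
print(always)
print(never)
```

With `p=0.5`, roughly half of the training samples are mirrored, a common augmentation for small datasets.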

The weights currently loaded by diffusers are those of `doohickey/doohickey-mega/v3-8000.ckpt`.
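
As a rough illustration of loading the model through diffusers, the sketch below uses the library's `StableDiffusionPipeline`. It assumes that the `doohickey/doohickey-mega` repository exposes weights in diffusers format (as the statement above suggests) and that a CUDA GPU is available. The helper names `check_size` and `generate` are hypothetical. The size check reflects the Stable Diffusion pipeline's requirement that width and height be divisible by 8.

```python
def check_size(width, height, multiple=8):
    """Raise ValueError unless both sides are divisible by `multiple`."""
    if width % multiple != 0 or height % multiple != 0:
        raise ValueError(
            f"width and height must be divisible by {multiple}, got {width}x{height}"
        )


def generate(prompt, width=768, height=640, repo_id="doohickey/doohickey-mega"):
    """Generate one image; assumes diffusers-format weights and a CUDA GPU."""
    # Imported lazily so the size check can be used without torch or diffusers installed.
    import torch
    from diffusers import StableDiffusionPipeline

    check_size(width, height)
    pipe = StableDiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")
    # The pipeline returns an object whose `images` attribute is a list of PIL images.
    return pipe(prompt, width=width, height=height).images[0]
```

Calling `generate("a watercolor painting of a lighthouse").save("out.png")` would then write a 768x640 image to disk, provided the assumptions above hold.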

Datasets

Dataset	Description
A-1k	1k scraped images, captioned with BLIP (more refined aesthetic)
E-10k	10k scraped images, captioned with BLIP (less refined aesthetic)

📄 License

⚠️ Important Note

Limitations and Biases from Stable Diffusion also apply to this model.

This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage. The CreativeML OpenRAIL-M license specifies:

You can't use the model to deliberately produce nor share illegal or harmful outputs or content.
The authors claim no rights on the outputs you generate, you are free to use them and are accountable for their use which must not go against the provisions set in the license.
You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M to all your users (please read the license entirely and carefully).

Please read the full license carefully here: https://huggingface.co/spaces/CompVis/stable-diffusion-license
