🚀 Cool Japan Diffusion 2.1.1 Model Card
Cool Japan Diffusion is a model fine-tuned from Stable Diffusion that specializes in expressing Cool Japan elements such as anime, manga, and games.
Note: China will implement legal restrictions on image-generating AI. (Warning for people in China)
The English version is here.
🚀 Quick Start
Cool Japan Diffusion is a model fine-tuned from Stable Diffusion, designed specifically to represent Cool Japan elements such as anime, manga, and games. Note that it has no particular relation to the Cabinet Office's Cool Japan Strategy.
📄 License
The license is the original CreativeML Open RAIL++-M License with an added prohibition on commercial use (with some exceptions). The prohibition was added out of concern that the model may have a negative impact on the creative industry. If this concern is dispelled, the next version will revert to the original license and allow commercial use. A Japanese translation of the original license can be found here. People at for-profit enterprises should consult their legal department. Those using the model out of personal interest generally need not worry much as long as they follow common sense. As stated in the license, if you modify this model, the modified model must inherit this license.
📚 Documentation
Legal and Ethical Considerations
This model was created in Japan, so Japanese law applies. The author claims that training this model is legal under Article 30-4 of the Copyright Law. The author also claims that distributing this model does not make the author a principal offender or an accessory under the Copyright Law or Article 175 of the Criminal Code. For more details, please refer to the opinion of lawyer Kakinuma. However, as stated in the license, please handle the outputs of this model in accordance with the applicable laws and regulations.
The author believes that distributing this model is not ethically good, because permission was not obtained from the copyright holders of the training works. Legally, however, the permission of the copyright holders is not required for training, and there is no legal problem, as is the case for search engines. Please therefore regard this distribution as also serving to investigate the ethical aspects rather than only the legal ones.
Usage
To try the model quickly and easily, please use this Space. Detailed instructions on how to handle this model are written here. You can download the model here.
Model Details
Property | Details |
---|---|
Developer | Robin Rombach, Patrick Esser, Alfred Increment |
Model Type | Diffusion-model-based text-to-image generation model |
Language | Japanese |
License | CreativeML Open RAIL++-M-NC License |
Model Description | This model can generate appropriate images according to prompts. The algorithms are Latent Diffusion Model and OpenCLIP-ViT/H. |
Supplementary Notes | |
References | @InProceedings{Rombach_2022_CVPR, author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj{\"o}rn}, title = {High-Resolution Image Synthesis With Latent Diffusion Models}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {10684-10695} } |
💻 Usage Examples
Basic Usage
It's used in the same way as Stable Diffusion v2. There are many methods, and here are two patterns:
- Web UI
- Diffusers
Web UI
Please set it up according to the instructions.
Diffusers
Use the 🤗 Diffusers library. First, run the following command to install the library:
pip install --upgrade git+https://github.com/huggingface/diffusers.git transformers accelerate scipy
Then, run the following script to generate an image:
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler
import torch

model_id = "aipicasso/cool-japan-diffusion-2-1-1"

# Load the scheduler configuration stored in the model repository.
scheduler = EulerAncestralDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
# Load the pipeline in half precision and move it to the GPU.
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "anime, masterpiece, a portrait of a girl, good pupil, 4k, detailed"
negative_prompt = "deformed, blurry, bad anatomy, bad pupil, disfigured, poorly drawn face, mutation, mutated, extra limb, ugly, poorly drawn hands, bad hands, fused fingers, messy drawing, broken legs censor, low quality, mutated hands and fingers, long body, mutation, poorly drawn, bad eyes, ui, error, missing fingers, fused fingers, one hand with more than 5 fingers, one hand with less than 5 fingers, one hand with more than 5 digit, one hand with less than 5 digit, extra digit, fewer digits, fused digit, missing digit, bad digit, liquid digit, long body, uncoordinated body, unnatural body, lowres, jpeg artifacts, 3d, cg, text, japanese kanji"

# Run 20 denoising steps and save the first generated image.
images = pipe(prompt, negative_prompt=negative_prompt, num_inference_steps=20).images
images[0].save("girl.png")
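The negative prompt in the script repeats a few terms (for example "mutation", "fused fingers", and "long body"), and the CLIP text encoder only reads a limited number of tokens, so trimming repeats may leave room for other terms. The following is a minimal sketch of one way to do that; `dedupe_prompt_terms` is a hypothetical helper, not part of Diffusers or of this model, and a deduplicated prompt can produce different images from the original one.

```python
def dedupe_prompt_terms(prompt: str) -> str:
    """Drop repeated comma-separated terms, keeping first-seen order.

    Hypothetical helper for illustration; comparison is case-insensitive.
    """
    seen = set()
    kept = []
    for term in prompt.split(","):
        term = term.strip()
        key = term.lower()
        # Skip empty fragments and terms that were already kept.
        if term and key not in seen:
            seen.add(key)
            kept.append(term)
    return ", ".join(kept)


if __name__ == "__main__":
    sample = "mutation, fused fingers, long body, mutation, fused fingers, long body"
    print(dedupe_prompt_terms(sample))
```

The result can be passed as `negative_prompt` in place of the original string.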
⚠️ Important Note
Using xformers seems to speed up generation.
If GPU memory is limited, please call pipe.enable_attention_slicing().
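As a rough sketch of how these two options could be combined, the helper below tries xformers attention first and falls back to attention slicing when xformers cannot be enabled. The function name `apply_memory_savers` and the fallback policy are assumptions made for illustration; the two `enable_*` methods are the ones provided by Diffusers pipelines.

```python
def apply_memory_savers(pipe, prefer_xformers=True):
    """Enable one memory-saving attention mode on a Diffusers pipeline.

    Hypothetical helper: returns the name of the mode that was enabled.
    """
    if prefer_xformers:
        try:
            # Fails if xformers is not installed or no suitable GPU is found.
            pipe.enable_xformers_memory_efficient_attention()
            return "xformers"
        except Exception as err:
            print(f"xformers not enabled, falling back to slicing: {err}")
    # Attention slicing trades some speed for lower peak GPU memory.
    pipe.enable_attention_slicing()
    return "attention_slicing"
```

It would be called once, after `pipe = pipe.to("cuda")` and before generating images.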
Expected Use Cases
- Contests
  - Submission to the AI Art Grand Prix. We will disclose all the data used for fine-tuning and let the judges determine whether the review criteria are met. If you have any requests regarding the contest, please contact me on Hugging Face's Community.
- Reports on image-generating AI
  - This is permitted not only for public broadcasters but also for for-profit enterprises, because we believe that the "right to know" about image-synthesizing AI does not have a negative impact on the creative industry, and we respect the freedom of the press.
- Introduction of Cool Japan
  - Explaining what Cool Japan is to people from other countries. Alfred Increment feels that many international students in Japan are attracted by Cool Japan but are often disappointed to find that what they thought was "cool" is not regarded that way in Japan. People from other countries should be more proud of their own cultures that others admire.
- Research and Development
  - Use of the model on Discord
  - Prompt engineering
  - Fine-tuning (also known as additional training), such as DreamBooth
  - Merging with other models
  - Compatibility between the Latent Diffusion Model and Cool Japan
  - Investigating the performance of this model using metrics such as FID
  - Checking the independence of this model from models other than Stable Diffusion using checksums or hash functions
- Education
  - Graduation projects of art college students and vocational school students
  - Graduation theses and assignment projects of university students
  - Teachers conveying the current state of image-generating AI
- Self-expression
  - Expressing one's emotions and thoughts on SNS
- Use cases described in Hugging Face's Community
  - Please ask questions in Japanese or English.
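For the checksum-based comparison mentioned under Research and Development, a minimal sketch using only the Python standard library is shown below. The function names and the file paths in the usage note are hypothetical. Matching digests show that two files are byte-identical; differing digests alone do not prove independence, since a fine-tuned or merged derivative also hashes differently.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large checkpoints do not fill memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(path_a, path_b):
    """True if both files have the same SHA-256 digest."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

For example, `same_file_contents("model_a.ckpt", "model_b.ckpt")` would compare two locally downloaded checkpoint files (placeholder names).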
Unintended Use Cases
- Representing things as facts.
- Using it in monetized content on platforms like YouTube.
- Directly providing it as a commercial service.
- Doing things that would trouble teachers.
- Other actions that would have a negative impact on the creative industry.
Prohibited or Malicious Use Cases
- Do not publish digital forgeries (there is a risk of violating the Copyright Law).
  - In particular, do not publish existing characters (there is a risk of violating the Copyright Law). It seems that characters that were not in the training data can also be generated (this tweet itself is permitted for research purposes).
- Do not perform image-to-image operations on others' works without permission (there is a risk of violating the Copyright Law).
- Do not distribute pornographic materials (there is a risk of violating Article 175 of the Criminal Code).
- Do not violate so-called industry etiquette.
- Do not state non-factual things as facts (there is a risk of being charged with the crime of interfering with business operations). This includes fake news.
🔧 Technical Details
Model Limitations
- Not well understood.
Biases
It has the same biases as Stable Diffusion. Please be careful.
Training
Property | Details |
---|---|
Training Data | For the VAE: 600,000 types of data that comply with Japanese domestic laws, excluding unauthorized re-posting sites such as Danbooru (an unlimited number of images can be created through data augmentation). For the U-Net: 1 million pairs of data that comply with Japanese domestic laws, excluding unauthorized re-posting sites such as Danbooru. |
Training Process | Fine-tuned the VAE and U-Net of Stable Diffusion. Hardware: RTX 3090, A6000. Optimizer: AdamW. Gradient Accumulations: 1. Batch Size: 1. |
Evaluation Results
There is no specific evaluation result provided.
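For reference, FID, the metric named in the research use cases, compares feature statistics of real and generated images. Using the symbols $\mu_r, \Sigma_r$ for the mean and covariance of features from real images and $\mu_g, \Sigma_g$ for those from generated images (notation introduced here, not taken from the original card), the standard definition is:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the two feature distributions are closer. No FID value has been reported for this model.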
Environmental Impact
There is almost no impact.
- Hardware Type: RTX 3090, A6000
- Usage Time (in hours): 600
- Cloud Service Provider: None
- Training Location: Japan
- Carbon Emissions: Not much
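A rough emissions figure could be estimated from the usage time as energy (power × time) multiplied by the carbon intensity of the power grid. The sketch below shows only the arithmetic: the 600 hours come from the list above, while the average power draw and the grid intensity are placeholder assumptions, not measurements reported by the author.

```python
def estimate_emissions_kg(hours, avg_power_watts, kg_co2_per_kwh):
    """Estimate emissions as energy used (kWh) times grid carbon intensity."""
    energy_kwh = hours * avg_power_watts / 1000.0
    return energy_kwh * kg_co2_per_kwh


if __name__ == "__main__":
    # 600 hours is taken from the model card.
    # ASSUMPTIONS (placeholders, not from the card): 350 W average draw
    # and 0.5 kg CO2eq per kWh of grid electricity.
    print(estimate_emissions_kg(hours=600, avg_power_watts=350, kg_co2_per_kwh=0.5))
```

Replacing the two placeholder values with measured power draw and a published grid intensity for the training location would give a more meaningful number.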
References
@InProceedings{Rombach_2022_CVPR,
    author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj{\"o}rn},
    title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {10684-10695}
}
*This model card was written by Alfred Increment based on Stable Diffusion v2.*