🚀 Cool Japan Diffusion for learning 2.0 Model Card
This is a specialized model for illustrations, fine-tuned from Stable Diffusion and aimed mainly at Japanese users. It generates high-quality images from text prompts.
🚀 Quick Start
If you want a quick try, you can enter text in the text form on the right to generate images. You can also try it on the Space created by Mr. F.
If you want to use it seriously, please read the notes above and refer to this article. A usage tutorial is available here.
✨ Features
- Specialized for Illustrations: Fine-tuned from Stable Diffusion specifically for generating illustrations.
- Multiple Usage Scenarios: Suitable for contests, research, education, self-expression, and more.
📦 Installation
Diffusers
First, run the following command to install the required libraries:
pip install --upgrade git+https://github.com/huggingface/diffusers.git transformers accelerate scipy
💻 Usage Examples
Basic Usage
from diffusers import StableDiffusionPipeline
import torch

model_id = "alfredplpl/cool-japan-diffusion-for-learning-2-0"

# Load the pipeline in half precision and move it to the GPU
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "anime, a beautiful girl with black hair and red eyes, kimono, 4k, detailed"
image = pipe(prompt, height=512, width=512).images[0]
image.save("girl.png")
Advanced Usage
- Use xformers: It is reported that using xformers can speed up generation.
- For users with limited GPU memory: call pipe.enable_attention_slicing() when running on the GPU. A minimal sketch combining both options follows this list.
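The snippet below is a minimal, unofficial sketch of both options. enable_attention_slicing() and enable_xformers_memory_efficient_attention() are standard diffusers pipeline methods; the latter only works if the xformers package is installed.

from diffusers import StableDiffusionPipeline
import torch

model_id = "alfredplpl/cool-japan-diffusion-for-learning-2-0"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Reduce peak GPU memory usage at a small cost in speed
pipe.enable_attention_slicing()
# Optional: memory-efficient attention via xformers (requires `pip install xformers`)
pipe.enable_xformers_memory_efficient_attention()

prompt = "anime, a beautiful girl with black hair and red eyes, kimono, 4k, detailed"
image = pipe(prompt, height=512, width=512).images[0]
image.save("girl.png")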
📚 Documentation
Model Details
| Property | Details |
|----------|---------|
| Developers | Robin Rombach, Patrick Esser, Alfred Increment |
| Model Type | A text-to-image generation model based on diffusion models |
| Language | Japanese |
| License | CreativeML Open RAIL++-M-NC License |
| Model Description | This model generates appropriate images from prompts. The algorithms are the Latent Diffusion Model and OpenCLIP-ViT/H. |
Expected Use Cases
- Contests: Submit to AI Art Grand Prix.
- Reporting on Image Generation AI: Both public broadcasters and for-profit enterprises can use it.
- Introduction of Cool Japan: Explain what Cool Japan is to people in other countries.
- Research and Development: Conduct research on prompt engineering, fine-tuning, and model merging.
- Education: Use it for graduation projects of art students and thesis projects of university students.
- Self-Expression: Express personal feelings and thoughts on social media (SNS).
- Usage in Hugging Face Community: Ask questions in Japanese or English.
Unintended Use Cases
- Presenting output as fact: Do not use generated images to present things as if they were facts.
- Use in monetized content: Do not use it in monetized YouTube content.
- Direct commercial service: Do not directly provide it as a commercial service.
- Causing trouble for teachers: Do not use it in ways that cause problems for teachers.
- Negative impact on the creative industry: Avoid any usage that may have a negative impact on the creative industry.
Prohibited or Malicious Use Cases
- Digital forgery: Do not publish digital forgeries, especially of existing characters.
- Unauthorized Image-to-Image: Do not perform Image-to-Image on others' works without permission.
- Pornography distribution: Do not distribute pornographic materials.
- False information dissemination: Do not spread false information.
Model Limitations and Biases
- Model Limitations: Not yet well understood.
- Biases: As with Stable Diffusion, biases are present: for example, images of attractive women are easier to generate, and the apparent race of generated people is skewed by the location referenced in the prompt. The model can also produce photorealistic-looking images in some cases.
Training
| Property | Details |
|----------|---------|
| Training Data | Approximately 200,000 illustration and manga images posted on Twitter, obtained through the Twitter API |
| Training Process | Fine-tuned the VAE and U-Net of Stable Diffusion |
| Hardware | RTX 3090 |
| Optimizer | AdamW |
| Gradient Accumulations | 1 |
| Batch Size | 1 |
Evaluation Results
CLIP Score
CLIP Score is an indicator of how well a model generates images that match the given prompts. A CLIP Score for this model was reported by Izumi Satoshi; however, according to that report, the evaluation conditions for this model were more favorable.
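As an illustration only (this is not the setup used in the report; the CLIP checkpoint and file name below are assumptions), a CLIP score for a single prompt/image pair can be computed with the transformers library roughly as follows:

from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Hypothetical choice of CLIP checkpoint, for illustration purposes only
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt = "anime, a beautiful girl with black hair and red eyes, kimono"
image = Image.open("girl.png")  # an image generated from the same prompt

inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = clip(**inputs)

# logits_per_image is the scaled cosine similarity between the text and image embeddings
print(outputs.logits_per_image.item())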
FID, DaFID-512
FID is an indicator of the similarity between two image datasets as measured by image-recognition features. DaFID-512 is a variant specialized for two-dimensional illustrations. According to the report from Birdman, 10,000 images generated by each model were compared against the High Resolution Anime Face Dataset; the results are shown in that report.
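For reference, a rough sketch of a plain FID comparison using torchmetrics is shown below. DaFID-512 uses an illustration-specific feature extractor, so this is not the metric from the report, and the random tensors are placeholders for real image batches.

import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# Standard FID with Inception features (requires torchmetrics and torch-fidelity)
fid = FrechetInceptionDistance(feature=2048)

# Placeholder batches: uint8 RGB images of shape (N, 3, H, W);
# in practice, use thousands of real and generated images
real_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
generated_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(generated_images, real=False)
print(fid.compute())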
Environmental Impact
- Hardware Type: RTX 3090
- Usage Time: 300 hours
- Cloud Provider: None
- Training Location: Japan
- Carbon Emissions: Relatively low
References
@InProceedings{Rombach_2022_CVPR,
author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
title = {High-Resolution Image Synthesis With Latent Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {10684-10695}
}
⚠️ Important Note
From January 10, 2023, China will enforce legal restrictions on image-generating AI (see the Notice).
This model is released mainly for Japanese users, so the model documentation is written in Japanese. It will be translated into English for overseas users once consensus has been obtained from Japanese users. Thank you for your cooperation.
💡 Usage Tip
When using this model, please handle generated outputs in accordance with applicable laws and regulations, and be mindful of potential ethical issues.
This model card was written by Alfred Increment, based on Stable Diffusion v2.