🚀 Stable Diffusion v1-5
Stable Diffusion is a latent text-to-image diffusion model that can generate photo-realistic images from any text input. The model can be used for a range of purposes, such as research on text-based image generation and the creation of artworks.
🚀 Quick Start
This repository is a re-upload to HuggingFace under the CreativeML OpenRAIL-M License. The original files were obtained from [modelscope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5), and their integrity was verified by checksum.
To use Stable Diffusion v1-5, you can rely on the 🧨 Diffusers library.
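The checksum verification mentioned above can be repeated on a local copy of the files. The following is a minimal sketch, assuming SHA-256 (the source does not name the hash algorithm). The helper names `sha256_of_file` and `matches_checksum` are hypothetical and not part of any library.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large weight files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with a published value, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

To use it, call `matches_checksum` with the path of a downloaded file and the digest published alongside it. A `False` result suggests a corrupted or modified download.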
✨ Key Features
- Generates photo-realistic images from any text input.
- Uses a latent diffusion model, which makes image generation efficient.
- Different checkpoints can be used to adjust the quality and characteristics of the generated images.
📦 Installation
To use the Diffusers library, install it with the following command.
pip install diffusers
💻 Usage Examples
Basic Usage
from diffusers import StableDiffusionPipeline
import torch
pipe = StableDiffusionPipeline.from_pretrained(
    "benjamin-paine/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
Advanced Usage
For more detailed usage and examples in JAX, follow the instructions [here](https://github.com/huggingface/diffusers#text-to-image-generation-with-stable-diffusion).
📚 Documentation
Model Details

| Attribute | Details |
|---|---|
| Developed by | Robin Rombach, Patrick Esser |
| Model type | Diffusion-based text-to-image generation model |
| Language | English |
| License | [The CreativeML OpenRAIL M license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) is an [Open RAIL M license](https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses), adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. |
| Model description | This model can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (CLIP ViT-L/14), as suggested in the Imagen paper. |
| Resources for more information | [GitHub Repository](https://github.com/CompVis/stable-diffusion), Paper |
Model Usage
Direct Use
The model is intended for research purposes only. Possible research areas and tasks include:
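- Safe deployment of models that have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.
- Research on generative models.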
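Misuse, Malicious Use, and Out-of-Scope Use
The model must not be used to intentionally create or disseminate images that produce hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive, as well as content that propagates historical or current stereotypes.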
Limitations and Bias
Limitations
- The model does not achieve perfect photorealism.
- The model cannot render legible text.
- The model does not perform well on harder tasks that involve compositionality, such as "a red cube on top of a blue sphere".
- Faces and people in general may not be generated properly.
- The model was trained mainly on English captions and does not work as well in other languages.
- The autoencoding part of the model is lossy.
- The model was trained on the large-scale dataset [LAION-5B](https://laion.ai/blog/laion-5b/), which contains adult material, so it is not fit for product use without additional safety mechanisms and considerations.
- No additional measures were taken to deduplicate the dataset. As a result, some degree of memorization is observed for images that are duplicated in the training data. The training data can be searched at [https://rom1504.github.io/clip-retrieval/](https://rom1504.github.io/clip-retrieval/), which may help detect memorized images.
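Bias
While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases. Stable Diffusion v1 was trained on subsets of [LAION-2B(en)](https://laion.ai/blog/laion-5b/), which consists of images primarily limited to English descriptions. Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for. This affects the overall output of the model, as white and Western cultures are often set as the default. In addition, the model's ability to generate content from non-English prompts is significantly worse than from English prompts.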
Safety Module
The model is intended to be used with the Safety Checker in Diffusers. The checker works by comparing model outputs against known, hard-coded NSFW concepts. These concepts are intentionally hidden to reduce the likelihood of the filter being reverse-engineered. Specifically, the checker compares the class probability of harmful concepts in the embedding space of the CLIPTextModel after the images have been generated. The concepts are passed into the model together with the generated image and compared to a hand-engineered weight for each NSFW concept.
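The comparison described above can be illustrated with a toy example. This is a sketch only, not the Diffusers implementation: it assumes the comparison is a cosine similarity checked against a per-concept threshold, and the concept names, vectors, and thresholds are all made up for illustration (the real concepts and weights are hidden).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def flag_image(image_embedding, concept_embeddings, concept_thresholds):
    """Return the names of concepts whose similarity exceeds their threshold."""
    flagged = []
    for name, concept in concept_embeddings.items():
        score = cosine_similarity(image_embedding, concept)
        if score > concept_thresholds[name]:
            flagged.append(name)
    return flagged


# Hypothetical 3-dimensional embeddings; real CLIP embeddings are much larger.
concepts = {"concept_a": [1.0, 0.0, 0.0], "concept_b": [0.0, 1.0, 0.0]}
thresholds = {"concept_a": 0.8, "concept_b": 0.8}
print(flag_image([0.9, 0.1, 0.0], concepts, thresholds))  # ['concept_a']
```

An image is treated as unsafe when the returned list is non-empty. A per-concept threshold lets each concept be tuned independently, which matches the hand-engineered weight per concept described above.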
🔧 Technical Details
Training Data
The model developers used the following dataset to train the model:
- LAION-2B (en) and subsets thereof
Training Procedure
Stable Diffusion v1-5 is a latent diffusion model that combines an autoencoder with a diffusion model trained in the autoencoder's latent space. During training:
- Images are encoded through an encoder, which turns them into latent representations. The autoencoder uses a relative downsampling factor of f = 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4.
- Text prompts are encoded through a ViT-L/14 text encoder.
- The non-pooled output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention.
- The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet.
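The shape mapping and the loss in the steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the training code: the function names are hypothetical, the loss is assumed to be a mean squared error (the source only calls it a reconstruction objective), and image sizes that are not multiples of the factor are rejected for simplicity.

```python
def latent_shape(height, width, factor=8, latent_channels=4):
    """Map an H x W x 3 image size to its H/f x W/f x 4 latent size."""
    if height % factor or width % factor:
        # Simplification for this sketch: only accept sizes that divide evenly.
        raise ValueError("height and width must be multiples of the factor")
    return (height // factor, width // factor, latent_channels)


def noise_prediction_loss(added_noise, predicted_noise):
    """Mean squared error between the added noise and the UNet's prediction."""
    if len(added_noise) != len(predicted_noise):
        raise ValueError("noise vectors must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(added_noise, predicted_noise)]
    return sum(squared) / len(squared)


print(latent_shape(512, 512))  # (64, 64, 4)
print(noise_prediction_loss([0.5, -1.0], [0.0, -1.0]))  # 0.125
```

For a 512 x 512 image the latent holds 64 x 64 x 4 = 16,384 values, compared with 786,432 in pixel space. This 48-fold reduction is why running diffusion in the latent space is cheaper.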
Training Parameters
- Hardware: 32 x 8 x A100 GPUs
- Optimizer: AdamW
- Gradient accumulations: 2
- Batch: 32 x 8 x 2 x 4 = 2048
- Learning rate: warmed up to 0.0001 over 10,000 steps, then kept constant
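The batch arithmetic and the learning-rate schedule above can be checked with a short sketch. Two assumptions are made here: "32 x 8" is read as 32 nodes with 8 GPUs each and the final factor of 4 as the per-GPU batch size, and the warmup is taken to be linear (the source does not state its shape). The names are illustrative only.

```python
NODES = 32             # assumed reading of "32 x 8 x A100 GPUs"
GPUS_PER_NODE = 8
GRAD_ACCUMULATION = 2
PER_GPU_BATCH = 4      # assumed meaning of the final factor in the batch formula

EFFECTIVE_BATCH = NODES * GPUS_PER_NODE * GRAD_ACCUMULATION * PER_GPU_BATCH

PEAK_LR = 1e-4
WARMUP_STEPS = 10_000


def learning_rate(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Linear warmup (assumed) to peak_lr, then constant."""
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr


print(EFFECTIVE_BATCH)  # 2048
```

Under these assumptions, `learning_rate(5_000)` is half the peak value, and every step from 10,000 onward returns 0.0001.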
Evaluation Results
Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0) and 50 PNDM/PLMS sampling steps show the relative improvements of the checkpoints.

Evaluated using 50 PLMS steps and 10,000 random prompts from the COCO2017 validation set, at 512x512 resolution. Not optimized for FID scores.
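For readers unfamiliar with the guidance scales listed above, the sketch below shows the usual classifier-free guidance combination. The source lists only the scale values, so the formula is the standard formulation rather than something taken from this model card, and the function name is hypothetical.

```python
def apply_guidance(noise_uncond, noise_cond, guidance_scale):
    """Combine unconditional and text-conditioned noise predictions.

    Uses uncond + scale * (cond - uncond): a scale of 1.0 returns the
    conditioned prediction, and larger scales push further toward the prompt.
    """
    if len(noise_uncond) != len(noise_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


# Toy two-element predictions standing in for full latent tensors.
print(apply_guidance([0.0, 1.0], [1.0, 3.0], 2.0))  # [2.0, 5.0]
```

Higher scales generally follow the prompt more closely at the cost of sample diversity, which is why an evaluation sweeps several values.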
Environmental Impact
Stable Diffusion v1 Estimated Emissions
Based on this information, the following CO2 emissions are estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware, runtime, cloud provider, and compute region were used to estimate the carbon impact.
- Hardware type: A100 PCIe 40GB
- Hours used: 150000
- Cloud provider: AWS
- Compute region: US-east
- Carbon emitted (power consumption x time x carbon produced based on location of power grid): 11250 kg CO2 eq.
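The emission figure follows the product given in the last bullet: power consumption x time x carbon intensity of the grid. The sketch below back-calculates the implied carbon intensity. It assumes a draw of 250 W per GPU (the rated TDP of the A100 PCIe 40GB). The source does not state the power figure the calculator used, so the resulting intensity is an inference, not a reported value.

```python
GPU_HOURS = 150_000
REPORTED_KG_CO2EQ = 11_250
ASSUMED_POWER_KW = 0.25  # assumption: 250 W per GPU, not stated in the source


def estimate_emissions_kg(hours, power_kw, intensity_kg_per_kwh):
    """Power consumption x time x grid carbon intensity."""
    return hours * power_kw * intensity_kg_per_kwh


energy_kwh = GPU_HOURS * ASSUMED_POWER_KW
implied_intensity = REPORTED_KG_CO2EQ / energy_kwh   # kg CO2 eq. per kWh
per_gpu_hour = REPORTED_KG_CO2EQ / GPU_HOURS         # kg CO2 eq. per GPU-hour

print(f"energy: {energy_kwh:.0f} kWh")
print(f"implied intensity: {implied_intensity:.2f} kg CO2 eq./kWh")
print(f"per GPU-hour: {per_gpu_hour:.3f} kg CO2 eq.")
```

Under the 250 W assumption, the reported total corresponds to roughly 0.3 kg CO2 eq. per kWh. Independently of that assumption, it works out to 0.075 kg CO2 eq. per GPU-hour.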
📄 License
This model is provided under [The CreativeML OpenRAIL M license](https://huggingface.co/spaces/CompVis/stable-diffusion-license).
Citation
@InProceedings{Rombach_2022_CVPR,
    author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
    title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {10684-10695}
}