Kandinsky 3 Open-Source Text-to-Image Model: Incorporates Russian Cultural Data to Improve Image Generation Quality


Kandinsky 3

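Developed by kandinsky-community

Kandinsky 3.0 is an open-source text-to-image diffusion model built on the Kandinsky2-x model family. It is trained on more data related to Russian culture and uses a larger text encoder and diffusion U-Net, which improves both text understanding and visual quality.

Image Generation | License: Apache-2.0 | #Russian-culture image generation #large-parameter text-to-image #multi-component diffusion model

Downloads: 8,465

Release date: 11/21/2023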

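Model Overview

A diffusion model that generates high-quality images from text descriptions, and is particularly good at images with Russian cultural features.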

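Model Features

Russian Cultural Features

Trained with more Russian-culture-related data, so it can generate images with Russian cultural features

Large-Scale Architecture

The text encoder and the diffusion U-Net are substantially larger (8.6B and 3B parameters respectively), improving text understanding and image quality

Multi-Component Architecture

Three components (text encoder, diffusion U-Net, and MoVQ encoder/decoder) work together

Open-Source Models

Two open-source models are released, a base model and an inpainting model, for community use and improvement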

Model Capabilities

Text-to-image generation

Image-guided generation

Inpainting

Artistic style transfer

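Use Cases

Creative Design

Concept Art Creation

Generates concept art in a variety of styles from text descriptions

e.g., crochet-knitting art in the style of Alphonse Mucha

Illustration Generation

Quickly produces illustrations in a specified style

e.g., a stylized illustration of a yellow house at the edge of a Danish fjord

Advertising and Marketing

Ad Creative Generation

Generates eye-catching advertising images from product descriptions

e.g., promotional material such as a Mustang sports car movie poster

Cultural Communication

Culturally Distinctive Image Generation

Generates images with Russian cultural features

e.g., creative images that blend in Russian elements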

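🚀 Kandinsky-3: Text-to-Image Diffusion Model

Kandinsky 3.0 is an open-source text-to-image diffusion model built on the Kandinsky2-x model family. Compared with its predecessors, it was trained on more data, in particular data related to Russian culture, so it can generate images connected to Russian culture. In addition, the text encoder and the diffusion U-Net were scaled up, improving both text understanding and visual quality.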

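🚀 Quick Start

For more information, such as training details and generation examples, see our article. An English version will be released in a few days.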

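✨ Key Features

An open-source text-to-image diffusion model built on the Kandinsky2-x model family.
Trained with more data related to Russian culture, so it can generate related images.
A larger text encoder and diffusion U-Net, improving text understanding and visual quality.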

📦 Installation

Make sure to install diffusers from the main branch, along with Transformers and Accelerate:

pip install git+https://github.com/huggingface/diffusers.git
pip install --upgrade transformers accelerate
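Before running the examples, it can help to confirm that the packages are actually installed. The sketch below uses only the standard library's `importlib.metadata`; `installed_versions` is a hypothetical helper, and `torch` is included because the examples import it even though the commands above do not install it. A diffusers build from the main branch typically reports a version ending in `.dev0`.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages the usage examples depend on (torch is assumed to be installed separately)
REQUIRED = ("diffusers", "transformers", "accelerate", "torch")


def installed_versions(packages=REQUIRED):
    """Return {package name: installed version string, or None if missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```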

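💻 Usage Examples

Basic Usage

Text-to-Image

from diffusers import AutoPipelineForText2Image
import torch

# Load the fp16 weight variant of Kandinsky 3 in half precision
pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
# Keep sub-models on the CPU and move each to the GPU only while it runs (requires accelerate)
pipe.enable_model_cpu_offload()

prompt = "A photograph of the inside of a subway train. There are raccoons sitting on the seats. One of them is reading a newspaper. The window shows the city in the background."

# Seeded generator so the same prompt reproduces the same image
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]
# The result is a PIL image
image.save("raccoons_subway.png")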

Image-to-Image

from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

# Load the fp16 weights and offload idle sub-models to the CPU to save GPU memory
pipe = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

prompt = "A painting of the inside of a subway train with tiny raccoons."
# Input image to transform
image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky3/t2i.png")

generator = torch.Generator(device="cpu").manual_seed(0)
# strength sets how strongly the input is changed: higher values add more noise and keep less of the original
image = pipe(prompt, image=image, strength=0.75, num_inference_steps=25, generator=generator).images[0]
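The `strength=0.75` argument above decides how far into the noise schedule the input image is pushed before denoising starts. As far as we know, diffusers' img2img pipelines (and, we assume, the Kandinsky 3 one, whose default scheduler takes one model call per step) skip the start of the schedule so that only about `int(num_inference_steps * strength)` steps actually run. The helper below is a hypothetical sketch of that arithmetic, not a diffusers function:

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Estimate how many denoising steps an img2img call runs (hypothetical helper)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # Portion of the schedule covered by the noise added to the input image
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # Denoising starts at this index of the schedule and runs to the end
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start


print(effective_img2img_steps(25, 0.75))  # 18
```

So with `num_inference_steps=25` and `strength=0.75`, roughly 18 denoising steps are executed; a lower `strength` keeps more of the input image and runs fewer steps.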

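🔧 Technical Details

Architecture

[Figure: architecture diagram]

The architecture consists of three parts:

Text encoder Flan-UL2 (encoder part) - 8.6B parameters
Latent diffusion U-Net - 3B parameters
MoVQ encoder/decoder - 267M parameters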
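Summing the three counts listed above gives roughly 11.9B parameters, about 72% of which sit in the text encoder. The sketch below does that arithmetic and, assuming 2 bytes per parameter for fp16 weights, estimates the memory needed for the weights alone (activations, the tokenizer, and framework overhead are ignored). The resulting figure of about 22 GiB is a plausible reason the usage examples load `variant="fp16"` and call `enable_model_cpu_offload()`. `COMPONENT_PARAMS` and `summarize` are hypothetical names.

```python
# Parameter counts as listed in the architecture section
COMPONENT_PARAMS = {
    "text encoder (Flan-UL2, encoder part)": 8_600_000_000,
    "latent diffusion U-Net": 3_000_000_000,
    "MoVQ encoder/decoder": 267_000_000,
}


def summarize(params: dict, bytes_per_param: int = 2) -> dict:
    """Total parameters, per-component share, and weight memory in GiB."""
    total = sum(params.values())
    shares = {name: n / total for name, n in params.items()}
    # Weights only: 2 bytes per parameter corresponds to fp16
    weight_gib = total * bytes_per_param / 2**30
    return {"total": total, "shares": shares, "weight_gib": weight_gib}


if __name__ == "__main__":
    s = summarize(COMPONENT_PARAMS)
    print(f"total parameters: {s['total'] / 1e9:.3f}B")
    for name, share in s["shares"].items():
        print(f"  {name}: {share:.1%}")
    print(f"fp16 weights: ~{s['weight_gib']:.1f} GiB")
```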

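Models

We release two models:

Base: the base text-to-image diffusion model. It was trained for 2M steps on 400 A100 GPUs.
Inpainting: the inpainting version of the model. It was initialized from the final checkpoint of the base model and trained for 250k steps on 300 A100 GPUs.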

📚 Detailed Documentation

Generation Examples

Prompts used for the example images (the images themselves are not included):

"A beautiful landscape outdoors scene in the crochet knitting art style, drawing in style by Alfons Mucha"
"gorgeous phoenix, cosmic, darkness, epic, cinematic, moonlight, stars, high-definition, texture, Oscar-Claude Monet"
"a yellow house at the edge of the danish fjord, in the style of eiko ojala, ingrid baars, ad posters, mountainous vistas, george ault, realistic details, dark white and dark gray, 4k"
"dragon fruit head, upper body, realistic, illustration by Joshua Hoffine Norman Rockwell, scary, creepy, biohacking, futurism, Zaha Hadid style"
"Amazing playful nice cute strawberry character, dynamic poze, surreal fantazy garden background, gorgeous masterpice, award winning photo, soft natural lighting, 3d, Blender, Octane render, tilt-shift, deep field, colorful, I can't believe how beautiful this is, colorful, cute and sweet baby-loved photo"
"beautiful fairy-tale desert, in the sky a wave of sand merges with the milky way, stars, cosmism, digital art, 8k"
"Car, mustang, movie, person, poster, car cover, person, in the style of alessandro gottardo, gold and cyan, gerald harvey jones, reflections, highly detailed illustrations, industrial urban scenes"
"cloud in blue sky, a red lip, collage art, shuji terayama, dreamy objects, surreal, criterion collection, showa era, intricate details, mirror"
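To try the prompts listed above in one go, a minimal sketch could loop over them with the text-to-image `pipe` from the usage example. `generate_gallery` and its `make_generator` argument are hypothetical names; the generator factory is passed in so the function does not depend on torch directly, and each prompt gets a freshly seeded generator so any single image can be reproduced on its own. The outputs will not necessarily match the original example images, since the seed and step count used for them are not stated.

```python
from typing import Any, Callable, List, Sequence


def generate_gallery(
    pipe: Callable[..., Any],
    prompts: Sequence[str],
    make_generator: Callable[[int], Any],
    seed: int = 0,
    num_inference_steps: int = 25,
) -> List[Any]:
    """Run `pipe` once per prompt and return the first image of each call."""
    images = []
    for prompt in prompts:
        # A fresh generator per prompt, so results do not depend on list order
        generator = make_generator(seed)
        output = pipe(prompt, num_inference_steps=num_inference_steps, generator=generator)
        images.append(output.images[0])
    return images


# Example usage, assuming `pipe` and `torch` from the text-to-image example:
# prompts = ["gorgeous phoenix, cosmic, darkness, ...", "beautiful fairy-tale desert, ..."]
# images = generate_gallery(pipe, prompts, lambda s: torch.Generator(device="cpu").manual_seed(s))
# for i, img in enumerate(images):
#     img.save(f"example_{i}.png")
```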