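🚀 Kandinsky-3: Text-to-Image Diffusion Model
Kandinsky 3.0 is an open-source text-to-image diffusion model built on the Kandinsky2-x model family. Compared with its predecessors, it was trained on more data, in particular data related to Russian culture, so it can generate images on Russian cultural themes. Text understanding and visual quality were also improved by increasing the size of the text encoder and of the diffusion U-Net, respectively.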
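🚀 Quick Start
For more information, such as training details and example generations, please see our article. An English version will be released in a few days.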
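✨ Key Features
- Open-source text-to-image diffusion model built on the Kandinsky2-x model family.
- Trained on additional data related to Russian culture, so it can generate images on those themes.
- Larger text encoder and diffusion U-Net, which improve text understanding and visual quality.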
📦 Installation
Make sure to install diffusers from the main branch, along with Transformers and Accelerate:
pip install git+https://github.com/huggingface/diffusers.git
pip install --upgrade transformers accelerate
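After installing, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_versions` is hypothetical and not part of diffusers, and including `torch` in the list is an assumption based on the usage examples that import it.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Packages named in the install commands, plus torch (assumed to be required).
    for name, ver in installed_versions(
        ["diffusers", "transformers", "accelerate", "torch"]
    ).items():
        print(f"{name}: {ver or 'not installed'}")
```

A diffusers version reported with a `.dev0` suffix usually indicates an install from the main branch rather than a tagged release.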
💻 Usage Examples
Basic Usage
Text-to-Image
from diffusers import AutoPipelineForText2Image
import torch
pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()
prompt = "A photograph of the inside of a subway train. There are raccoons sitting on the seats. One of them is reading a newspaper. The window shows the city in the background."
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]
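The example above fixes the seed to 0, so repeated runs should give the same image. To compare several variations of one prompt, a small loop over seeds may be convenient. This is only a sketch: `generate_for_seeds` and its `make_generator` parameter are hypothetical helpers introduced here, and `pipe` is assumed to be a pipeline loaded as shown above.

```python
def generate_for_seeds(pipe, prompt, seeds, num_inference_steps=25, make_generator=None):
    """Run `pipe` once per seed and return a dict mapping seed -> first image."""
    if make_generator is None:
        # Imported lazily so the helper can be exercised without torch installed.
        import torch

        def make_generator(seed):
            return torch.Generator(device="cpu").manual_seed(seed)

    images = {}
    for seed in seeds:
        result = pipe(
            prompt,
            num_inference_steps=num_inference_steps,
            generator=make_generator(seed),
        )
        images[seed] = result.images[0]
    return images
```

With a real pipeline, `generate_for_seeds(pipe, prompt, [0, 1, 2])` would return three images keyed by seed; each can then be saved with the image object's `save` method.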
Image-to-Image
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch
pipe = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()
prompt = "A painting of the inside of a subway train with tiny raccoons."
image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky3/t2i.png")
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, image=image, strength=0.75, num_inference_steps=25, generator=generator).images[0]
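The `strength` argument controls how far the input image is noised before denoising starts, which in turn determines how many of the requested steps actually run. The sketch below reproduces the arithmetic commonly used by diffusers image-to-image pipelines; it is an assumption that the Kandinsky 3 pipeline follows exactly this convention, and `img2img_steps` is a hypothetical helper, not a library function.

```python
def img2img_steps(num_inference_steps, strength):
    """Return (steps_run, steps_skipped) under the usual diffusers img2img convention."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # Only the last `strength` fraction of the schedule is executed.
    steps_run = min(int(num_inference_steps * strength), num_inference_steps)
    steps_skipped = max(num_inference_steps - steps_run, 0)
    return steps_run, steps_skipped


if __name__ == "__main__":
    # Values from the example above: 25 steps, strength 0.75.
    print(img2img_steps(25, 0.75))  # (18, 7)
```

Under this convention, a lower `strength` keeps the result closer to the input image and runs fewer denoising steps, while `strength=1.0` runs the full schedule.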
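🔧 Technical Details
Architecture Details

The architecture consists of three parts:
- Text encoder Flan-UL2 (encoder part only): 8.6B parameters
- Latent diffusion U-Net: 3B parameters
- MoVQ encoder/decoder: 267M parameters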
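To get a feel for what these parameter counts mean in practice, the sketch below adds them up and estimates the storage needed for the weights alone in fp16 (2 bytes per parameter). The counts are the rounded figures listed above; the estimate ignores activations, buffers, and framework overhead, so real memory use will be higher.

```python
# Rounded parameter counts from the architecture list, in billions.
PARAMS_BILLIONS = {
    "Flan-UL2 text encoder": 8.6,
    "Latent diffusion U-Net": 3.0,
    "MoVQ encoder/decoder": 0.267,
}


def weight_memory_gb(params_billions, bytes_per_param=2):
    """Approximate weight storage in GB (1 GB = 1e9 bytes); fp16 uses 2 bytes per parameter."""
    return params_billions * bytes_per_param


total = sum(PARAMS_BILLIONS.values())

if __name__ == "__main__":
    for name, count in PARAMS_BILLIONS.items():
        print(f"{name}: {count:.3f}B params, ~{weight_memory_gb(count):.1f} GB in fp16")
    print(f"Total: {total:.3f}B params, ~{weight_memory_gb(total):.1f} GB in fp16")
```

The total comes to roughly 11.9B parameters, or about 23.7 GB of fp16 weights, most of it in the text encoder. That is likely why the usage examples call `enable_model_cpu_offload()`, though the source does not state the reason.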
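Models
We release two models:
- Base: the base text-to-image diffusion model. It was trained for 2M steps on 400 A100 GPUs.
- Inpainting: the inpainting version of the model. It was initialized from the final checkpoint of the base model and trained for 250k steps on 300 A100 GPUs.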
📚 Detailed Documentation
Generation Examples
Prompts used for the example images:
- “A beautiful landscape outdoors scene in the crochet knitting art style, drawing in style by Alfons Mucha”
- “gorgeous phoenix, cosmic, darkness, epic, cinematic, moonlight, stars, high-definition, texture,Oscar-Claude Monet”
- “a yellow house at the edge of the danish fjord, in the style of eiko ojala, ingrid baars, ad posters, mountainous vistas, george ault, realistic details, dark white and dark gray, 4k”
- “dragon fruit head, upper body, realistic, illustration by Joshua Hoffine Norman Rockwell, scary, creepy, biohacking, futurism, Zaha Hadid style”
- “Amazing playful nice cute strawberry character, dynamic poze, surreal fantazy garden background, gorgeous masterpice, award winning photo, soft natural lighting, 3d, Blender, Octane render, tilt-shift, deep field, colorful, I can't believe how beautiful this is, colorful, cute and sweet baby-loved photo”
- “beautiful fairy-tale desert, in the sky a wave of sand merges with the milky way, stars, cosmism, digital art, 8k”
- “Car, mustang, movie, person, poster, car cover, person, in the style of alessandro gottardo, gold and cyan, gerald harvey jones, reflections, highly detailed illustrations, industrial urban scenes”
- “cloud in blue sky, a red lip, collage art, shuji terayama, dreamy objects, surreal, criterion collection, showa era, intricate details, mirror”
📄 License
This project is licensed under the Apache-2.0 license.
👥 Authors