🚀 Kandinsky-3: Text-to-Image Diffusion Model
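Kandinsky 3.0 is an open-source text-to-image diffusion model built on the Kandinsky2-x model family. Compared with its predecessors, it was trained on more data, in particular data related to Russian culture, so it can generate images related to Russian culture. In addition, scaling up the text encoder and the diffusion U-Net improves both text understanding and visual quality.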
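🚀 Quick Start
For more information, such as training details and generation examples, see our article. An English version will be published in a few days.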
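✨ Key Features
- Open-source text-to-image diffusion model built on the Kandinsky2-x model family.
- Trained on more data related to Russian culture, so it can generate images on these themes.
- Larger text encoder and diffusion U-Net, improving text understanding and visual quality.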
📦 Installation
Make sure to install diffusers from the main branch, along with Transformers and Accelerate:
```shell
pip install git+https://github.com/huggingface/diffusers.git
pip install --upgrade transformers accelerate
```
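Before running the examples, it can help to confirm that the packages are importable from the Python environment you will use. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the package list (which also includes `torch`, used by every example below) is an assumption about what the examples need.

```python
import importlib.util

# Hypothetical list of top-level packages the usage examples import
REQUIRED_PACKAGES = ("diffusers", "transformers", "accelerate", "torch")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names from `names` whose top-level module cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only checks that the modules can be located; it does not verify that the installed diffusers build is recent enough to include Kandinsky 3 support.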
💻 Usage Examples
Basic Usage
Text-to-Image
```python
from diffusers import AutoPipelineForText2Image
import torch
# Load the fp16 weights of Kandinsky 3
pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
# Keep sub-models on the CPU and move each to the GPU only while it runs, to reduce GPU memory use
pipe.enable_model_cpu_offload()
prompt = "A photograph of the inside of a subway train. There are raccoons sitting on the seats. One of them is reading a newspaper. The window shows the city in the background."
# A fixed seed makes the result reproducible
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]
image.save("kandinsky3_t2i.png")
```
Image-to-Image
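```python
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch
# Load the fp16 weights of Kandinsky 3 for image-to-image generation
pipe = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
# Offload idle sub-models to the CPU to reduce GPU memory use
pipe.enable_model_cpu_offload()
prompt = "A painting of the inside of a subway train with tiny raccoons."
# Initial image to be transformed
image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky3/t2i.png")
# A fixed seed makes the result reproducible
generator = torch.Generator(device="cpu").manual_seed(0)
# Higher strength lets the output depart further from the input image
image = pipe(prompt, image=image, strength=0.75, num_inference_steps=25, generator=generator).images[0]
image.save("kandinsky3_i2i.png")
```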
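In image-to-image mode, `strength` also determines how many of the `num_inference_steps` are actually run: the input image is noised to an intermediate timestep and only the remaining steps are denoised. The sketch below reproduces that arithmetic, assuming the `get_timesteps` convention used by diffusers img2img pipelines; the function name `effective_denoising_steps` is hypothetical and does not call diffusers.

```python
def effective_denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps actually run for a given img2img strength."""
    # Number of scheduler steps covered by the noise added to the input image
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # Number of leading (noisiest) timesteps that are skipped
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start


if __name__ == "__main__":
    for s in (0.5, 0.75, 1.0):
        print(f"strength={s}: {effective_denoising_steps(25, s)} of 25 steps")
```

Under this assumption, the example above (`strength=0.75`, `num_inference_steps=25`) runs 18 denoising steps, so raising `num_inference_steps` is one way to keep the step count up when using a low strength.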
🔧 Technical Details
Architecture Details

The architecture consists of three parts:
- Text encoder Flan-UL2 (encoder part): 8.6B parameters
- Latent diffusion U-Net: 3B parameters
- MoVQ encoder/decoder: 267M parameters
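The parameter counts listed above give a rough idea of how much memory the weights need. The sketch below assumes 2 bytes per parameter (the fp16 variant loaded in the usage examples) and counts weights only, excluding activations and framework overhead; the dictionary labels are illustrative.

```python
# Parameter counts as stated in the architecture list
PARAMS = {
    "Flan-UL2 text encoder (encoder part)": 8_600_000_000,
    "latent diffusion U-Net": 3_000_000_000,
    "MoVQ encoder/decoder": 267_000_000,
}
BYTES_PER_PARAM_FP16 = 2  # assumption: weights stored in float16

total_params = sum(PARAMS.values())
fp16_weight_bytes = total_params * BYTES_PER_PARAM_FP16

if __name__ == "__main__":
    for name, n in PARAMS.items():
        print(f"{name}: {n * BYTES_PER_PARAM_FP16 / 1e9:.1f} GB")
    print(f"total: {total_params / 1e9:.3f}B params, {fp16_weight_bytes / 1e9:.1f} GB in fp16")
```

Under these assumptions the fp16 weights come to about 23.7 GB in total, with the text encoder alone at about 17.2 GB. Because `enable_model_cpu_offload()` keeps only the sub-model currently running on the GPU, peak weight memory is closer to the largest component than to the total.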
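Models
We release two models:
- Base: the base text-to-image diffusion model. It was trained for 2M steps on 400 A100 GPUs.
- Inpainting: the inpainting version of the model. It was initialized from the final checkpoint of the base model and trained for 250k steps on 300 A100 GPUs.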
📚 Documentation
Generation Examples
The example images were generated from the following prompts:
- "A beautiful landscape outdoors scene in the crochet knitting art style, drawing in style by Alfons Mucha"
- "gorgeous phoenix, cosmic, darkness, epic, cinematic, moonlight, stars, high-definition, texture, Oscar-Claude Monet"
- "a yellow house at the edge of the danish fjord, in the style of eiko ojala, ingrid baars, ad posters, mountainous vistas, george ault, realistic details, dark white and dark gray, 4k"
- "dragon fruit head, upper body, realistic, illustration by Joshua Hoffine Norman Rockwell, scary, creepy, biohacking, futurism, Zaha Hadid style"
- "Amazing playful nice cute strawberry character, dynamic poze, surreal fantazy garden background, gorgeous masterpice, award winning photo, soft natural lighting, 3d, Blender, Octane render, tilt-shift, deep field, colorful, I can't believe how beautiful this is, colorful, cute and sweet baby-loved photo"
- "beautiful fairy-tale desert, in the sky a wave of sand merges with the milky way, stars, cosmism, digital art, 8k"
- "Car, mustang, movie, person, poster, car cover, person, in the style of alessandro gottardo, gold and cyan, gerald harvey jones, reflections, highly detailed illustrations, industrial urban scenes"
- "cloud in blue sky, a red lip, collage art, shuji terayama, dreamy objects, surreal, criterion collection, showa era, intricate details, mirror"
📄 License
This project is licensed under the Apache-2.0 license.
👥 Authors