Kandinsky 3 open-source text-to-image model: more Russian-culture data for better text-to-image quality


Kandinsky 3

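Developed by kandinsky-community

Kandinsky 3.0 is an open-source text-to-image diffusion model built on the Kandinsky2-x model family. It incorporates more data related to Russian culture and improves both text understanding and visual quality.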

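Image generation | License: Apache-2.0 | #Russian-culture image generation #High-parameter text-to-image #Multi-component diffusion model

Downloads: 8,465

Release date: 11/21/2023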

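Model overview

A diffusion model that generates high-quality images from text descriptions and is particularly strong at images with Russian cultural themes.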

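Model features

Russian cultural focus

Trained with more data related to Russian culture, so it can generate images with distinctly Russian cultural elements.

Large-scale architecture

The text encoder and the diffusion U-Net are significantly larger than in earlier versions, which improves text understanding and image quality.

Multi-component architecture

Three components work together: a text encoder, a diffusion U-Net, and a MoVQ encoder/decoder.

Open-source models

Two open-source models are available, a base version and an inpainting version, for the community to use and improve.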

Model capabilities

Text-to-image generation

Image-guided generation

Inpainting

Artistic style transfer

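Use cases

Creative design

Concept art

Generates concept art in a variety of styles from text descriptions.

Example: crochet-style artwork in the manner of Alfons Mucha.

Illustration

Quickly generates illustrations in a given style.

Example: a stylized illustration of a yellow house at the edge of a Danish fjord.

Advertising and marketing

Ad creative generation

Generates eye-catching advertising images from product descriptions.

Example: a movie-poster-style image of a Mustang sports car.

Cultural outreach

Culturally themed image generation

Generates images with distinctly Russian cultural elements.

Example: creative images that blend in Russian motifs.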

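🚀 Kandinsky-3: Text-to-Image Diffusion Model

Kandinsky 3.0 is an open-source text-to-image diffusion model built on the Kandinsky2-x model family. Compared with its predecessors, it was trained on more data, in particular data related to Russian culture, which allows it to generate pictures related to Russian culture. In addition, text understanding and visual quality were improved by increasing the size of the text encoder and the diffusion U-Net.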

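🚀 Quick start

For more information, such as training details and example generations, see our post. An English version will be released in a few days.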

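✨ Key features

An open-source text-to-image diffusion model built on the Kandinsky2-x model family.
Trained with more data related to Russian culture, so it can generate related images.
Larger text encoder and diffusion U-Net for better text understanding and visual quality.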

📦 Installation

Make sure to install diffusers from the main branch, along with Transformers and Accelerate:

pip install git+https://github.com/huggingface/diffusers.git
pip install --upgrade transformers accelerate
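
After installing, it can help to confirm that all three packages are visible to the interpreter you plan to use. The following is a minimal sketch using only the Python standard library; the helper name `installed_versions` is hypothetical and not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


for name, ver in installed_versions(["diffusers", "transformers", "accelerate"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

A source install of diffusers from the main branch usually reports a development version string, but the exact format is not guaranteed.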

💻 Usage examples

Basic usage

Text-to-image

from diffusers import AutoPipelineForText2Image
import torch

# Load the fp16 weight variant in half precision.
pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
# Move each sub-model to the GPU only while it is running, to lower peak GPU memory.
pipe.enable_model_cpu_offload()

prompt = "A photograph of the inside of a subway train. There are raccoons sitting on the seats. One of them is reading a newspaper. The window shows the city in the background."

# A fixed seed makes the generation reproducible.
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]

Image-to-image

from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

# Load the fp16 weight variant in half precision.
pipe = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
# Move each sub-model to the GPU only while it is running, to lower peak GPU memory.
pipe.enable_model_cpu_offload()

prompt = "A painting of the inside of a subway train with tiny raccoons."
# Starting image that the prompt will transform.
image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky3/t2i.png")

# strength sets how far the result may drift from the input image (0 keeps it, 1 mostly ignores it).
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, image=image, strength=0.75, num_inference_steps=25, generator=generator).images[0]
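
The interaction between `strength` and `num_inference_steps` in the call above is easy to misread. In diffusers image-to-image pipelines, the input image is typically noised part of the way and only the remaining portion of the schedule is denoised, so fewer than `num_inference_steps` steps actually run. The sketch below illustrates that rule as an assumption modelled on common diffusers behaviour; it is not taken from this model card, and the function name `effective_steps` is hypothetical.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps an img2img call performs.

    Assumed rule: steps = min(int(num_inference_steps * strength), num_inference_steps).
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


# With the settings used in the example above (25 steps, strength 0.75):
print(effective_steps(25, 0.75))  # 18
```

Under this assumption, raising `strength` toward 1.0 both runs more steps and lets the prompt override more of the input image.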

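🔧 Technical details

Architecture details

Architecture diagram

The architecture consists of three parts:

Text encoder Flan-UL2 (encoder part): 8.6B parameters
Latent diffusion U-Net: 3B parameters
MoVQ encoder/decoder: 267M parameters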
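
To put these parameter counts in perspective, the sketch below adds them up and estimates how much memory the weights alone occupy in fp16 (2 bytes per parameter). This is a rough back-of-the-envelope calculation: it ignores activations, framework overhead, and any layers kept in higher precision, and the dictionary keys are illustrative names only.

```python
# Parameter counts as listed above.
PARAMS = {
    "text_encoder_flan_ul2": 8_600_000_000,
    "latent_diffusion_unet": 3_000_000_000,
    "movq_encoder_decoder": 267_000_000,
}
BYTES_PER_PARAM_FP16 = 2


def weight_footprint_gb(params, bytes_per_param=BYTES_PER_PARAM_FP16):
    """Return (total parameter count, weight size in decimal gigabytes)."""
    total = sum(params.values())
    return total, total * bytes_per_param / 1e9


total, gb = weight_footprint_gb(PARAMS)
print(f"{total / 1e9:.3f}B parameters, about {gb:.1f} GB of fp16 weights")
```

The text encoder accounts for most of the total, which is one reason offloading idle components to the CPU, as the usage examples do with `enable_model_cpu_offload()`, can reduce peak GPU memory.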

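Models

We release two models:

Base model: the base text-to-image diffusion model. It was trained for 2M steps on 400 A100 GPUs.
Inpainting model: the inpainting version of the model. It was initialized from the final checkpoint of the base model and trained for 250k steps on 300 A100 GPUs.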

📚 Detailed documentation

Generation examples

Image	Prompt
	“A beautiful landscape outdoors scene in the crochet knitting art style, drawing in style by Alfons Mucha”
	“gorgeous phoenix, cosmic, darkness, epic, cinematic, moonlight, stars, high-definition, texture,Oscar-Claude Monet”
	“a yellow house at the edge of the danish fjord, in the style of eiko ojala, ingrid baars, ad posters, mountainous vistas, george ault, realistic details, dark white and dark gray, 4k”
	“dragon fruit head, upper body, realistic, illustration by Joshua Hoffine Norman Rockwell, scary, creepy, biohacking, futurism, Zaha Hadid style”
	“Amazing playful nice cute strawberry character, dynamic poze, surreal fantazy garden background, gorgeous masterpice, award winning photo, soft natural lighting, 3d, Blender, Octane render, tilt-shift, deep field, colorful, I can't believe how beautiful this is, colorful, cute and sweet baby-loved photo”
	“beautiful fairy-tale desert, in the sky a wave of sand merges with the milky way, stars, cosmism, digital art, 8k”
	“Car, mustang, movie, person, poster, car cover, person, in the style of alessandro gottardo, gold and cyan, gerald harvey jones, reflections, highly detailed illustrations, industrial urban scenes”
	“cloud in blue sky, a red lip, collage art, shuji terayama, dreamy objects, surreal, criterion collection, showa era, intricate details, mirror”