Kandinsky 3 Open-Source Text-to-Image Model: Incorporating Russian Cultural Data to Enhance Text-Image Generation Quality

Kandinsky 3

Developed by kandinsky-community

Kandinsky 3.0 is an open-source text-to-image diffusion model developed based on the Kandinsky2-x model series, incorporating more Russian culture-related data to enhance text understanding and visual generation quality.

Image Generation | Open Source | License: Apache-2.0 | #Russian cultural image generation #High-parameter text-to-image #Multi-component diffusion model

Downloads 8,465

Release Time: 11/21/2023

Model Overview

A diffusion model capable of generating high-quality images from text descriptions, particularly skilled at creating images with Russian cultural characteristics.

Model Features

Russian Cultural Characteristics

Incorporates more training data related to Russian culture, enabling the generation of images with Russian cultural features

Large-Scale Model Architecture

Significantly increased size of text encoder and diffusion U-Net model, improving text understanding and image generation quality

Multi-Component Architecture

Adopts a three-component architecture (text encoder, diffusion U-Net, and MoVQ encoder/decoder) working collaboratively

Open-Source Model

Provides two open-source models (base and inpainting versions), supporting community use and improvement

Model Capabilities

Text-to-Image Generation

Image-Guided Generation

Image Inpainting

Art Style Transfer

Use Cases

Creative Design

Concept Art Creation

Generates concept artworks in various styles based on text descriptions

Can generate crochet art in the style of Alphonse Mucha, etc.

Illustration Generation

Quickly generates illustrations in specific styles

Can generate stylized illustrations like a yellow cottage by a Danish fjord

Advertising and Marketing

Ad Material Generation

Generates attractive advertising images based on product descriptions

Can generate promotional materials like movie posters for Mustang sports cars

Cultural Dissemination

Cultural Feature Image Generation

Generates images with Russian cultural characteristics

Can generate creative images incorporating Russian elements

🚀 Kandinsky-3: Text-to-image Diffusion Model

Kandinsky 3.0 is an open-source text-to-image diffusion model that builds on the Kandinsky2-x model family, offering enhanced text understanding and visual quality, and the ability to generate pictures related to Russian culture.



🚀 Quick Start

The usage details are still in progress. For now, you can refer to the following code snippets for basic operations.

✨ Features

Cultural Relevance: Kandinsky 3.0 incorporates more data related to Russian culture, enabling it to generate pictures relevant to Russian culture.
Enhanced Performance: Text understanding and visual quality have been improved by increasing the size of the text encoder and the Diffusion U-Net, respectively.

📦 Installation

Install diffusers from the main branch, along with up-to-date versions of Transformers and Accelerate.

pip install git+https://github.com/huggingface/diffusers.git
pip install --upgrade transformers accelerate
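
To confirm that the installation step worked, a small check like the following can help. This is a minimal sketch using only the Python standard library; the helper name installed_versions is hypothetical and not part of diffusers. torch is included in the list as an assumption, because the usage examples import it even though the install commands do not name it.

```python
from importlib import metadata

# Packages named in the installation step, plus torch (an assumption:
# the usage examples import it, so it is checked here as well).
REQUIRED = ("diffusers", "transformers", "accelerate", "torch")


def installed_versions(packages=REQUIRED):
    """Return {package: version string, or None if the package is missing}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as missing should be installed before running the examples.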

💻 Usage Examples

Basic Usage

Text-2-Image

from diffusers import AutoPipelineForText2Image
import torch

# Load the fp16 weights and offload idle sub-models to the CPU to reduce GPU memory use.
pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

prompt = "A photograph of the inside of a subway train. There are raccoons sitting on the seats. One of them is reading a newspaper. The window shows the city in the background."

# Fix the seed so that the result is reproducible.
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]

Image-2-Image

from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

# Load the fp16 weights and offload idle sub-models to the CPU to reduce GPU memory use.
pipe = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

prompt = "A painting of the inside of a subway train with tiny raccoons."
# Starting image that the prompt will transform.
image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky3/t2i.png")

# Fix the seed so that the result is reproducible; strength sets how strongly the input image is altered.
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, image=image, strength=0.75, num_inference_steps=25, generator=generator).images[0]
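
The interaction between strength and num_inference_steps in the image-to-image call can be sketched with plain arithmetic. The sketch below assumes that the Kandinsky 3 pipeline follows the usual diffusers image-to-image convention, in which only the last int(num_inference_steps * strength) steps of the schedule are run; the function name img2img_schedule is hypothetical and used only for illustration.

```python
def img2img_schedule(num_inference_steps, strength):
    """Return (start_index, steps_run) under the common diffusers img2img convention.

    Assumption: the pipeline skips the first part of the noise schedule and
    denoises only the remaining int(num_inference_steps * strength) steps.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    steps_run = min(int(num_inference_steps * strength), num_inference_steps)
    start_index = max(num_inference_steps - steps_run, 0)
    return start_index, steps_run


# Values from the example above: 25 steps with strength 0.75.
print(img2img_schedule(25, 0.75))  # prints (7, 18)
```

Under this assumption, a lower strength keeps more of the input image and runs fewer denoising steps, while strength=1.0 runs the full schedule.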

📚 Documentation

Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. Compared with its predecessors, it incorporates more data related to Russian culture, which allows it to generate pictures related to Russian culture. Text understanding and visual quality have been improved by increasing the size of the text encoder and the Diffusion U-Net, respectively.

For more information such as details of training and examples of generations, check out our post. The English version will be released in a couple of days.

🔧 Technical Details

Kandinsky Architecture

The architecture consists of three parts:

Text encoder: Flan-UL2 (encoder part), 8.6B parameters
Latent Diffusion U-Net: 3B parameters
MoVQ encoder/decoder: 267M parameters
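
The component sizes above give a rough idea of why the usage examples load fp16 weights and enable CPU offloading. The following back-of-the-envelope sketch assumes 2 bytes per parameter (fp16) and counts weights only, ignoring activations and any framework overhead; the names COMPONENTS and weight_memory_gib are hypothetical.

```python
# Parameter counts as listed above, in billions.
COMPONENTS = {
    "text_encoder": 8.6,
    "unet": 3.0,
    "movq": 0.267,
}


def weight_memory_gib(params_billion, bytes_per_param=2):
    """Approximate weight memory in GiB; 2 bytes per parameter corresponds to fp16."""
    return params_billion * 1e9 * bytes_per_param / 2**30


total_params = sum(COMPONENTS.values())
for name, size in COMPONENTS.items():
    print(f"{name}: {size}B params, ~{weight_memory_gib(size):.1f} GiB in fp16")
print(f"total: {total_params:.3f}B params, ~{weight_memory_gib(total_params):.1f} GiB in fp16")
```

By this estimate the three components together hold roughly 11.9B parameters, or about 22 GiB of fp16 weights, most of it in the text encoder.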

We release two models:

Base: The base text-to-image diffusion model. This model was trained for 2M steps on 400 A100 GPUs.
Inpainting: The inpainting version of the model. It was initialized from the final checkpoint of the base model and trained for 250k steps on 300 A100 GPUs.

📄 License

The project is licensed under the Apache-2.0 license.

Examples of generations

"A beautiful landscape outdoors scene in the crochet knitting art style, drawing in style by Alfons Mucha"
"gorgeous phoenix, cosmic, darkness, epic, cinematic, moonlight, stars, high-definition, texture,Oscar-Claude Monet"
"a yellow house at the edge of the danish fjord, in the style of eiko ojala, ingrid baars, ad posters, mountainous vistas, george ault, realistic details, dark white and dark gray, 4k"
"dragon fruit head, upper body, realistic, illustration by Joshua Hoffine Norman Rockwell, scary, creepy, biohacking, futurism, Zaha Hadid style"
"Amazing playful nice cute strawberry character, dynamic poze, surreal fantazy garden background, gorgeous masterpice, award winning photo, soft natural lighting, 3d, Blender, Octane render, tilt-shift, deep field, colorful, I can't believe how beautiful this is, colorful, cute and sweet baby-loved photo"
"beautiful fairy-tale desert, in the sky a wave of sand merges with the milky way, stars, cosmism, digital art, 8k"
"Car, mustang, movie, person, poster, car cover, person, in the style of alessandro gottardo, gold and cyan, gerald harvey jones, reflections, highly detailed illustrations, industrial urban scenes"
"cloud in blue sky, a red lip, collage art, shuji terayama, dreamy objects, surreal, criterion collection, showa era, intricate details, mirror"

Authors

Vladimir Arkhipkin: Github
Anastasia Maltseva: Github
Andrei Filatov: Github
Igor Pavlov: Github
Julia Agafonova
Arseniy Shakhmatov: Github, Blog
Andrey Kuznetsov: Github, Blog
Denis Dimitrov: Github, Blog
