StoryMaker: Towards consistent characters in text-to-image generation
StoryMaker is a personalization solution that preserves not only facial consistency but also the consistency of clothing, hairstyles, and bodies across multi-character scenes, making it possible to create a story told through a series of images.

[GitHub](https://github.com/RedAIGC/StoryMaker)
Visualization of images generated by StoryMaker. The first three rows tell a story about a day in the life of an "office worker", and the last two rows tell a story based on the movie "Before Sunrise".
✨ Features
- Two Portraits Synthesis
- Diverse Applications
📦 Installation
Download the model
You can download the model directly from [Hugging Face](https://huggingface.co/RED-AIGC/StoryMaker).
If you cannot access Hugging Face, you can use [hf-mirror](https://hf-mirror.com/) to download models:

```bash
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download --resume-download RED-AIGC/StoryMaker --local-dir checkpoints --local-dir-use-symlinks False
```
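The same download can also be done from Python with `huggingface_hub`. This is a sketch, assuming the `RED-AIGC/StoryMaker` repository id from the CLI command above:

```python
from huggingface_hub import snapshot_download

# Download the StoryMaker checkpoints into ./checkpoints.
# HF_ENDPOINT is honored here too, so the hf-mirror workaround still applies.
snapshot_download(repo_id='RED-AIGC/StoryMaker', local_dir='checkpoints')
```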
Download the face encoder
For the face encoder, you need to download it manually via this [URL](https://github.com/deepinsight/insightface/issues/1896#issuecomment-1023867304) to models/buffalo_l,
as the default link is invalid. Once all models are in place, the folder tree should look like this:
```
.
├── models
├── checkpoints/mask.bin
├── pipeline_sdxl_storymaker.py
└── README.md
```
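Before running the examples below, a quick sanity check that everything landed in the right place can save a confusing error later. A minimal sketch; the paths follow the tree above:

```python
from pathlib import Path

# Files and folders the usage examples below expect to exist.
required = [
    'models/buffalo_l',             # face encoder, downloaded manually
    'checkpoints/mask.bin',         # StoryMaker adapter weights
    'pipeline_sdxl_storymaker.py',  # pipeline implementation
]
for path in required:
    status = 'ok' if Path(path).exists() else 'MISSING'
    print(f'{status:8s}{path}')
```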
💻 Usage Examples
Basic Usage
```python
import cv2
import torch
import numpy as np
from PIL import Image
from diffusers import UniPCMultistepScheduler
from insightface.app import FaceAnalysis

from pipeline_sdxl_storymaker import StableDiffusionXLStoryMakerPipeline

# Face detector/recognizer; expects the buffalo_l weights under ./models/buffalo_l
app = FaceAnalysis(name='buffalo_l', root='./', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

face_adapter = './checkpoints/mask.bin'
image_encoder_path = 'laion/CLIP-ViT-H-14-laion2B-s32B-b79K'
base_model = 'huaquan/YamerMIX_v11'

pipe = StableDiffusionXLStoryMakerPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
)
pipe.cuda()

# Load the StoryMaker adapter and switch to the UniPC scheduler
pipe.load_storymaker_adapter(image_encoder_path, face_adapter, scale=0.8, lora_scale=0.8)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
```
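If GPU memory is tight, diffusers' standard CPU offloading may work in place of `pipe.cuda()`. This is an assumption: it relies on `StableDiffusionXLStoryMakerPipeline` inheriting from diffusers' `DiffusionPipeline`, which the `from_pretrained` call suggests but which we have not verified here.

```python
# Assumption: available only if the pipeline subclasses diffusers' DiffusionPipeline.
# Moves submodules to the GPU on demand instead of keeping the whole pipeline there.
pipe.enable_model_cpu_offload()
```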
Advanced Usage
```python
# Reference portrait and its segmentation mask
face_image = Image.open("examples/ldh.png").convert('RGB')
mask_image = Image.open("examples/ldh_mask.png").convert('RGB')

# Detect faces on the reference image and keep the largest one by bounding-box area
face_info = app.get(cv2.cvtColor(np.array(face_image), cv2.COLOR_RGB2BGR))
face_info = sorted(face_info, key=lambda x: (x['bbox'][2] - x['bbox'][0]) * (x['bbox'][3] - x['bbox'][1]))[-1]

prompt = "a person is taking a selfie, the person is wearing a red hat, and a volcano is in the distance"
n_prompt = "bad quality, NSFW, low quality, ugly, disfigured, deformed"

generator = torch.Generator(device='cuda').manual_seed(666)
for i in range(4):
    output = pipe(
        image=face_image, mask_image=mask_image, face_info=face_info,
        prompt=prompt,
        negative_prompt=n_prompt,
        ip_adapter_scale=0.8, lora_scale=0.8,
        num_inference_steps=25,
        guidance_scale=7.5,
        height=1280, width=960,
        generator=generator,
    ).images[0]
    output.save(f'examples/results/ldh666_new_{i}.jpg')
```
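To compare the four seeds side by side, the saved results can be tiled into a single contact sheet. This is a minimal sketch using only PIL; the file names and the 960x1280 output size follow the loop above.

```python
from PIL import Image

# Tile the four generated images into a 2x2 grid of half-size thumbnails.
paths = [f'examples/results/ldh666_new_{i}.jpg' for i in range(4)]
thumbs = [Image.open(p).resize((480, 640)) for p in paths]

grid = Image.new('RGB', (2 * 480, 2 * 640))
for i, thumb in enumerate(thumbs):
    grid.paste(thumb, ((i % 2) * 480, (i // 2) * 640))
grid.save('examples/results/ldh666_grid.jpg')
```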
📄 License
This project is licensed under the Apache-2.0 license.
🙏 Acknowledgements
- Our work is highly inspired by [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter) and [InstantID](https://github.com/instantX-research/InstantID). Thanks for their great work!
- Thanks to Yamer for developing YamerMIX, which we use as the base model in our demo.