ControlNet - mfidabel/controlnet-segment-anything
These are ControlNet weights trained on runwayml/stable-diffusion-v1-5 with a new type of conditioning, enabling image generation from text prompts and segmentation maps.
🚀 Quick Start
You can find some example generations below.
prompt: contemporary living room of a house
negative prompt: low quality

prompt: new york buildings, Vincent Van Gogh starry night
negative prompt: low quality, monochrome

prompt: contemporary living room, high quality, 4k, realistic
negative prompt: low quality, monochrome, low res
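
To reproduce results like the above, the weights can be loaded with the diffusers library. The following is a minimal sketch using the PyTorch diffusers API, assuming the repository's weights load with ControlNetModel.from_pretrained (the model was trained on TPU, so a Flax pipeline may be needed instead); seg_map.png is a hypothetical pre-computed segmentation map.

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler

# Load the ControlNet weights and attach them to the Stable Diffusion v1.5 base model.
controlnet = ControlNetModel.from_pretrained(
    "mfidabel/controlnet-segment-anything", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

# The conditioning image: a segmentation map of the desired layout (hypothetical file).
seg_map = Image.open("seg_map.png").convert("RGB")

image = pipe(
    "contemporary living room of a house",
    negative_prompt="low quality",
    image=seg_map,
    num_inference_steps=30,
).images[0]
image.save("output.png")
```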

✨ Features
- Generate images based on text prompts and segmentation maps.
- Use prompts and negative prompts to control image generation.
📚 Documentation
Model Details
| Property | Details |
|----------|---------|
| Model Type | Diffusion-based text-to-image generation model with ControlNet conditioning |
| Language(s) | English |
| License | CreativeML OpenRAIL M; see the 📄 License section below |
| Model Description | Generates images from a text prompt, using a segmentation map as a template for the output |
Limitations and Bias
⚠️ Important Note
- The model cannot render text.
- Landscapes with fewer segments tend to render better.
- Some segmentation maps tend to render in monochrome; use a negative prompt to work around this (see the snippet after this list).
- Some generated images can be oversaturated.
- Shorter prompts usually work better, as long as they are consistent with the input segmentation map.
- The model is biased toward producing painting-like images rather than realistic ones, as the training dataset contains many paintings.
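
As an illustration of the monochrome workaround, the unwanted trait can be listed in the negative prompt at inference time (reusing the pipe and seg_map objects from the Quick Start sketch above):

```python
# Adding "monochrome" to the negative prompt discourages greyscale renders.
image = pipe(
    "new york buildings, Vincent Van Gogh starry night",
    negative_prompt="low quality, monochrome",
    image=seg_map,
    num_inference_steps=30,
).images[0]
```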
Training
Training Data
This model was trained on a segmented dataset derived from the COYO-700M dataset.
The Stable Diffusion v1.5 checkpoint was used as the base model for the ControlNet.
You can obtain the segmentation map of any image through this Colab:
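
In case the notebook is unavailable, the following is a rough sketch of how such a map can be produced with Meta's segment-anything package, assuming a downloaded SAM ViT-H checkpoint; the Colab's exact procedure may differ, and the file names here are hypothetical.

```python
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Hypothetical checkpoint path; download from the segment-anything repository.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = np.array(Image.open("input.jpg").convert("RGB"))
masks = mask_generator.generate(image)

# Paint each mask a random colour, largest regions first so that
# small segments stay visible on top.
seg_map = np.zeros_like(image)
for mask in sorted(masks, key=lambda m: m["area"], reverse=True):
    seg_map[mask["segmentation"]] = np.random.randint(0, 256, size=3, dtype=np.uint8)

Image.fromarray(seg_map).save("seg_map.png")
```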
The model was trained in several sequential stages.
Training Details
| Property | Details |
|----------|---------|
| Hardware | Google Cloud TPUv4-8 VM |
| Optimizer | AdamW |
| Train Batch Size | 2 per device × 4 devices = 8 |
| Learning Rate | 1e-5, constant |
| Gradient Accumulation Steps | 1 |
| Resolution | 512 × 512 |
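
For reference, the optimizer row corresponds to a setup along these lines in optax (a sketch assuming a JAX training loop on the TPU VM; the actual training script is not part of this card):

```python
import optax

# AdamW with a constant 1e-5 learning rate, as in the table above.
optimizer = optax.adamw(learning_rate=1e-5)

# Gradient accumulation of 1 step is a no-op, shown here for completeness.
optimizer = optax.MultiSteps(optimizer, every_k_schedule=1)
```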
Environmental Impact
Estimated with the Machine Learning Emissions Calculator, using the following characteristics:
- Hardware Type: TPUv3 chip (TPUv4 was not yet available in the calculator at the time of estimation)
- Training Hours: 8 hours
- Cloud Provider: Google Cloud Platform
- Compute Region: us-central1
- Carbon Emitted (power consumption × time × carbon intensity of the local power grid): 283 W × 8 h = 2.26 kWh; 2.26 kWh × 0.57 kg CO2eq/kWh ≈ 1.29 kg CO2eq
📄 License
The model is released under the CreativeML OpenRAIL M license, an Open RAIL M license adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license, on which our license is based.