StableMaterials Open-Source Material Generation Tool - Create High-Resolution Seamless Textures with Text or Image Prompts

Developed by gvecchio

StableMaterials is a diffusion-based tool for generating physically based rendering (PBR) materials. It produces high-resolution, tileable material maps from text or image prompts.

Image Generation | English | Open Source License: OpenRAIL | #PBR Material Generation #Tileable Textures #Multi-channel Output

Downloads 635

Release Time: 6/12/2024

Model Overview

This model combines semi-supervised learning with latent diffusion models (LDM) to simultaneously infer diffuse and specular properties, as well as material mesostructure (height, normal).

Model Features

Semi-supervised Learning

Combines labeled and unlabeled data for training, leveraging adversarial training to distill knowledge from large-scale pre-trained image generation models.

Knowledge Distillation

Incorporates unlabeled texture samples generated by SDXL models into the training process to bridge gaps between different data distributions.

Latent Consistency

Employs latent consistency models for rapid generation, reducing the number of inference steps required for high-quality output.

Feature Rolling

Innovative tileability technique achieved by rolling feature maps in each convolutional and attention layer of the U-Net architecture.

Model Capabilities

Generate PBR materials

Generate tileable material maps

Support text prompt generation

Support image prompt generation

Simultaneously generate diffuse and specular properties

Use Cases

Computer Graphics

Video Game Development

Generate high-quality, realistic PBR materials for game scenes and characters

Enhance game visual effects and development efficiency

Architectural Visualization

Generate realistic material textures for architectural rendering

Improve architectural visualization effects

Digital Content Creation

Provide diverse material options for 3D art creation

Enrich digital art creation resources

🚀 StableMaterials

StableMaterials is a diffusion-based model for generating photorealistic physically based rendering (PBR) materials. It combines semi-supervised learning with Latent Diffusion Models (LDMs) to create high-resolution, tileable material maps from text or image prompts. The model infers both diffuse (Basecolor) and specular (Roughness, Metallic) properties, along with the material mesostructure (Height, Normal). 🌟

For more details, visit the project page or read the full paper on arXiv (arXiv:2406.09293).

⚠️ Important Note

This repo contains the weights and the pipeline code for the base model in both the LDM and LCM versions. The refiner model, along with its pipeline and the inpainting pipeline, will be released shortly.

✨ Features

🧩 Base Model

The base model generates low-resolution (512x512) material maps using a compression VAE (Variational Autoencoder) followed by a latent diffusion process. The architecture is based on the MatFuse adaptation of the LDM paradigm, optimized for material map generation with a focus on diversity and high visual fidelity. 🖼️

🔑 Key Features

Semi-Supervised Learning: The model is trained using both annotated and unannotated data, leveraging adversarial training to distill knowledge from large-scale pretrained image generation models. 📚
Knowledge Distillation: Incorporates unannotated texture samples generated using the SDXL model into the training process, bridging the gap between different data distributions. 🌐
Latent Consistency: Employs a latent consistency model to facilitate fast generation, reducing the inference steps required to produce high-quality outputs. ⚡
Feature Rolling: Introduces a novel tileability technique by rolling feature maps for each convolutional and attention layer in the U-Net architecture. 🎢
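
The idea behind feature rolling can be pictured as a circular shift: content that leaves one edge of a feature map re-enters at the opposite edge. The following is a minimal pure-Python sketch, assuming "rolling" means a circular shift along both spatial axes. The helper `roll2d` and the toy values are illustrative only and are not taken from the StableMaterials code; on tensors, the same operation is available as `torch.roll`.

```python
def roll2d(feature_map, shift_y, shift_x):
    """Circularly shift a 2D grid.

    Content pushed past one edge re-enters at the opposite edge,
    so no values are lost and the shift can be undone exactly.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    return [
        [feature_map[(y - shift_y) % height][(x - shift_x) % width]
         for x in range(width)]
        for y in range(height)
    ]


# Toy 3x4 "feature map" (hypothetical values, for illustration only).
features = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
]

rolled = roll2d(features, shift_y=1, shift_x=2)
restored = roll2d(rolled, shift_y=-1, shift_x=-2)

print(rolled)                # [[10, 11, 8, 9], [2, 3, 0, 1], [6, 7, 4, 5]]
print(restored == features)  # True
```

Because the shift wraps around, the position of the map border changes with every roll. One plausible reading is that this prevents any layer from treating a fixed border location specially, which is what makes the opposite edges of the output match.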

🚀 Quick Start

StableMaterials is designed for generating high-quality, realistic PBR materials for applications in computer graphics, such as video game development, architectural visualization, and digital content creation. The model supports both text and image-based prompting, allowing for versatile and intuitive material generation. 🕹️🏛️📸

💻 Usage Examples

Basic Usage

import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

# Load pipeline enabling the execution of custom code
pipe = DiffusionPipeline.from_pretrained(
    "gvecchio/StableMaterials", 
    trust_remote_code=True, 
    torch_dtype=torch.float16
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pipe = pipe.to(device)

# Text prompt example
material = pipe(
  prompt="Old rusty metal bars with peeling paint",
  guidance_scale=10.0,
  tileable=True,
  num_images_per_prompt=1,
  num_inference_steps=50,
).images[0]

# Image prompt example
material = pipe(
  prompt=load_image("path/to/input_image.jpg"),
  guidance_scale=10.0,
  tileable=True,
  num_images_per_prompt=1,
  num_inference_steps=50,
).images[0]

# The output will include basecolor, normal, height, roughness, and metallic maps
basecolor = material.basecolor
normal = material.normal
height = material.height
roughness = material.roughness
metallic = material.metallic
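
Once the maps are extracted, they usually need to be written to disk for use in a renderer or game engine. The helper below is a small sketch of one way to do that. It assumes each map attribute exposes a PIL-style `.save(path)` method, which the snippet above does not confirm; `save_material_maps`, the `stem` parameter, and the file naming scheme are hypothetical.

```python
from pathlib import Path

# Map names as listed in the usage example above.
MAP_NAMES = ("basecolor", "normal", "height", "roughness", "metallic")


def save_material_maps(material, out_dir, stem="material"):
    """Write each map to <out_dir>/<stem>_<map>.png and return the paths.

    Assumption: every map attribute has a PIL-style .save(path) method.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name in MAP_NAMES:
        path = out_dir / f"{stem}_{name}.png"
        getattr(material, name).save(path)
        paths[name] = path
    return paths
```

With the `material` object from the example above, a call such as `save_material_maps(material, "outputs", stem="rusty_metal")` would produce five files, one per map.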

Advanced Usage

import torch
from diffusers import DiffusionPipeline, LCMScheduler, UNet2DConditionModel
from diffusers.utils import load_image

# Load LCM distilled unet
unet = UNet2DConditionModel.from_pretrained(
    "gvecchio/StableMaterials",
    subfolder="unet_lcm",
    torch_dtype=torch.float16,
)

# Load pipeline enabling the execution of custom code
pipe = DiffusionPipeline.from_pretrained(
    "gvecchio/StableMaterials", 
    trust_remote_code=True, 
    unet=unet,
    torch_dtype=torch.float16
)

# Replace scheduler with LCM scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pipe = pipe.to(device)

# Text prompt example
material = pipe(
  prompt="Old rusty metal bars with peeling paint",
  guidance_scale=10.0,
  tileable=True,
  num_images_per_prompt=1,
  num_inference_steps=4, # LCM enables fast generation in as few as 4 steps
).images[0]

# Image prompt example
material = pipe(
  prompt=load_image("path/to/input_image.jpg"),
  guidance_scale=10.0,
  tileable=True,
  num_images_per_prompt=1,
  num_inference_steps=4,
).images[0]

# The output will include basecolor, normal, height, roughness, and metallic maps
basecolor = material.basecolor
normal = material.normal
height = material.height
roughness = material.roughness
metallic = material.metallic
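
When `tileable=True` is set, a quick sanity check is to compare the jump across the wrap-around seam with the typical jump between neighbouring pixels. The sketch below is a crude heuristic, not part of the StableMaterials pipeline. It operates on a plain 2D grid of numbers, so a real map would first have to be converted (for example to grayscale pixel values). The function names and toy grids are hypothetical.

```python
def seam_error(channel):
    """Mean absolute jump across the wrap-around borders of a 2D grid."""
    height, width = len(channel), len(channel[0])
    # Right-most column against left-most column.
    horizontal = sum(abs(row[-1] - row[0]) for row in channel) / height
    # Bottom row against top row.
    vertical = sum(abs(a - b) for a, b in zip(channel[-1], channel[0])) / width
    return (horizontal + vertical) / 2


def interior_error(channel):
    """Mean absolute jump between neighbouring pixels inside the grid."""
    height, width = len(channel), len(channel[0])
    horizontal = sum(
        abs(row[x + 1] - row[x]) for row in channel for x in range(width - 1)
    ) / (height * (width - 1))
    vertical = sum(
        abs(channel[y + 1][x] - channel[y][x])
        for y in range(height - 1)
        for x in range(width)
    ) / ((height - 1) * width)
    return (horizontal + vertical) / 2


# Toy grids: a periodic wave tiles cleanly, a linear ramp does not.
wave = [[0, 1, 2, 1] for _ in range(4)]
ramp = [[0, 1, 2, 3] for _ in range(4)]

print(seam_error(wave), interior_error(wave))  # 0.5 0.5
print(seam_error(ramp), interior_error(ramp))  # 1.5 0.5
```

A seam error close to the interior error suggests the borders are no more visible than any other pixel boundary, while a much larger seam error points to a visible discontinuity when the texture is repeated.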

📚 Documentation

🗂️ Training Data

The model is trained on a combined dataset from MatSynth and Deschaintre et al., including 6,198 unique PBR materials. It also incorporates 4,000 texture-text pairs generated from the SDXL model using various prompts. 🔍

🔧 Limitations

While StableMaterials shows robust performance, it has some limitations:

It may struggle with complex prompts describing intricate spatial relationships. 🧩
It may not accurately represent highly detailed patterns or figures. 🎨
It occasionally generates incorrect reflectance properties for certain material types. ✨

Future updates aim to address these limitations by incorporating more diverse training prompts and improving the model's handling of complex textures.

📖 Citation

If you use this model in your research, please cite the following paper:

@article{vecchio2024stablematerials,
  title={StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning},
  author={Vecchio, Giuseppe},
  journal={arXiv preprint arXiv:2406.09293},
  year={2024}
}

Information Table

Property	Details
Model Type	A diffusion-based model for generating photorealistic PBR materials
Training Data	A combined dataset from MatSynth and Deschaintre et al., including 6,198 unique PBR materials, and 4,000 texture-text pairs generated from the SDXL model using various prompts
License	Openrail
Library Name	diffusers
Pipeline Tag	text-to-image
Tags	material, pbr, svbrdf, 3d, texture
Inference	false
