🚀 Control LoRA for Image Editing with THUDM/CogView4-6B
This project provides a Control LoRA for making minor edits to images with the THUDM/CogView4-6B model, offering a practical way to apply targeted style and setting changes.
✨ Features
- Base Model: Uses the THUDM/CogView4-6B model as the foundation.
- Dataset: Trained on the sayakpaul/OmniEdit-mini dataset.
- Library: Built with the diffusers library.
- Multiple Editing Examples: Demonstrates editing scenarios such as converting to an impasto painting style, a spring setting, or a stormy space setting (see the multi-prompt sketch after the usage example).
📦 Installation
No installation steps are given in the original document. The usage example below imports torch, diffusers, and finetrainers, so those packages must be available in your environment.
💻 Usage Examples
Basic Usage
```python
import torch
from diffusers import CogView4Pipeline
from diffusers.utils import load_image
from finetrainers.models.utils import _expand_linear_with_zeroed_weights
from finetrainers.patches import load_lora_weights
from finetrainers.patches.dependencies.diffusers.control import control_channel_concat

dtype = torch.bfloat16
device = torch.device("cuda")
generator = torch.Generator().manual_seed(0)

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=dtype)
in_channels = pipe.transformer.config.in_channels
patch_channels = pipe.transformer.patch_embed.proj.in_features

# Widen the patch embedding so it accepts the control latents concatenated
# along the channel dimension; the new weights are zero-initialized so the
# base model's behavior is unchanged before the LoRA is applied.
pipe.transformer.patch_embed.proj = _expand_linear_with_zeroed_weights(
    pipe.transformer.patch_embed.proj, new_in_features=2 * patch_channels
)

load_lora_weights(pipe, "finetrainers/CogView4-6B-Edit-LoRA-v0", "cogview4-lora")
pipe.set_adapters("cogview4-lora", 0.9)
pipe.to(device)

prompt = "Make the image look like it's from an ancient Egyptian mural."
control_image = load_image("examples/training/control/cogview4/omni_edit/validation_dataset/0.png")
height, width = 1024, 1024

with torch.no_grad():
    latents = pipe.prepare_latents(1, in_channels, height, width, dtype, device, generator)

    # Encode the image to edit into VAE latent space and apply the VAE's
    # shift/scale normalization.
    control_image = pipe.image_processor.preprocess(control_image, height=height, width=width)
    control_image = control_image.to(device=device, dtype=dtype)
    control_latents = pipe.vae.encode(control_image).latent_dist.sample(generator=generator)
    control_latents = (control_latents - pipe.vae.config.shift_factor) * pipe.vae.config.scaling_factor

    # Concatenate the control latents to the denoising latents on every
    # transformer forward pass for the duration of this context.
    with control_channel_concat(pipe.transformer, ["hidden_states"], [control_latents], dims=[1]):
        image = pipe(prompt, latents=latents, num_inference_steps=30, generator=generator).images[0]

image.save("output.png")
```
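As a hypothetical follow-up, the control latents prepared above can be reused across several prompts to reproduce the editing scenarios listed under Features. The prompts below are illustrative, not taken from the original document:

```python
# Reuse the prepared control latents for several edits (illustrative prompts).
edit_prompts = [
    "Change the painting style to impasto.",
    "Change the setting to spring.",
    "Change the setting to a stormy space.",
]
with torch.no_grad(), control_channel_concat(pipe.transformer, ["hidden_states"], [control_latents], dims=[1]):
    for i, edit_prompt in enumerate(edit_prompts):
        image = pipe(edit_prompt, latents=latents, num_inference_steps=30, generator=generator).images[0]
        image.save(f"output_{i}.png")
```

If the full pipeline does not fit in GPU memory, replacing pipe.to(device) with the standard diffusers helper pipe.enable_model_cpu_offload() is a common alternative.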
📚 Documentation
🔧 Technical Details
The original document provides no further implementation details, but the usage example shows the core mechanism: the transformer's patch-embedding projection is widened with zero-initialized weights to accept twice the input channels, the image to be edited is VAE-encoded (with the VAE's shift/scale normalization applied), and the resulting control latents are concatenated to the denoising latents along the channel dimension on every transformer forward pass while the LoRA adapter drives the edit.
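For intuition, here is a minimal sketch of those two mechanisms. Both expand_linear_zero_init and channel_concat are hypothetical illustrations under the assumptions above, not finetrainers' actual implementations:

```python
from contextlib import contextmanager

import torch
import torch.nn as nn


def expand_linear_zero_init(linear: nn.Linear, new_in_features: int) -> nn.Linear:
    # Widen a Linear layer's input dimension. The extra weight columns are
    # zero-initialized, so the expanded layer initially ignores the new
    # (control) channels and reproduces the original layer's output exactly.
    expanded = nn.Linear(
        new_in_features,
        linear.out_features,
        bias=linear.bias is not None,
        dtype=linear.weight.dtype,
        device=linear.weight.device,
    )
    with torch.no_grad():
        expanded.weight.zero_()
        expanded.weight[:, : linear.in_features].copy_(linear.weight)
        if linear.bias is not None:
            expanded.bias.copy_(linear.bias)
    return expanded


@contextmanager
def channel_concat(module: nn.Module, control: torch.Tensor, dim: int = 1):
    # Temporarily patch module.forward so the control tensor is concatenated
    # to the first input along `dim` on every call, restoring the original
    # forward on exit.
    original_forward = module.forward

    def patched_forward(hidden_states, *args, **kwargs):
        hidden_states = torch.cat([hidden_states, control], dim=dim)
        return original_forward(hidden_states, *args, **kwargs)

    module.forward = patched_forward
    try:
        yield
    finally:
        module.forward = original_forward
```

Zero-initializing the new weight columns means the control branch contributes nothing before fine-tuning, so training can start from the unmodified base model's behavior.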
📄 License
No license information is provided in the original document.
⚠️ Important Note
This is an experimental checkpoint; its poor generalization is a known limitation.