CogView4-6B-Edit-LoRA-v0: Open-Source Image Editing Model for Text-Instructed Style Transfer and Content Editing

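CogView4 6B Edit LoRA v0

Developed by finetrainers

An image-editing control LoRA fine-tuned on the THUDM/CogView4-6B model. It applies style transfer and content edits to an image according to a text instruction.

#Text-to-image #Image style editing #LoRA fine-tuning #Multi-condition control

Downloads: 20

Released: 4/6/2025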

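Model overview

This is an experimental image-editing fine-tune. It adapts CogView4-6B with LoRA so that the model can modify the style and content of an image according to a text instruction.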

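Model features

Text-guided image editing

Applies style transfer and content edits to an image from natural-language instructions.

LoRA fine-tuning

Uses low-rank adaptation (LoRA) to fine-tune the large base model efficiently.

Multiple styles

Supports style transfers such as impasto painting, season changes, and space scenes.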

Model capabilities

Text-to-image generation

Image style transfer

Image content editing

Text-prompt-based image modification

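Use cases

Creative design

Art style transfer

Converts an ordinary image into a specific art style (for example, an impasto painting or an ancient Egyptian mural).

output1.png

Scene change

Changes the season or environment in an image (for example, spring with blooming trees, or a stormy space setting).

output2.png, output3.png

Content creation

Concept design

Quickly produces concept art in different styles.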

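🚀 Image Edit Fine-Tuning Control LoRA

This project is a control LoRA (Low-Rank Adaptation) for editing images with the THUDM/CogView4-6B model. Given an input image and a text prompt, it applies the requested edit, such as a style transfer.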
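As a rough illustration of the low-rank adaptation mentioned above: LoRA keeps a base weight W frozen and learns two small factors B and A, so the effective weight becomes W + scale * (B @ A). The sketch below uses plain Python lists and made-up toy values. The rank and layer sizes are hypothetical, since this page does not state the rank used for this LoRA, and `scale` stands in for whatever overall scaling is applied (presumably including the adapter weight of 0.9 passed to `set_adapters` in the inference code).

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, lora_b, lora_a, scale=1.0):
    """Return W + scale * (B @ A); W is out x in, B is out x r, A is r x in."""
    delta = matmul(lora_b, lora_a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def lora_param_count(out_features, in_features, rank):
    """Number of trainable parameters in the two low-rank factors B and A."""
    return rank * (out_features + in_features)


# Toy 2x3 frozen weight with a rank-1 adapter (all values are made up).
w = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
lora_b = [[1.0], [2.0]]        # out x r
lora_a = [[0.5, 0.0, -1.0]]    # r x in
w_eff = lora_effective_weight(w, lora_b, lora_a, scale=0.9)

# Hypothetical 4096 x 4096 layer with rank 16: the adapter trains far fewer
# parameters than the full weight matrix.
full_params = 4096 * 4096                         # 16777216
lora_params = lora_param_count(4096, 4096, 16)    # 131072
```

With these illustrative sizes the adapter holds under 1% of the parameters of the full matrix, which is why LoRA checkpoints are small compared with the base model.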

🚀 Quick start

Basic information

Property	Details
Base model	THUDM/CogView4-6B
Training dataset	sayapaul/OmniEdit-mini
Library	diffusers

Project links

Code repository: https://github.com/a-r-r-o-w/finetrainers

⚠️ Important

This is an experimental checkpoint and is known to generalize poorly.

Inference code

💻 Usage example

Basic usage

# For now, must use this branch of finetrainers: https://github.com/a-r-r-o-w/finetrainers/blob/f3e27cc39a2bc804cb373ea15522576e57f46d23/finetrainers/models/cogview4/control_specification.py

import torch
from diffusers import CogView4Pipeline
from diffusers.utils import load_image
from finetrainers.models.utils import _expand_linear_with_zeroed_weights
from finetrainers.patches import load_lora_weights
from finetrainers.patches.dependencies.diffusers.control import control_channel_concat

dtype = torch.bfloat16
device = torch.device("cuda")
generator = torch.Generator().manual_seed(0)  # fixed seed for reproducible sampling

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=dtype)

# Double the input width of the patch-embedding projection so it can take the
# noise latents and the control latents concatenated along the channel dimension.
in_channels = pipe.transformer.config.in_channels
patch_channels = pipe.transformer.patch_embed.proj.in_features
pipe.transformer.patch_embed.proj = _expand_linear_with_zeroed_weights(pipe.transformer.patch_embed.proj, new_in_features=2 * patch_channels)

# Load the edit LoRA and apply it with an adapter weight of 0.9.
load_lora_weights(pipe, "finetrainers/CogView4-6B-Edit-LoRA-v0", "cogview4-lora")
pipe.set_adapters("cogview4-lora", 0.9)
pipe.to(device)

# Edit instruction and the control (input) image to edit.
prompt = "Make the image look like it's from an ancient Egyptian mural."
control_image = load_image("examples/training/control/cogview4/omni_edit/validation_dataset/0.png")
height, width = 1024, 1024

with torch.no_grad():
    # Initial noise latents for a single image.
    latents = pipe.prepare_latents(1, in_channels, height, width, dtype, device, generator)
    # Preprocess the control image and encode it into VAE latents.
    control_image = pipe.image_processor.preprocess(control_image, height=height, width=width)
    control_image = control_image.to(device=device, dtype=dtype)
    control_latents = pipe.vae.encode(control_image).latent_dist.sample(generator=generator)
    # Shift and scale the control latents with the factors from the VAE config.
    control_latents = (control_latents - pipe.vae.config.shift_factor) * pipe.vae.config.scaling_factor

# While sampling, concatenate the control latents onto the transformer's
# hidden_states along dim 1 (the channel dimension).
with control_channel_concat(pipe.transformer, ["hidden_states"], [control_latents], dims=[1]):
    image = pipe(prompt, latents=latents, num_inference_steps=30, generator=generator).images[0]

image.save("output.png")
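The inference code above widens the patch-embedding projection before loading the LoRA. The toy sketch below shows the general idea behind that kind of expansion, under the assumption (suggested only by the helper's name, not confirmed here) that the new input columns start at zero: the widened layer accepts noise and control features together, and with zero-initialized extra weights it gives the same output as the original layer until training or a loaded adapter changes those weights. The function names are hypothetical and plain lists replace tensors.

```python
def linear(weight, x):
    """Apply y = W x for a weight given as a list of rows (no bias)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def expand_in_features_with_zeros(weight, new_in_features):
    """Pad every weight row with zeros so the layer accepts a wider input."""
    old_in_features = len(weight[0])
    if new_in_features < old_in_features:
        raise ValueError("new_in_features must not be smaller than the current width")
    padding = [0.0] * (new_in_features - old_in_features)
    return [list(row) + padding for row in weight]


weight = [[1.0, 2.0], [3.0, 4.0]]   # toy projection with 2 inputs and 2 outputs
noise = [0.5, -1.0]                 # stands in for the noise latent features
control = [7.0, 9.0]                # stands in for the control latent features

expanded = expand_in_features_with_zeros(weight, new_in_features=4)
before = linear(weight, noise)
# List concatenation plays the role of the channel-wise concat in the pipeline.
after = linear(expanded, noise + control)
```

Here `before` and `after` are identical because the control features only meet zero weights. The actual finetrainers helper and the layout of features after patchification may differ in detail.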

Example results

The following example prompts are listed with their corresponding output images:

Prompt: Change it to look like it's in the style of an impasto painting.
- Output image: output1.png
Prompt: change the setting to spring with blooming trees
- Output image: output2.png
Prompt: transform the setting to a stormy space
- Output image: output3.png
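To reproduce several of the prompts listed above in one run, a small loop can map each prompt to a numbered output file. This is only a sketch: `run_edits` and `fake_edit` are hypothetical names, and `fake_edit` is a stand-in so the snippet runs without a GPU. In practice the callable would wrap the latent preparation and pipeline call from the inference code and return an image.

```python
def run_edits(edit_fn, prompts, output_pattern="output{index}.png"):
    """Call edit_fn(prompt) for each prompt and key the result by output filename."""
    results = {}
    for index, prompt in enumerate(prompts, start=1):
        results[output_pattern.format(index=index)] = edit_fn(prompt)
    return results


EXAMPLE_PROMPTS = [
    "Change it to look like it's in the style of an impasto painting.",
    "change the setting to spring with blooming trees",
    "transform the setting to a stormy space",
]


def fake_edit(prompt):
    """Stand-in for the real edit call; returns a placeholder string, not an image."""
    return f"<image for: {prompt}>"


outputs = run_edits(fake_edit, EXAMPLE_PROMPTS)
```

With a real edit function, each value in `outputs` would be an image object that can be saved under its key.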