CogView4-6B-Edit-LoRA-v0: an open-source image editing model for text-instructed style transfer and content editing

Cogview4 6B Edit LoRA V0

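Developed by finetrainers

An image-editing control LoRA fine-tuned on top of the THUDM/CogView4-6B model. It supports style transfer and content editing of images through text instructions.

Text-to-image #Image style editing #LoRA fine-tuning #Multi-condition control

Downloads: 20

Release date: 4/6/2025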

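Model Overview

This is an experimental fine-tuned model for image editing. It adapts CogView4-6B with LoRA so that an image's style or content can be changed according to a text instruction.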

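Model Features

Text-guided image editing

Performs style transfer and content editing on images from natural-language instructions.

LoRA fine-tuning

Uses low-rank adaptation (LoRA) to fine-tune the large base model efficiently.

Multiple style support

Supports a range of style transformations, such as impasto painting, seasonal changes, and space scenes.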
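
The low-rank adaptation idea named above can be illustrated with a small sketch. Instead of updating a full weight matrix W, LoRA trains two small matrices A and B and uses W + scale * (B A) at inference time. The code below is a plain-Python illustration with toy numbers, not the finetrainers or diffusers implementation; the function names, matrix sizes, and scale value are assumptions chosen for illustration, and real libraries may apply an additional alpha / rank factor.

```python
# Minimal LoRA sketch in plain Python (no tensor library), for illustration only.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, lora_a, lora_b, scale):
    """Return W + scale * (B @ A).

    w:      d_out x d_in frozen base weight
    lora_a: r x d_in     trainable down-projection
    lora_b: d_out x r    trainable up-projection
    scale:  adapter strength (hypothetical value, chosen by the user)
    """
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy sizes: d_out = 2, d_in = 3, rank r = 1.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]   # r x d_in
B = [[0.5], [-1.0]]     # d_out x r

W_eff = lora_effective_weight(W, A, B, scale=0.9)  # adapter applied at 90% strength
W_off = lora_effective_weight(W, A, B, scale=0.0)  # adapter disabled: base weight unchanged
```

With rank 1 the adapter stores 1 * (3 + 2) = 5 numbers instead of the 6 in W; the saving grows quickly at real layer sizes, which is why LoRA is an efficient way to adapt a large base model.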

Model Capabilities

Text-to-image generation

Image style transfer

Image content editing

Image modification based on text prompts

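Use Cases

Creative design

Art style transfer

Converts ordinary images into a specific art style (impasto painting, ancient Egyptian mural, and so on).

output1.png

Scene transformation

Changes the season or environment in an image (spring with blooming trees, a stormy space scene, and so on).

output2.png, output3.png

Content creation

Concept design

Quickly generates concept-art images in different styles.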

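🚀 CogView4-6B Control LoRA

This project is a control LoRA for fine-grained image editing with the THUDM/CogView4-6B model. It supports a variety of edits, such as changing an image's style or setting.

🚀 Quick Start

Follow the steps below to use this control LoRA.

Getting the code

The source code is available in this repository: https://github.com/a-r-r-o-w/finetrainers

Inference code

The following code can be used to edit an image.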

# For now, must use this branch of finetrainers: https://github.com/a-r-r-o-w/finetrainers/blob/f3e27cc39a2bc804cb373ea15522576e57f46d23/finetrainers/models/cogview4/control_specification.py

import torch
from diffusers import CogView4Pipeline
from diffusers.utils import load_image
from finetrainers.models.utils import _expand_linear_with_zeroed_weights
from finetrainers.patches import load_lora_weights
from finetrainers.patches.dependencies.diffusers.control import control_channel_concat

dtype = torch.bfloat16
device = torch.device("cuda")
generator = torch.Generator().manual_seed(0)

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=dtype)

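# Double the input width of the patch-embedding projection so that it can accept
# the control latents concatenated to the noisy latents.
in_channels = pipe.transformer.config.in_channels
patch_channels = pipe.transformer.patch_embed.proj.in_features
pipe.transformer.patch_embed.proj = _expand_linear_with_zeroed_weights(pipe.transformer.patch_embed.proj, new_in_features=2 * patch_channels)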

load_lora_weights(pipe, "finetrainers/CogView4-6B-Edit-LoRA-v0", "cogview4-lora")
pipe.set_adapters("cogview4-lora", 0.9)
pipe.to(device)

prompt = "Make the image look like it's from an ancient Egyptian mural."
control_image = load_image("examples/training/control/cogview4/omni_edit/validation_dataset/0.png")
height, width = 1024, 1024

with torch.no_grad():
    # Sample the initial noise latents.
    latents = pipe.prepare_latents(1, in_channels, height, width, dtype, device, generator)
    # Preprocess the control image to the target size and encode it with the VAE.
    control_image = pipe.image_processor.preprocess(control_image, height=height, width=width)
    control_image = control_image.to(device=device, dtype=dtype)
    control_latents = pipe.vae.encode(control_image).latent_dist.sample(generator=generator)
    # Apply the VAE's shift and scaling factors to the control latents.
    control_latents = (control_latents - pipe.vae.config.shift_factor) * pipe.vae.config.scaling_factor

# Concatenate the control latents to hidden_states along dim 1 (channels) while the pipeline runs.
with control_channel_concat(pipe.transformer, ["hidden_states"], [control_latents], dims=[1]):
    image = pipe(prompt, latents=latents, num_inference_steps=30, generator=generator).images[0]

image.save("output.png")
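
The inference code above widens the patch-embedding projection and then concatenates control latents to the model input. The sketch below illustrates why those two steps fit together, under the assumption (suggested by the helper's name, not verified against its source) that the newly added input weights start at zero: the widened layer then gives the same output as the original layer for any control input, and only the trained LoRA weights make the control signal matter. This is plain Python with toy numbers; `linear` and `expand_with_zeroed_weights` are hypothetical stand-ins, not the finetrainers functions.

```python
# Toy illustration of zero-initialized input expansion plus feature concatenation.

def linear(weight, x):
    """Apply y = W x for a weight given as a list of rows (bias omitted)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def expand_with_zeroed_weights(weight, new_in_features):
    """Widen every row of W to new_in_features columns; the new columns are zeros."""
    old_in = len(weight[0])
    if new_in_features < old_in:
        raise ValueError("new_in_features must be >= the current input width")
    return [row + [0.0] * (new_in_features - old_in) for row in weight]


W = [[1.0, 2.0], [3.0, 4.0], [-1.0, 0.5]]  # 3 outputs, 2 input features
x = [0.5, -1.0]        # stands in for the noisy latent features
control = [7.0, 9.0]   # stands in for the control-image latent features

W_wide = expand_with_zeroed_weights(W, new_in_features=2 * len(x))

y_base = linear(W, x)                 # original layer on the original input
y_wide = linear(W_wide, x + control)  # widened layer on [x, control] concatenated
```

Here `y_wide` equals `y_base`, because the zero columns cancel the control features. Starting from this state means the expanded model initially behaves like the base model, and fine-tuning decides how much influence the control image gets.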

📚 Detailed Documentation

Model Information

Property	Details
Model type	Control LoRA
Base model	THUDM/CogView4-6B
Dataset	sayapaul/OmniEdit-mini
Library	diffusers

Usage Examples

The following are examples of image edits made with this control LoRA.

Input text	Output image
Change it to look like it's in the style of an impasto painting.	output1.png
change the setting to spring with blooming trees	output2.png
transform the setting to a stormy space	output3.png
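
One way to reproduce a table like the one above is to loop over (prompt, filename) pairs and save each result. The sketch below is a hedged illustration only: `run_edits` and `_StubImage` are hypothetical names, and `edit_fn` is assumed to be a user-written function that wraps the inference code from the Quick Start section and returns an object with a `save` method. The stub is included so the sketch runs without a GPU.

```python
# Prompts and output filenames taken from the table above.
EDITS = [
    ("Change it to look like it's in the style of an impasto painting.", "output1.png"),
    ("change the setting to spring with blooming trees", "output2.png"),
    ("transform the setting to a stormy space", "output3.png"),
]


class _StubImage:
    """Placeholder image: records the save path instead of writing a file."""

    def __init__(self):
        self.saved_to = None

    def save(self, path):
        self.saved_to = path


def run_edits(edit_fn, edits):
    """Call edit_fn(prompt) for each (prompt, filename) pair and save the result."""
    saved = []
    for prompt, filename in edits:
        image = edit_fn(prompt)  # assumed to return an image-like object
        image.save(filename)
        saved.append(filename)
    return saved


# Replace the lambda with a function that runs the real pipeline.
saved = run_edits(lambda prompt: _StubImage(), EDITS)
```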