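SegMoE-4x2-v0: an open-source image generation model that combines expert SDXL models for stronger image generation

SegMoE 4x2 v0

Developed by Segmind

SegMoE-4x2-v0 is a training-free Segmind Mixture of Diffusion Experts model. It is built by dynamically combining four expert SDXL models, which gives it a broader knowledge base and stronger image generation capability.

Image generation | Open-source license: Apache-2.0 | #mixture-of-diffusion-experts #training-free-merging #photorealistic-generation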

Downloads: 1,389

Release date: 1/29/2024

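Model overview

SegMoE is a framework that dynamically combines multiple Stable Diffusion models into a mixture of experts within minutes, with no training required. It makes it possible to create larger models on the fly that offer a broader knowledge base, stronger prompt adherence, and better image quality.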

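Model features

Dynamic expert combination

Combines multiple expert SDXL models dynamically, without training, into a more capable model.

Broad knowledge base

Integrates the knowledge of several expert models for broader understanding and generation ability.

High-quality image generation

The mixture of experts improves image quality and prompt adherence.

Training-free

Combining the models requires no additional training step.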

Model capabilities

Text-to-image generation

Photorealistic image generation

Multi-style image generation

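Use cases

Creative design

Concept art

Create concept art images for games, films, and similar projects.

High-quality, varied concept art

Advertising design

Generate the visual assets an advertisement needs.

Professional-grade advertising images

Content creation

Social media content

Generate engaging visual content for social media platforms.

Social media images in a variety of styles

Illustration

Create illustrations for books, magazines, and similar publications.

Illustrations in a wide range of artistic styles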

🚀 SegMoE-4x2-v0: Segmind Mixture of Diffusion Experts

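SegMoE-4x2-v0 is a training-free Segmind Mixture of Diffusion Experts model generated with segmoe from four expert SDXL models. SegMoE is a framework for dynamically combining Stable Diffusion models into a Mixture of Experts within minutes, without any training. It makes it possible to create larger models on the fly that offer greater knowledge, better prompt adherence, and better image quality.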
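To make the "mixture of experts" idea concrete, here is a minimal sketch of generic sparse top-k routing: a router scores every expert for a token, the two best experts run, and their outputs are blended by softmax weight. This illustrates the general technique only and is not taken from the segmoe source; the toy experts, the gate scores, and the function names `top_k_route` and `moe_forward` are all assumptions made for the example.

```python
import math


def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    # Indices of the k largest logits, highest first (ties keep original order).
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only; subtract the max for stability.
    m = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(token, experts, gate_logits, k=2):
    """Weighted sum of the outputs of the k selected experts for one token."""
    out = [0.0] * len(token)
    for idx, weight in top_k_route(gate_logits, k):
        expert_out = experts[idx](token)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Four toy "experts" (hypothetical): each one scales the token by a constant.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
gate_logits = [0.1, 2.0, -1.0, 2.0]  # hypothetical router scores for one token

routes = top_k_route(gate_logits, k=2)
output = moe_forward([1.0, -1.0], experts, gate_logits, k=2)
print(routes, output)
```

With four experts and two active per token, only half of the expert layers run for any given token, although all four still have to be loaded.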

🚀 Quick start

This model can be used through the segmoe library.

To install segmoe, run the following command.

pip install segmoe

from segmoe import SegMoEPipeline

# Load the merged model by its Hugging Face repo ID and place it on the GPU.
pipeline = SegMoEPipeline("segmind/SegMoE-4x2-v0", device="cuda")

prompt = "cosmic canvas, orange city background, painting of a chubby cat"
negative_prompt = "nsfw, bad quality, worse quality"
# Generate one 1024x1024 image and take the first result.
img = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1024,
    width=1024,
    num_inference_steps=25,
    guidance_scale=7.5,
).images[0]
# Write the image to disk.
img.save("image.png")

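✨ Key features

Draws on the knowledge of multiple fine-tuned expert models
No training required
Better adaptability to data
The model can be upgraded by using a better fine-tuned model as one of the experts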
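The last point, upgrading by swapping in a better expert, can be pictured as an edit to the expert list in a SegMoE-style configuration. The sketch below works on a plain Python dict that mirrors the field names used in this model's configuration; the helper name `replace_expert`, the shortened prompts, and the model ID `example-org/better-finetune-xl` are hypothetical and used for illustration only.

```python
import copy


def replace_expert(config, index, source_model, positive_prompt, negative_prompt):
    """Return a copy of `config` with the expert at `index` swapped out."""
    if not 0 <= index < len(config["experts"]):
        raise IndexError(f"expert index {index} out of range")
    new_config = copy.deepcopy(config)  # leave the caller's config untouched
    new_config["experts"][index] = {
        "source_model": source_model,
        "positive_prompt": positive_prompt,
        "negative_prompt": negative_prompt,
    }
    return new_config


# A cut-down two-expert config with shortened prompts, for illustration only.
config = {
    "base_model": "SG161222/RealVisXL_V3.0",
    "num_experts": 2,
    "num_experts_per_tok": 1,
    "experts": [
        {"source_model": "SG161222/RealVisXL_V3.0",
         "positive_prompt": "photograph", "negative_prompt": "drawing"},
        {"source_model": "frankjoshua/albedobaseXL_v13",
         "positive_prompt": "RAW", "negative_prompt": "lowres"},
    ],
}

upgraded = replace_expert(
    config, 1,
    source_model="example-org/better-finetune-xl",  # hypothetical model ID
    positive_prompt="RAW",
    negative_prompt="lowres",
)
print(upgraded["experts"][1]["source_model"])
```

Presumably the mixture would then have to be regenerated with segmoe from the updated configuration; that step is not shown here.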

📦 Installation

To use this model, you need to install the segmoe library. Run the following command.

pip install segmoe

💻 Usage examples

Basic usage

from segmoe import SegMoEPipeline

# Load the merged model by its Hugging Face repo ID and place it on the GPU.
pipeline = SegMoEPipeline("segmind/SegMoE-4x2-v0", device="cuda")

prompt = "cosmic canvas, orange city background, painting of a chubby cat"
negative_prompt = "nsfw, bad quality, worse quality"
# Generate one 1024x1024 image and take the first result.
img = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1024,
    width=1024,
    num_inference_steps=25,
    guidance_scale=7.5,
).images[0]
# Write the image to disk.
img.save("image.png")
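When the call above is made from several places, it can help to validate the settings once and pass them along as keyword arguments. The helper below is a hypothetical convenience, not part of segmoe. The divisible-by-8 check reflects what the SDXL pipelines in diffusers enforce; that SegMoEPipeline applies the same rule is an assumption.

```python
def build_generation_kwargs(prompt, negative_prompt="", height=1024, width=1024,
                            num_inference_steps=25, guidance_scale=7.5):
    """Validate common settings and return keyword arguments for a pipeline call."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Assumption: as with the SDXL pipelines in diffusers, sizes must be
    # divisible by 8.
    for name, value in (("height", height), ("width", width)):
        if value <= 0 or value % 8 != 0:
            raise ValueError(f"{name} must be a positive multiple of 8, got {value}")
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be at least 1")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "height": height,
        "width": width,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
    }


kwargs = build_generation_kwargs(
    "cosmic canvas, orange city background, painting of a chubby cat",
    negative_prompt="nsfw, bad quality, worse quality",
)
# With a loaded pipeline, the call would then be:
# img = pipeline(**kwargs).images[0]
print(kwargs["height"], kwargs["width"])
```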

Configuration

The following configuration was used to create this model.

base_model: SG161222/RealVisXL_V3.0
num_experts: 4
moe_layers: all
num_experts_per_tok: 2
experts:
  - source_model: frankjoshua/juggernautXL_v8Rundiffusion
    positive_prompt: "aesthetic, cinematic, hands, portrait, photo, illustration, 8K, hyperdetailed, origami, man, woman, supercar"
    negative_prompt: "(worst quality, low quality, normal quality, lowres, low details, oversaturated, undersaturated, overexposed, underexposed, grayscale, bw, bad photo, bad photography, bad art:1.4), (watermark, signature, text font, username, error, logo, words, letters, digits, autograph, trademark, name:1.2), (blur, blurry, grainy), morbid, ugly, asymmetrical, mutated malformed, mutilated, poorly lit, bad shadow, draft, cropped, out of frame, cut off, censored, jpeg artifacts, out of focus, glitch, duplicate, (airbrushed, cartoon, anime, semi-realistic, cgi, render, blender, digital art, manga, amateur:1.3), (3D ,3D Game, 3D Game Scene, 3D Character:1.1), (bad hands, bad anatomy, bad body, bad face, bad teeth, bad arms, bad legs, deformities:1.3)"
  - source_model: SG161222/RealVisXL_V3.0
    positive_prompt: "cinematic, portrait, photograph, instagram, fashion, movie, macro shot, 8K, RAW, hyperrealistic, ultra realistic,"
    negative_prompt: "(octane render, render, drawing, anime, bad photo, bad photography:1.3), (worst quality, low quality, blurry:1.2), (bad teeth, deformed teeth, deformed lips), (bad anatomy, bad proportions:1.1), (deformed iris, deformed pupils), (deformed eyes, bad eyes), (deformed face, ugly face, bad face), (deformed hands, bad hands, fused fingers), morbid, mutilated, mutation, disfigured"
  - source_model: albertushka/albertushka_DynaVisionXL
    positive_prompt: "minimalist, illustration, award winning art, painting, impressionist, comic, colors, sketch, pencil drawing,"
    negative_prompt: "Compression artifacts, bad art, worst quality, low quality, plastic, fake, bad limbs, conjoined, featureless, bad features, incorrect objects, watermark, ((signature):1.25), logo"
  - source_model: frankjoshua/albedobaseXL_v13
    positive_prompt: "photograph f/1.4, ISO 200, 1/160s, 8K, RAW, unedited, symmetrical balance, in-frame, 8K"
    negative_prompt: "nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, blurry"
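A configuration like this has a few internal consistency requirements that are easy to check before spending time on a merge: `num_experts` should match the number of listed experts, and `num_experts_per_tok` cannot exceed it. The sketch below runs those checks on a plain Python dict. `check_moe_config` is a hypothetical helper, the prompts are shortened placeholders, and segmoe itself may accept configurations that this check would flag.

```python
def check_moe_config(config):
    """Return a list of human-readable problems found in a SegMoE-style config."""
    problems = []
    experts = config.get("experts", [])
    if config.get("num_experts") != len(experts):
        problems.append(
            f"num_experts={config.get('num_experts')} but {len(experts)} experts listed"
        )
    per_tok = config.get("num_experts_per_tok", 0)
    if not 1 <= per_tok <= len(experts):
        problems.append(
            f"num_experts_per_tok={per_tok} must be between 1 and {len(experts)}"
        )
    # Each expert entry above carries these three fields.
    for i, expert in enumerate(experts):
        for key in ("source_model", "positive_prompt", "negative_prompt"):
            if not expert.get(key):
                problems.append(f"expert {i} is missing {key}")
    return problems


sources = [
    "frankjoshua/juggernautXL_v8Rundiffusion",
    "SG161222/RealVisXL_V3.0",
    "albertushka/albertushka_DynaVisionXL",
    "frankjoshua/albedobaseXL_v13",
]
config = {
    "base_model": "SG161222/RealVisXL_V3.0",
    "num_experts": 4,
    "moe_layers": "all",
    "num_experts_per_tok": 2,
    # Placeholder prompts; the real ones are listed in the configuration above.
    "experts": [
        {"source_model": s, "positive_prompt": "photo", "negative_prompt": "low quality"}
        for s in sources
    ],
}

problems = check_moe_config(config)
broken = dict(config, num_experts_per_tok=5)  # more active experts than exist
print(problems, check_moe_config(broken))
```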

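Other variants

Three merged models have been released on Hugging Face. Besides this one (SegMoE 4x2, with four SDXL expert models), the other two are:

SegMoE 2x1 has two expert models.
SegMoE SD 4x2 has four Stable Diffusion 1.5 expert models.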

📚 Documentation

Comparison

The comparison images for the prompts below show improved prompt understanding. From left to right: SegMoE-2x1-v0, SegMoE-4x2-v0, and the base model (RealVisXL_V3.0).

three green glass bottles

panda bear with aviator glasses on its head

the statue of Liberty next to the Washington Monument

Model description

Attribute	Details
Developed by	Segmind
Developers	Yatharth Gupta and Vishnu Jaddipal
Model type	Diffusion-based text-to-image generative Mixture of Experts model
License	Apache 2.0

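Out-of-scope use

The SegMoE-4x2-v0 model is not suited to creating factual or accurate representations of people, events, or real-world information. It is not intended for tasks that require high precision and accuracy.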

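Model limitations

Although the model improves image fidelity and prompt adherence, without training it is not drastically better than any single expert, and it relies on the knowledge of its experts.
It has not yet been optimized for speed.
The framework has not yet been optimized for memory usage.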

📄 License

This model is released under the Apache 2.0 license.

Citation

@misc{segmoe,
  author = {Yatharth Gupta and Vishnu V Jaddipal and Harish Prabhala},
  title = {SegMoE},
  year = {2024},
  publisher = {HuggingFace},
  journal = {HuggingFace Models},
  howpublished = {\url{https://huggingface.co/segmind/SegMoE-4x2-v0}}
}