ProteusV0.3-Lightning: an open-source image generation model for fast, high-quality text-to-image inference

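ProteusV0.3 Lightning

Developed by dataautogpt3

A text-to-image model optimized with ByteDance's Lightning technique, delivering fast inference while maintaining high quality.

Image generation. Open-source license: GPL-3.0. Tags: #Lightning-speed inference #Enhanced anime generation #DPO-optimized image quality

Downloads: 69

Release date: 2/21/2024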

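Model overview

ProteusV0.3 is a heavily optimized version of OpenDalleV1.1 with stronger prompt responsiveness and creative expression. It is particularly good at anime, surrealist, and cartoon-style images.

Model features

Lightning inference speed

Uses ByteDance's Lightning technique to achieve fast inference with almost no loss in generation quality.

Enhanced prompt understanding

Fine-tuned on 220,000 GPTV-annotated images, substantially improving its response to complex prompts.

Dynamic LoRA loading

Multiple LoRA models, each trained for a specific target, are applied dynamically to tune particular styles and fine details.

High-quality face generation

Substantially improved rendering of complex facial features and realistic skin texture.

Model capabilities

Text-to-image generation

Anime-style generation

Surrealist image generation

Cartoon-style generation

High-resolution image generation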

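Use cases

Creative design

Anime character design

Generate anime characters in a variety of styles.

Examples: "anime girl holding a sign" and "full-body samurai portrait".

Concept art creation

Create concept art for games, films, and similar projects.

Examples: "space-punk pixel art" and "sci-fi Joan of Arc".

Visual art

Art style imitation

Imitate specific art styles such as the Artgerm style or Kodak film style.

Examples: "Artgerm-style seascape" and "woman in a kimono in Kodak film style".

Experimental art creation

Create artworks with distinctive visual effects.

Examples: "pixel silhouette" and "negative-space composition".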

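🚀 ProteusV0.3-Lightning

This model uses the new Lightning method released by ByteDance, which enables fast inference with minimal loss of quality and prompt understanding.

✨ Key features

Overview of Proteus

Proteus is a heavily enhanced version of OpenDalleV1.1 that builds on its core capabilities to deliver better results. The main advances are improved responsiveness to prompts and an expanded creative range. To achieve this, the model was fine-tuned on approximately 220,000 GPTV-captioned images drawn from copyright-free stock images (including some anime), which were then normalized. In addition, DPO (Direct Preference Optimization) was applied using 10,000 carefully selected pairs of high-quality AI-generated images.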
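
To make the DPO step more concrete, here is a minimal sketch of the generic DPO loss for a single preference pair. It is not the author's training code: the function name `dpo_loss`, the `beta` value, and the log-probability inputs are assumptions for illustration, and preference tuning of diffusion models typically uses a variant built on denoising errors rather than exact log-probabilities.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Generic DPO loss for one (chosen, rejected) pair (illustrative only)."""
    # How much more the tuned model favors each sample than the frozen reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) == log(1 + exp(-logits)), in a numerically stable form.
    return math.log1p(math.exp(-abs(logits))) + max(-logits, 0.0)

# Hypothetical log-probabilities: the tuned model favors the chosen image more
# than the reference does, so the loss drops below the neutral value log(2).
neutral = dpo_loss(0.0, 0.0, 0.0, 0.0)
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss only decreases when the tuned model widens the gap between the preferred and rejected sample relative to the reference model, which is what keeps the tuned model anchored to its starting point.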

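To push performance further, numerous LoRA (Low-Rank Adaptation) models were trained independently and then selectively incorporated into the main model through dynamic application. These methods target specific segments of the model without interfering with other areas during training. As a result, Proteus is markedly better at rendering complex facial features and realistic skin textures, while remaining strong across a range of aesthetic domains, notably surrealism, anime, and cartoon-style visuals.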
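
The selective merge described above can be pictured with the standard LoRA update, W' = W + scale * (B A), applied only to chosen layers. The sketch below uses plain Python lists; the layer names, the `scale` parameter, and the `merge_lora` helper are hypothetical and do not reflect the author's actual merging procedure.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weights, adapters, targets, scale=1.0):
    """Return new weights with W + scale * (B @ A) applied only to layers in `targets`."""
    merged = {}
    for name, w in weights.items():
        if name in targets and name in adapters:
            b, a = adapters[name]  # B has shape (out, r), A has shape (r, in)
            delta = matmul(b, a)
            merged[name] = [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
                            for w_row, d_row in zip(w, delta)]
        else:
            merged[name] = [row[:] for row in w]  # untargeted layers are copied unchanged
    return merged

# Hypothetical two-layer model with a rank-1 adapter available for each layer.
weights = {"attn": [[1.0, 0.0], [0.0, 1.0]], "mlp": [[2.0, 0.0], [0.0, 2.0]]}
adapters = {
    "attn": ([[1.0], [0.0]], [[0.5, 0.5]]),
    "mlp": ([[1.0], [1.0]], [[1.0, 1.0]]),
}
# Only the "attn" layer is targeted, so "mlp" keeps its original weights.
merged = merge_lora(weights, adapters, targets={"attn"}, scale=0.5)
```

Restricting the merge to a target set is one simple way to adjust one part of a model while leaving the rest untouched.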

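📦 Installation

No specific installation steps are provided. The usage example below imports `torch` and `diffusers`, so both packages need to be installed.

💻 Usage examples

Basic usage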

import torch
from diffusers import (
    StableDiffusionXLPipeline, 
    EulerAncestralDiscreteScheduler,
    AutoencoderKL
)

# Load VAE component
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", 
    torch_dtype=torch.float16
)

# Configure the pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "dataautogpt3/ProteusV0.3-Lightning", 
    vae=vae,
    torch_dtype=torch.float16
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to('cuda')

# Define prompts and generate image
prompt = "black fluffy gorgeous dangerous cat animal creature, large orange eyes, big fluffy ears, piercing gaze, full moon, dark ambiance, best quality, extremely detailed"
negative_prompt = "nsfw, bad quality, bad anatomy, worst quality, low quality, low resolutions, extra fingers, blur, blurry, ugly, wrongs proportions, watermark, image artifacts, lowres, ugly, jpeg artifacts, deformed, noisy image"

image = pipe(
    prompt, 
    negative_prompt=negative_prompt, 
    width=1024,
    height=1024,
    guidance_scale=1,
    num_inference_steps=4
).images[0]

📚 Documentation

ProteusV0.3-Lightning settings

For best results with ProteusV0.3-Lightning, use the following settings.

Setting	Details
CFG Scale	Use a CFG scale of 1 to 2.
Steps	Use 4 to 10 steps for more detail, or 8 steps for faster results.
Sampler	euler
Scheduler	normal
Resolution	1280x1280 or 1024x1024
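
As a rough sketch, the recommended ranges can be encoded in a small helper that builds the keyword arguments for the pipeline call and rejects values outside the table. The helper name `lightning_kwargs` and its defaults are assumptions for illustration; the keyword names match those used in the basic usage example.

```python
RECOMMENDED_RANGES = {
    "guidance_scale": (1.0, 2.0),    # CFG scale
    "num_inference_steps": (4, 10),  # sampling steps
}
RECOMMENDED_RESOLUTIONS = {(1024, 1024), (1280, 1280)}

def lightning_kwargs(guidance_scale=1.0, num_inference_steps=4, width=1024, height=1024):
    """Build pipeline keyword arguments, rejecting values outside the recommended settings."""
    values = {"guidance_scale": guidance_scale, "num_inference_steps": num_inference_steps}
    for name, value in values.items():
        low, high = RECOMMENDED_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the recommended range {low}-{high}")
    if (width, height) not in RECOMMENDED_RESOLUTIONS:
        raise ValueError(f"{width}x{height} is not a recommended resolution")
    return {**values, "width": width, "height": height}

kwargs = lightning_kwargs(guidance_scale=1.5, num_inference_steps=6, width=1280, height=1280)
# With the pipeline from the basic usage example:
# image = pipe(prompt, negative_prompt=negative_prompt, **kwargs).images[0]
```

Raising an error is a deliberately strict choice; a warning would be a reasonable alternative, since values outside these ranges still run and are only expected to give worse results.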

To improve your prompts, also consider adding the following keywords: best quality, HD, ~*~aesthetic~*~
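
One possible way to apply these keywords consistently is a small helper that appends any that are missing. The function name `add_quality_keywords` is hypothetical, and the simple substring check is an assumption that may misfire on prompts where a keyword appears inside another word.

```python
QUALITY_KEYWORDS = ["best quality", "HD", "~*~aesthetic~*~"]

def add_quality_keywords(prompt, keywords=QUALITY_KEYWORDS):
    """Append each keyword unless it already appears in the prompt (case-insensitive)."""
    present = prompt.lower()
    missing = [k for k in keywords if k.lower() not in present]
    return ", ".join([prompt.strip().rstrip(",")] + missing)

enhanced = add_quality_keywords("anime girl holding a sign, best quality")
print(enhanced)
```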

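If you are having trouble coming up with prompts, you can use this GPT I put together to help refine them: https://chat.openai.com/g/g-RziQNoydR-diffusion-master

Support

If you would like to support my work, I would appreciate a donation via the link below or a follow.

Widget examples

The following are example widget inputs and outputs for this model.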

Input: Anime Girl holding a sign that says 'Proteus Lighting' Output: ComfyUI_08512_.png
Input: Anime full body portrait of a swordsman holding his weapon in front of him. He is facing the camera with a fierce look on his face. Anime key visual (best quality, HD, ~*~aesthetic~*~:1.2) Output: ComfyUI_08516_.png
Input: Anime high quality pixel art, a pixel art silhouette of an anime space-themed girl in a space-punk steampunk style, lying in her bed by the window of a spaceship, smoking, with a rustic feel. The image should embody epic portraiture and double exposure, featuring an isolated landscape visible through the window. The colors should primarily be dynamic and action-packed, with a strong use of negative space. The entire artwork should be in pixel art style, emphasizing the characters shape and set against a white background. Silhouette Output: ComfyUI_08567_.png
Input: Super Closeup Portrait, action shot, Profoundly dark whitish meadow, glass flowers, Stains, space grunge style, Jeanne dArc wearing White Olive green used styled Cotton frock, Wielding thin silver sword, Sci-fi vibe, dirty, noisy, Vintage monk style, very detailed, hd Output: ComfyUI_08571_.png
Input: Super cinematic film still of Kodak Motion Picture Film (Sharp Detailed Image) An Oscar winning movie for Best Cinematography a woman in a kimono standing on a subway train in Japan Kodak Motion Picture Film Style, shallow depth of field, vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy Output: ComfyUI_08578_.png
Input: in the style of artgerm, comic style,3D model, mythical seascape, negative space, space quixotic dreams, temporal hallucination, psychedelic, mystical, intricate details, very bright neon colors, (vantablack background:1.5), pointillism, pareidolia, melting, symbolism, very high contrast, chiaroscuro Output: ComfyUI_08582_.png