ProteusV0.3-Lightning Open-Source Image Generation Model: Fast Inference for High-Quality Text-to-Image Generation

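ProteusV0.3 Lightning

Developed by dataautogpt3

A text-to-image model optimized with ByteDance's Lightning technique, delivering fast inference while maintaining high quality

Image Generation  Open-source license: GPL-3.0  #Lightning-fast inference  #Enhanced anime generation  #DPO-optimized image quality

Downloads: 69

Release date: 2/21/2024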

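Model Overview

ProteusV0.3 is an in-depth upgrade of OpenDalleV1.1 with stronger prompt adherence and creative expression. It is particularly good at anime, surrealist, and cartoon-style image generation.

Model Features

Lightning-fast inference

Uses ByteDance's Lightning technique to speed up inference with almost no loss in generation quality

Enhanced prompt understanding

Fine-tuned on about 220,000 GPTV-captioned images, which significantly improves its response to complex prompts

Dynamic LoRA loading

Multiple LoRA models, each trained for a specific target, are applied dynamically to improve particular styles and details

High-quality face generation

Markedly better at rendering complex facial features and realistic skin texture

Model Capabilities

Text-to-image generation

Anime-style generation

Surrealist image generation

Cartoon-style generation

High-resolution image generation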

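Use Cases

Creative Design

Anime character design

Generate anime character art in a variety of styles

For example, the 'anime girl holding a sign' and 'full-body samurai portrait' in the examples

Concept art

Create concept art for games, films, and similar projects

For example, the 'space-punk pixel art' and 'sci-fi Joan of Arc' in the examples

Visual Art

Art style imitation

Imitate specific styles such as the Artgerm style or Kodak film style

For example, the 'Artgerm-style seascape' and 'Kodak film-style woman in a kimono' in the examples

Experimental art

Create artwork with distinctive visual effects

For example, the 'pixel silhouette' and 'negative-space composition' in the examples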

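🚀 ProteusV0.3-Lightning

ProteusV0.3-Lightning uses the new Lightning method released by ByteDance to achieve faster inference while preserving quality and prompt comprehension.

🚀 Quick Start

Model Introduction

Proteus is a major upgrade built on OpenDalleV1.1. It draws on OpenDalleV1.1's core functionality and improves performance further, chiefly through more responsive prompt handling and greater creative range. To achieve this, the model was fine-tuned on approximately 220,000 copyright-free stock images with GPTV captions (including some anime images), which were then normalized. In addition, direct preference optimization (DPO) was applied using a carefully curated set of 10,000 high-quality, AI-generated image pairs.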
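The DPO step described above can be illustrated with the generic DPO objective for a single preference pair. This is a minimal sketch, not the authors' training code: the function name `dpo_loss`, the default `beta`, and the log-probability inputs are all illustrative assumptions, and DPO variants for diffusion models typically substitute denoising errors for the log-likelihoods used here.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Generic DPO loss for one (chosen, rejected) pair.

    All inputs are log-probabilities; the `ref_` values come from a frozen
    reference model. `beta` (hypothetical default) scales the preference margin.
    """
    # How much more the trained model prefers the chosen sample than the
    # reference model does, relative to the rejected sample.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    x = beta * margin
    # -log(sigmoid(x)), written in a numerically stable form
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

The loss equals log 2 when the model and the reference agree on the pair, and it falls as the model favors the chosen image more strongly than the reference does.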

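To reach optimal performance, several low-rank adaptation (LoRA) models are trained independently and then selectively integrated into the main model through dynamic application methods. These techniques target specific parts of the model for optimization while avoiding interference with other parts during learning. As a result, Proteus is markedly better at depicting intricate facial features and realistic skin textures, while retaining strong performance across aesthetic domains, notably surrealism, anime, and cartoon-style visualization.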
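The idea of selectively integrating independently trained LoRA models can be sketched as adding scaled low-rank updates to a base weight matrix, W' = W + sum(scale_i * B_i A_i). The source does not describe how the "dynamic application" selects or weights adapters, so the function names, the `(B, A, scale)` tuple layout, and the rule that a scale of 0 disables an adapter are assumptions made purely for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_loras(base_weight, adapters):
    """Return base_weight + sum(scale * B @ A) over the enabled adapters.

    `adapters` is a list of (B, A, scale) tuples, where B is (out x r) and
    A is (r x in). A scale of 0 leaves that adapter out of the merge.
    """
    merged = [row[:] for row in base_weight]  # copy so the base stays untouched
    for B, A, scale in adapters:
        if scale == 0:
            continue
        delta = matmul(B, A)  # low-rank update with the same shape as the base
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += scale * value
    return merged


# Usage: a 2x2 base weight with two rank-1 adapters, only the first enabled.
base = [[1.0, 0.0], [0.0, 1.0]]
style_adapter = ([[1.0], [0.0]], [[0.0, 2.0]], 0.5)  # (B, A, scale)
face_adapter = ([[0.0], [1.0]], [[3.0, 0.0]], 0.0)   # scale 0, so skipped
merged = merge_loras(base, [style_adapter, face_adapter])
```

Because each adapter only contributes its own low-rank delta, one adapter can be enabled, disabled, or rescaled without retraining the others.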

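Model Settings

Use the following settings to get the best results from ProteusV0.3-Lightning:

Property	Details
CFG Scale	Use a CFG scale of 1 to 2
Steps	4 to 10 steps for more detail, 8 steps for faster results
Sampler	Euler
Scheduler	normal
Resolution	1280x1280 or 1024x1024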
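The settings in the table can be turned into a small validation helper that produces keyword arguments for a diffusers pipeline call. This is a sketch under assumptions: `RECOMMENDED` and `build_generation_kwargs` are hypothetical names, the defaults are one choice within the listed ranges, and the returned keys mirror the arguments used in the basic usage example on this page.

```python
# Ranges taken from the settings table; the names are illustrative.
RECOMMENDED = {
    "cfg_range": (1.0, 2.0),
    "step_range": (4, 10),
    "resolutions": {(1024, 1024), (1280, 1280)},
}


def build_generation_kwargs(steps=8, cfg=1.0, size=(1024, 1024)):
    """Check settings against the recommended ranges and return pipeline kwargs."""
    cfg_lo, cfg_hi = RECOMMENDED["cfg_range"]
    if not cfg_lo <= cfg <= cfg_hi:
        raise ValueError(f"CFG scale {cfg} is outside {cfg_lo}-{cfg_hi}")
    step_lo, step_hi = RECOMMENDED["step_range"]
    if not step_lo <= steps <= step_hi:
        raise ValueError(f"step count {steps} is outside {step_lo}-{step_hi}")
    if tuple(size) not in RECOMMENDED["resolutions"]:
        raise ValueError(f"resolution {size} is not one of the recommended sizes")
    width, height = size
    return {
        "num_inference_steps": steps,
        "guidance_scale": cfg,
        "width": width,
        "height": height,
    }
```

With a loaded pipeline, the result could be passed as `pipe(prompt, **build_generation_kwargs(steps=8, cfg=1.5))`.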

In addition, the following keywords are recommended in prompts to improve results: best quality, HD, ~*~aesthetic~*~.
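One way to apply the recommended keywords consistently is a small helper that appends whichever ones a prompt is missing. This is only a sketch: `QUALITY_KEYWORDS` and `add_quality_keywords` are hypothetical names, and the case-insensitive substring check is a simplification that could skip a keyword if its text happens to appear inside another word.

```python
# English forms of the recommended keywords, as used in the example prompts.
QUALITY_KEYWORDS = ("best quality", "HD", "~*~aesthetic~*~")


def add_quality_keywords(prompt, keywords=QUALITY_KEYWORDS):
    """Append each recommended keyword that the prompt does not already contain."""
    lowered = prompt.lower()
    missing = [k for k in keywords if k.lower() not in lowered]
    parts = [prompt.strip().rstrip(",")] + missing
    # Drop empty pieces so an empty prompt does not produce a leading comma.
    return ", ".join(p for p in parts if p)
```

For example, `add_quality_keywords("a fox, best quality")` leaves the existing keyword alone and appends only the other two.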

If you have trouble coming up with prompts, you can use this GPT I put together to help refine them: click to visit

Examples

Input text	Output image
Anime Girl holding a sign that says 'Proteus Lighting'	ComfyUI_08512_.png
Anime full body portrait of a swordsman holding his weapon in front of him. He is facing the camera with a fierce look on his face. Anime key visual (best quality, HD, ~*~aesthetic~*~:1.2)	ComfyUI_08516_.png
Anime high quality pixel art, a pixel art silhouette of an anime space-themed girl in a space-punk steampunk style, lying in her bed by the window of a spaceship, smoking, with a rustic feel. The image should embody epic portraiture and double exposure, featuring an isolated landscape visible through the window. The colors should primarily be dynamic and action-packed, with a strong use of negative space. The entire artwork should be in pixel art style, emphasizing the characters shape and set against a white background. Silhouette	ComfyUI_08567_.png
Super Closeup Portrait, action shot, Profoundly dark whitish meadow, glass flowers, Stains, space grunge style, Jeanne dArc wearing White Olive green used styled Cotton frock, Wielding thin silver sword, Sci-fi vibe, dirty, noisy, Vintage monk style, very detailed, hd	ComfyUI_08571_.png
Super cinematic film still of Kodak Motion Picture Film (Sharp Detailed Image) An Oscar winning movie for Best Cinematography a woman in a kimono standing on a subway train in Japan Kodak Motion Picture Film Style, shallow depth of field, vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy	ComfyUI_08578_.png
in the style of artgerm, comic style,3D model, mythical seascape, negative space, space quixotic dreams, temporal hallucination, psychedelic, mystical, intricate details, very bright neon colors, (vantablack background:1.5), pointillism, pareidolia, melting, symbolism, very high contrast, chiaroscuro	ComfyUI_08582_.png

💻 Usage Examples

Basic Usage

import torch
from diffusers import (
    StableDiffusionXLPipeline, 
    EulerAncestralDiscreteScheduler,
    AutoencoderKL
)

# Load VAE component
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", 
    torch_dtype=torch.float16
)

# Configure the pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "dataautogpt3/ProteusV0.3-Lightning", 
    vae=vae,
    torch_dtype=torch.float16
)
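# This example uses the ancestral Euler scheduler; diffusers also provides
# EulerDiscreteScheduler, the non-ancestral Euler variant.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)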
pipe.to('cuda')

# Define prompts and generate the image
prompt = "black fluffy gorgeous dangerous cat animal creature, large orange eyes, big fluffy ears, piercing gaze, full moon, dark ambiance, best quality, extremely detailed"
negative_prompt = "nsfw, bad quality, bad anatomy, worst quality, low quality, low resolutions, extra fingers, blur, blurry, ugly, wrong proportions, watermark, image artifacts, lowres, ugly, jpeg artifacts, deformed, noisy image"

# Note: at guidance_scale=1, diffusers disables classifier-free guidance,
# so negative_prompt has no effect unless guidance_scale is raised above 1.
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=1,
    num_inference_steps=4
).images[0]
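To compare the speed and detail trade-off across the recommended 4 to 10 step range, the same call can be repeated with different step counts. The helper below is a sketch: `sweep_steps` is a hypothetical name, and it assumes only that `pipe` is callable with the same keyword arguments as the basic usage example and returns an object with an `images` list.

```python
def sweep_steps(pipe, prompt, step_counts=(4, 6, 8, 10), guidance_scale=1.0, size=1024):
    """Generate one image per step count and return a {steps: image} dict."""
    results = {}
    for steps in step_counts:
        output = pipe(
            prompt,
            width=size,
            height=size,
            guidance_scale=guidance_scale,
            num_inference_steps=steps,
        )
        results[steps] = output.images[0]
    return results
```

With the pipeline loaded as above, `images = sweep_steps(pipe, prompt)` followed by `images[8].save("steps_08.png")` would save the 8-step result. No seed is fixed here, so differences between outputs also reflect random noise, not only the step count.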