ProteusV0.3-Lightning: an open-source text-to-image model with fast inference and high-quality output

Proteusv0.3 Lightning

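Developed by dataautogpt3

A text-to-image model optimized with ByteDance's Lightning technique, delivering fast inference while maintaining high quality

Image generation · License: GPL-3.0 · #LightningFastInference #EnhancedAnimeGeneration #DPOOptimizedQuality

Downloads: 69

Released: 2/21/2024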

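Model Overview

ProteusV0.3 is an in-depth refinement of OpenDalleV1.1 with stronger prompt responsiveness and creative range. It is particularly good at anime, surrealist, and cartoon-style image generation.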

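Model Features

Lightning inference speed

Uses ByteDance's Lightning technique to speed up inference with almost no loss in output quality

Improved prompt understanding

Fine-tuned on roughly 220,000 GPTV-captioned images, which significantly improves its response to complex prompts

Dynamic LORA loading

Multiple LORA models, each trained for a specific target, are applied dynamically to improve particular styles and details

High-quality face generation

Significantly better at rendering complex facial features and realistic skin texture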

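Model Capabilities

Text-to-image generation

Anime-style generation

Surrealist image generation

Cartoon-style generation

High-resolution image generation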

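Use Cases

Creative design

Anime character design

Generate anime characters in a variety of styles

See the examples 'anime girl holding a sign' and 'full-body swordsman portrait'

Concept art

Create concept art for games, films, and similar projects

See the examples 'space-punk pixel art' and 'sci-fi Joan of Arc'

Visual art

Art style imitation

Imitate specific art styles such as the Artgerm style or the Kodak film look

See the examples 'Artgerm-style seascape' and 'woman in a kimono, Kodak film style'

Experimental art

Create artwork with unusual visual effects

See the examples 'pixel silhouette' and 'negative-space composition'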

🚀 ProteusV0.3-Lightning

ProteusV0.3-Lightning uses the new Lightning method released by ByteDance, which speeds up inference while preserving output quality and prompt understanding.

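🚀 Quick Start

Model Introduction

Proteus is a major upgrade built on OpenDalleV1.1. It draws on OpenDalleV1.1's core functionality to deliver better results, with the main improvements being greater responsiveness to prompts and stronger creative capacity. To achieve this, the model was fine-tuned on approximately 220,000 GPTV-captioned images taken from copyright-free stock imagery (including some anime), which were then normalized. In addition, Direct Preference Optimization (DPO) was applied using 10,000 carefully selected pairs of high-quality AI-generated images.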
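To make the preference-optimization step more concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It is an illustration only: the source does not say which DPO variant or hyperparameters were used for Proteus, the function name `dpo_loss` and the value `beta=0.1` are assumptions, and diffusion models typically use an adapted form of this objective rather than exact log-likelihoods.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the log-probability of the chosen or rejected sample
    under the model being trained (policy) or the frozen reference model.
    beta is a hypothetical value, not one taken from the Proteus training run.
    """
    # How much more the policy favors each sample than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # Numerically stable form of -log(sigmoid(logits)).
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# When the policy equals the reference, the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Raising the chosen sample's likelihood relative to the rejected one lowers the loss.
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

The loss falls as the policy assigns relatively more probability to the preferred image than the reference model does, which is the signal that nudges the model toward the higher-quality member of each pair.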

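To get the best performance, multiple Low-Rank Adaptation (LORA) models are trained independently and then selectively integrated into the main model through dynamic application methods. These techniques target specific parts of the model while avoiding interference with other parts during training. As a result, Proteus shows marked improvement in rendering complex facial features and realistic skin textures, while remaining strong across a range of aesthetics, especially surrealist, anime, and cartoon-style imagery.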
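The idea of training several LORA adapters separately and folding only some of them into the base model can be sketched as follows. This is a toy illustration on plain Python lists, not the authors' training code: the source does not describe how adapters are selected or weighted, so the `merge_loras` helper, the per-adapter `weight`, and the `alpha / rank` scaling convention are all assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_loras(base_weight, adapters):
    """Return base_weight plus the weighted low-rank update of each adapter.

    Each adapter is a dict with:
      "B": (out x rank) matrix, "A": (rank x in) matrix,
      "alpha": LoRA scaling numerator, "weight": blend strength (0 skips it).
    """
    merged = [row[:] for row in base_weight]  # copy so the base stays untouched
    for adapter in adapters:
        if adapter["weight"] == 0:
            continue  # adapter not selected for this merge
        rank = len(adapter["A"])
        scale = adapter["weight"] * adapter["alpha"] / rank
        delta = matmul(adapter["B"], adapter["A"])
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += scale * value
    return merged


base = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical adapters: one applied at half strength, one left out.
face_lora = {"B": [[1.0], [0.0]], "A": [[0.0, 2.0]], "alpha": 1.0, "weight": 0.5}
style_lora = {"B": [[0.0], [1.0]], "A": [[3.0, 0.0]], "alpha": 1.0, "weight": 0.0}
print(merge_loras(base, [face_lora, style_lora]))  # [[1.0, 1.0], [0.0, 1.0]]
```

Because each adapter only adds a low-rank update on top of the frozen base weights, adapters trained for different targets can be blended or skipped without retraining the others.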

Model Settings

Use the following settings to get the best results from ProteusV0.3-Lightning:

Property	Details
CFG Scale	1 to 2
Steps	4 to 10 steps for more detail; 8 steps for faster results
Sampler	euler
Scheduler	normal
Resolution	1280x1280 or 1024x1024
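The recommended settings can be captured in a small helper that checks values before they are passed to a pipeline. This is only a convenience sketch: `check_settings` and its constant names are hypothetical, and the ranges are copied from the table above rather than enforced by the model itself.

```python
RECOMMENDED_CFG = (1.0, 2.0)      # CFG scale range from the settings table
RECOMMENDED_STEPS = (4, 10)       # step range from the settings table
RECOMMENDED_RESOLUTIONS = {(1024, 1024), (1280, 1280)}


def check_settings(cfg_scale, steps, width, height):
    """Return a list of warnings for values outside the recommended settings."""
    warnings = []
    if not RECOMMENDED_CFG[0] <= cfg_scale <= RECOMMENDED_CFG[1]:
        warnings.append(f"CFG scale {cfg_scale} is outside 1 to 2")
    if not RECOMMENDED_STEPS[0] <= steps <= RECOMMENDED_STEPS[1]:
        warnings.append(f"{steps} steps is outside 4 to 10")
    if (width, height) not in RECOMMENDED_RESOLUTIONS:
        warnings.append(f"{width}x{height} is not 1024x1024 or 1280x1280")
    return warnings


# Settings inside the recommended ranges produce no warnings.
print(check_settings(cfg_scale=1, steps=4, width=1024, height=1024))
# Typical non-Lightning SDXL settings trigger all three warnings.
print(check_settings(cfg_scale=7, steps=30, width=512, height=512))
```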

In addition, including the following keywords in your prompt is recommended to improve results: best quality, HD, ~*~aesthetic~*~.
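One way to apply these keywords consistently is to append them to every prompt. The helper below is a minimal sketch: `add_quality_keywords` is a hypothetical name, and the keywords are written in English here to match the English example prompts on this page.

```python
QUALITY_KEYWORDS = ("best quality", "HD", "~*~aesthetic~*~")


def add_quality_keywords(prompt, keywords=QUALITY_KEYWORDS):
    """Append each recommended keyword that is not already a comma-separated term.

    The prompt is split on commas, so spacing around commas is normalized.
    """
    parts = [part.strip() for part in prompt.split(",") if part.strip()]
    existing = {part.lower() for part in parts}
    for keyword in keywords:
        if keyword.lower() not in existing:
            parts.append(keyword)
    return ", ".join(parts)


print(add_quality_keywords("Anime key visual of a swordsman, best quality"))
```

Matching on whole comma-separated terms, rather than substrings, avoids false positives such as treating the "hd" inside "birthday" as the HD keyword.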

If you have trouble coming up with prompts, you can use this GPT I put together to help refine them: click to visit

Examples

Input text	Output image
Anime Girl holding a sign that says 'Proteus Lighting'	ComfyUI_08512_.png
Anime full body portrait of a swordsman holding his weapon in front of him. He is facing the camera with a fierce look on his face. Anime key visual (best quality, HD, ~*~aesthetic~*~:1.2)	ComfyUI_08516_.png
Anime high quality pixel art, a pixel art silhouette of an anime space-themed girl in a space-punk steampunk style, lying in her bed by the window of a spaceship, smoking, with a rustic feel. The image should embody epic portraiture and double exposure, featuring an isolated landscape visible through the window. The colors should primarily be dynamic and action-packed, with a strong use of negative space. The entire artwork should be in pixel art style, emphasizing the characters shape and set against a white background. Silhouette	ComfyUI_08567_.png
Super Closeup Portrait, action shot, Profoundly dark whitish meadow, glass flowers, Stains, space grunge style, Jeanne dArc wearing White Olive green used styled Cotton frock, Wielding thin silver sword, Sci-fi vibe, dirty, noisy, Vintage monk style, very detailed, hd	ComfyUI_08571_.png
Super cinematic film still of Kodak Motion Picture Film (Sharp Detailed Image) An Oscar winning movie for Best Cinematography a woman in a kimono standing on a subway train in Japan Kodak Motion Picture Film Style, shallow depth of field, vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy	ComfyUI_08578_.png
in the style of artgerm, comic style,3D model, mythical seascape, negative space, space quixotic dreams, temporal hallucination, psychedelic, mystical, intricate details, very bright neon colors, (vantablack background:1.5), pointillism, pareidolia, melting, symbolism, very high contrast, chiaroscuro	ComfyUI_08582_.png

💻 Usage Example

Basic usage

import torch
from diffusers import (
    StableDiffusionXLPipeline, 
    EulerAncestralDiscreteScheduler,
    AutoencoderKL
)

# Load VAE component
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", 
    torch_dtype=torch.float16
)

# Configure the pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "dataautogpt3/ProteusV0.3-Lightning", 
    vae=vae,
    torch_dtype=torch.float16
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to('cuda')

# Define prompts and generate image
prompt = "black fluffy gorgeous dangerous cat animal creature, large orange eyes, big fluffy ears, piercing gaze, full moon, dark ambiance, best quality, extremely detailed"
negative_prompt = "nsfw, bad quality, bad anatomy, worst quality, low quality, low resolutions, extra fingers, blur, blurry, ugly, wrongs proportions, watermark, image artifacts, lowres, ugly, jpeg artifacts, deformed, noisy image"

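# Use a low CFG scale and step count, as recommended for the Lightning model.
# Note: diffusers only applies classifier-free guidance when guidance_scale > 1,
# so at guidance_scale=1 the negative prompt is not used.
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=1,
    num_inference_steps=4
).images[0]

# Save the result to disk (the file name is arbitrary)
image.save("proteus_lightning.png")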
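Since the recommended step count spans a range, it can help to render the same prompt at several step counts and compare the results. The sketch below assumes a `pipe` object that behaves like the pipeline configured above (callable with `prompt`, `negative_prompt`, `width`, `height`, `guidance_scale`, and `num_inference_steps`, and returning an object with an `images` list). `sweep_steps` is a hypothetical helper name, not part of diffusers.

```python
def sweep_steps(pipe, prompt, negative_prompt="", step_counts=(4, 6, 8, 10),
                guidance_scale=1.0, size=1024):
    """Generate one image per step count and return them keyed by step count."""
    results = {}
    for steps in step_counts:
        output = pipe(
            prompt,
            negative_prompt=negative_prompt,
            width=size,
            height=size,
            guidance_scale=guidance_scale,
            num_inference_steps=steps,
        )
        results[steps] = output.images[0]
    return results


# Usage with the pipeline from the basic example (needs a GPU and the model weights):
# images = sweep_steps(pipe, prompt, negative_prompt)
# for steps, img in images.items():
#     img.save(f"proteus_{steps}_steps.png")
```

Each call starts from fresh random noise, so differences between the images reflect both the step count and the random seed.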