🚀 CogView3-Plus-3B
CogView3-Plus-3B is a text-to-image generation model that supports generating images from 512 to 2048 px, with efficient inference and flexible resolution settings.
📄 Read in Chinese | 🤗 Hugging Face Space | 🌐 GitHub | 📜 arXiv
📍 Visit 清言 (Qingyan) and the API Platform to experience larger-scale commercial video generation models.
🚀 Quick Start
First, make sure the diffusers library is installed from source:

```shell
pip install git+https://github.com/huggingface/diffusers.git
```
Then run the following code:
```python
from diffusers import CogView3PlusPipeline
import torch

# Load in BF16: FP16 is not supported and produces black images due to overflow.
pipe = CogView3PlusPipeline.from_pretrained(
    "THUDM/CogView3-Plus-3B", torch_dtype=torch.bfloat16
)

# Reduce VRAM usage. enable_model_cpu_offload() manages device placement
# itself, so do not also call .to("cuda").
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

prompt = "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background."

image = pipe(
    prompt=prompt,
    guidance_scale=7.0,
    num_images_per_prompt=1,
    num_inference_steps=50,
    width=1024,
    height=1024,
).images[0]

image.save("cogview3.png")
```
For more details, and to download the original SAT weights, visit our GitHub.
✨ Key Features
- Flexible resolution: width and height must be between 512 px and 2048 px, and divisible by 32.
- Fast inference: 1 s / step (tested on an A100)
- Selectable precision: BF16 / FP32 supported (FP16 is not supported, as it overflows and produces black images)
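The resolution constraints above can be sketched as a small helper. `snap_resolution` below is a hypothetical function (not part of diffusers) that clamps a requested size into the supported range and rounds it down to a multiple of 32:

```python
def snap_resolution(width: int, height: int) -> tuple[int, int]:
    """Snap a requested size to the model's constraints: 512..2048 px, divisible by 32."""
    def snap(v: int) -> int:
        v = max(512, min(2048, v))  # clamp to the supported range
        return (v // 32) * 32       # round down to a multiple of 32
    return snap(width), snap(height)

print(snap_resolution(1280, 725))  # -> (1280, 704)
```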
🔧 Technical Details
Inference requirements and model overview
This model is the DiT version of CogView3, a text-to-image generation model that supports generating images from 512 to 2048 px.
Memory Consumption
We measured memory consumption at several common resolutions on an A100 (batch size = 1, BF16), as shown in the table below:
| Resolution | enable_model_cpu_offload OFF | enable_model_cpu_offload ON |
|---|---|---|
| 512 × 512 | 19 GB | 11 GB |
| 720 × 480 | 20 GB | 11 GB |
| 1024 × 1024 | 23 GB | 11 GB |
| 1280 × 720 | 24 GB | 11 GB |
| 2048 × 2048 | 25 GB | 11 GB |
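To reproduce measurements like those in the table, peak allocated VRAM can be read back from PyTorch after a generation. `bytes_to_gb` is a small hypothetical helper; the commented usage assumes the `pipe` and `prompt` from the quick-start example:

```python
def bytes_to_gb(n_bytes: int) -> float:
    # Convert a byte count (as reported by torch.cuda) to GiB.
    return n_bytes / 1024**3

# Usage sketch (assumes `pipe` and `prompt` from the quick-start code above):
# import torch
# torch.cuda.reset_peak_memory_stats()
# _ = pipe(prompt=prompt, width=1024, height=1024, num_inference_steps=50).images[0]
# print(f"peak VRAM: {bytes_to_gb(torch.cuda.max_memory_allocated()):.1f} GiB")
```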
📄 Citation
🌟 If you find our work helpful, please cite our paper and leave a star:
```bibtex
@article{zheng2024cogview3,
  title={Cogview3: Finer and faster text-to-image generation via relay diffusion},
  author={Zheng, Wendi and Teng, Jiayan and Yang, Zhuoyi and Wang, Weihan and Chen, Jidong and Gu, Xiaotao and Dong, Yuxiao and Ding, Ming and Tang, Jie},
  journal={arXiv preprint arXiv:2403.05121},
  year={2024}
}
```
📄 License
This model is released under the Apache 2.0 License.