# 🚀 CogView4-6B

CogView4-6B is a text-to-image model that generates images from input text. It achieves strong results on multiple benchmarks, with high image-generation quality and prompt accuracy.

🤗 Model Space | 🌐 GitHub Repository | 📜 CogView3 Paper

## 🚀 Quick Start

First, make sure you install the `diffusers` library from source:

```shell
pip install git+https://github.com/huggingface/diffusers.git
```

Alternatively, clone the repository and install it in editable mode:

```shell
git clone https://github.com/huggingface/diffusers.git
cd diffusers
pip install -e .
```

Then run the following code:
```python
import torch
from diffusers import CogView4Pipeline

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16)

# Reduce GPU memory usage
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

prompt = "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background."

image = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    num_images_per_prompt=1,
    num_inference_steps=50,
    width=1024,
    height=1024,
).images[0]

image.save("cogview4.png")
```
## ✨ Key Features

### Inference Requirements and Model Overview

- Resolution: width and height must each be between 512px and 2048px and divisible by 32, and the total pixel count must not exceed 2^21 pixels.
- Precision: BF16 / FP32 are supported (FP16 is not supported, as it causes overflow and produces all-black images).
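The resolution constraints above can be expressed as a small validation helper. This is just an illustrative sketch; `check_resolution` is a hypothetical function, not part of the diffusers API:

```python
def check_resolution(width: int, height: int) -> None:
    """Validate the stated CogView4-6B resolution constraints:
    each dimension in [512, 2048] and divisible by 32,
    total pixels at most 2**21."""
    for name, dim in (("width", width), ("height", height)):
        if not 512 <= dim <= 2048:
            raise ValueError(f"{name} must be between 512 and 2048 px, got {dim}")
        if dim % 32 != 0:
            raise ValueError(f"{name} must be divisible by 32, got {dim}")
    if width * height > 2**21:
        raise ValueError(f"total pixels {width * height} exceed 2**21 = {2**21}")

check_resolution(1024, 1024)  # OK: 2**20 pixels
check_resolution(2048, 1024)  # OK: exactly 2**21 pixels
```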
Tested with BF16 precision and batchsize=4, memory usage is shown in the table below:
| Resolution | enable_model_cpu_offload OFF | enable_model_cpu_offload ON | enable_model_cpu_offload ON, text encoder 4-bit |
|---|---|---|---|
| 512 × 512 | 33 GB | 20 GB | 13 GB |
| 1280 × 720 | 35 GB | 20 GB | 13 GB |
| 1024 × 1024 | 35 GB | 20 GB | 13 GB |
| 1920 × 1280 | 39 GB | 20 GB | 14 GB |
### Model Metrics

We evaluated the model on multiple benchmarks and obtained the following scores:

#### DPG-Bench

| Model | Overall | Global | Entity | Attribute | Relation | Other |
|---|---|---|---|---|---|---|
| SDXL | 74.65 | 83.27 | 82.43 | 80.91 | 86.76 | 80.41 |
| PixArt-alpha | 71.11 | 74.97 | 79.32 | 78.60 | 82.57 | 76.96 |
| SD3-Medium | 84.08 | 87.90 | 91.01 | 88.83 | 80.70 | 88.68 |
| DALL-E 3 | 83.50 | 90.97 | 89.61 | 88.39 | 90.58 | 89.83 |
| Flux.1-dev | 83.79 | 85.80 | 86.79 | 89.98 | 90.04 | 89.90 |
| Janus-Pro-7B | 84.19 | 86.90 | 88.90 | 89.40 | 89.32 | 89.48 |
| CogView4-6B | 85.13 | 83.85 | 90.35 | 91.17 | 91.14 | 87.29 |
#### GenEval

| Model | Overall | Single Object | Two Objects | Counting | Colors | Position | Color Attribution |
|---|---|---|---|---|---|---|---|
| SDXL | 0.55 | 0.98 | 0.74 | 0.39 | 0.85 | 0.15 | 0.23 |
| PixArt-alpha | 0.48 | 0.98 | 0.50 | 0.44 | 0.80 | 0.08 | 0.07 |
| SD3-Medium | 0.74 | 0.99 | 0.94 | 0.72 | 0.89 | 0.33 | 0.60 |
| DALL-E 3 | 0.67 | 0.96 | 0.87 | 0.47 | 0.83 | 0.43 | 0.45 |
| Flux.1-dev | 0.66 | 0.98 | 0.79 | 0.73 | 0.77 | 0.22 | 0.45 |
| Janus-Pro-7B | 0.80 | 0.99 | 0.89 | 0.59 | 0.90 | 0.79 | 0.66 |
| CogView4-6B | 0.73 | 0.99 | 0.86 | 0.66 | 0.79 | 0.48 | 0.58 |
#### T2I-CompBench

| Model | Color | Shape | Texture | 2D-Spatial | 3D-Spatial | Numeracy | Non-spatial CLIP | Complex 3-in-1 |
|---|---|---|---|---|---|---|---|---|
| SDXL | 0.5879 | 0.4687 | 0.5299 | 0.2133 | 0.3566 | 0.4988 | 0.3119 | 0.3237 |
| PixArt-alpha | 0.6690 | 0.4927 | 0.6477 | 0.2064 | 0.3901 | 0.5058 | 0.3197 | 0.3433 |
| SD3-Medium | 0.8132 | 0.5885 | 0.7334 | 0.3200 | 0.4084 | 0.6174 | 0.3140 | 0.3771 |
| DALL-E 3 | 0.7785 | 0.6205 | 0.7036 | 0.2865 | 0.3744 | 0.5880 | 0.3003 | 0.3773 |
| Flux.1-dev | 0.7572 | 0.5066 | 0.6300 | 0.2700 | 0.3992 | 0.6165 | 0.3065 | 0.3628 |
| Janus-Pro-7B | 0.5145 | 0.3323 | 0.4069 | 0.1566 | 0.2753 | 0.4406 | 0.3137 | 0.3806 |
| CogView4-6B | 0.7786 | 0.5880 | 0.6983 | 0.3075 | 0.3708 | 0.6626 | 0.3056 | 0.3869 |
#### Chinese Text Accuracy Evaluation

| Model | Precision | Recall | F1 Score | Pick@4 |
|---|---|---|---|---|
| Kolors | 0.6094 | 0.1886 | 0.2880 | 0.1633 |
| CogView4-6B | 0.6969 | 0.5532 | 0.6168 | 0.3265 |
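The F1 column is the standard harmonic mean of precision and recall, which can be verified from the table's own numbers (plain Python, agreeing with the reported values up to rounding):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# CogView4-6B row: precision 0.6969, recall 0.5532
print(round(f1_score(0.6969, 0.5532), 4))  # 0.6168
```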
## 📄 License

This model is released under the Apache 2.0 license.

## 📚 Citation

🌟 If you find our work helpful, please consider citing our paper and leaving a star.

```bibtex
@article{zheng2024cogview3,
  title={Cogview3: Finer and faster text-to-image generation via relay diffusion},
  author={Zheng, Wendi and Teng, Jiayan and Yang, Zhuoyi and Wang, Weihan and Chen, Jidong and Gu, Xiaotao and Dong, Yuxiao and Ding, Ming and Tang, Jie},
  journal={arXiv preprint arXiv:2403.05121},
  year={2024}
}
```