CogView4-6B: Open-Source Text-to-Image Model Supporting Both Chinese and English

CogView4-6B

Developed by THUDM

CogView4-6B is a text-to-image model based on the GLM-4-9B foundation model, supporting both Chinese and English, capable of generating high-quality images.

Text-to-Image | Supports Multiple Languages | Open Source License: Apache-2.0
Tags: #High-resolution text-to-image #Accurate Chinese generation #Leading in multi-dimensional evaluation

Downloads 333.85k

Release Date: March 3, 2025

Model Overview

CogView4-6B is a high-performance text-to-image model that can generate high-quality images based on text prompts, supporting various resolutions and complex text descriptions.

Model Features

High-resolution support

Supports multiple resolutions with width and height between 512px and 2048px, with total pixels not exceeding 2^21.

Low VRAM optimization

Significantly reduces VRAM usage through techniques such as model CPU offloading and 4-bit quantization of the text encoder.

High accuracy for Chinese text

Achieves an F1 score of 0.6168 in the Chinese text accuracy evaluation, compared with 0.2880 for Kolors.

Model Capabilities

Text-to-image generation

High-resolution image generation

Multilingual support (Chinese, English)

Use Cases

Creative design

Sports car design

Generate high-quality sports car images based on detailed text descriptions.

The generated images feature high levels of detail and realism, accurately reflecting the attributes described in the text.

Advertising and marketing

Product showcase

Generate attractive product display images based on product descriptions.

The generated images highlight the key features and selling points of the product.

🚀 CogView4-6B

CogView4-6B is a text-to-image model that generates high-quality images from text descriptions and scores competitively on several benchmarks, including DPG-Bench, GenEval, and T2I-CompBench.

🤗 Space | 🌐 Github | 📜 CogView3 Paper

🚀 Quick Start

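First, install the diffusers library from source:

pip install git+https://github.com/huggingface/diffusers.git

Alternatively, clone the repository and install it in editable mode:

git clone https://github.com/huggingface/diffusers.git
cd diffusers
pip install -e .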

Then, run the following code:

import torch
from diffusers import CogView4Pipeline

# Load the pipeline in BF16 (FP16 is not supported)
pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16)

# Enable these options to reduce GPU memory usage
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

prompt = "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background."
image = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    num_images_per_prompt=1,
    num_inference_steps=50,
    width=1024,
    height=1024,
).images[0]

image.save("cogview4.png")

✨ Features

Inference Requirements and Model Introduction

Resolution: Width and height must each be between 512px and 2048px and divisible by 32, and the total pixel count (width × height) must not exceed 2^21.
Precision: BF16 / FP32 (FP16 is not supported as it will cause overflow resulting in completely black images)
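The resolution rule above can be checked before calling the pipeline. The following is a minimal sketch of a strict reading of that rule; the function name `check_resolution` and the constant names are hypothetical and are not part of diffusers or the CogView4 code.

```python
# Hypothetical helper: validates a requested size against the limits stated above.
MIN_SIDE = 512          # minimum width/height in pixels
MAX_SIDE = 2048         # maximum width/height in pixels
SIDE_MULTIPLE = 32      # each side must be divisible by this
MAX_PIXELS = 2 ** 21    # upper bound on width * height

def check_resolution(width: int, height: int) -> list[str]:
    """Return a list of violated constraints (empty if the size is acceptable)."""
    problems = []
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            problems.append(f"{name}={side} is outside [{MIN_SIDE}, {MAX_SIDE}]")
        if side % SIDE_MULTIPLE != 0:
            problems.append(f"{name}={side} is not divisible by {SIDE_MULTIPLE}")
    if width * height > MAX_PIXELS:
        problems.append(f"{width}*{height} pixels exceeds {MAX_PIXELS}")
    return problems

if __name__ == "__main__":
    for size in [(1024, 1024), (2048, 1024), (2048, 2048), (1000, 1024)]:
        issues = check_resolution(*size)
        print(size, "OK" if not issues else issues)
```

Returning a list of problems rather than a boolean makes it easier to report every violated constraint at once.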

With BF16 precision and a batch size of 4, the measured memory usage is shown in the table below:

Resolution	enable_model_cpu_offload OFF	enable_model_cpu_offload ON	enable_model_cpu_offload ON + Text Encoder 4-bit
512 * 512	33GB	20GB	13GB
1280 * 720	35GB	20GB	13GB
1024 * 1024	35GB	20GB	13GB
1920 * 1280	39GB	20GB	14GB
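As a rough planning aid, the measurements above can be turned into a lookup that lists which configurations fit a given VRAM budget. This is only a sketch: the names `MEASURED_GB`, `MODES`, and `modes_that_fit` are hypothetical, the figures are copied from the table (BF16, batch size 4), and real usage on other hardware or with other batch sizes may differ.

```python
# Memory figures in GB copied from the table above (BF16, batch size 4).
# Tuple order: offload OFF, offload ON, offload ON + 4-bit text encoder.
MEASURED_GB = {
    (512, 512): (33, 20, 13),
    (1280, 720): (35, 20, 13),
    (1024, 1024): (35, 20, 13),
    (1920, 1280): (39, 20, 14),
}

MODES = (
    "cpu offload off",
    "cpu offload on",
    "cpu offload on + 4-bit text encoder",
)

def modes_that_fit(width: int, height: int, vram_gb: float) -> list[str]:
    """List the measured configurations whose memory use is within vram_gb."""
    row = MEASURED_GB.get((width, height))
    if row is None:
        raise KeyError(f"no measurement for {width} * {height}")
    return [mode for mode, gb in zip(MODES, row) if gb <= vram_gb]

if __name__ == "__main__":
    # Example: which measured configurations fit a 24 GB budget at 1024 * 1024?
    print(modes_that_fit(1024, 1024, 24))
```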

Model Metrics

We've tested on multiple benchmarks and achieved the following scores:

DPG-Bench

Model	Overall	Global	Entity	Attribute	Relation	Other
SDXL	74.65	83.27	82.43	80.91	86.76	80.41
PixArt-alpha	71.11	74.97	79.32	78.60	82.57	76.96
SD3-Medium	84.08	87.90	91.01	88.83	80.70	88.68
DALL-E 3	83.50	90.97	89.61	88.39	90.58	89.83
Flux.1-dev	83.79	85.80	86.79	89.98	90.04	89.90
Janus-Pro-7B	84.19	86.90	88.90	89.40	89.32	89.48
CogView4-6B	85.13	83.85	90.35	91.17	91.14	87.29

GenEval

Model	Overall	Single Obj.	Two Obj.	Counting	Colors	Position	Color attribution
SDXL	0.55	0.98	0.74	0.39	0.85	0.15	0.23
PixArt-alpha	0.48	0.98	0.50	0.44	0.80	0.08	0.07
SD3-Medium	0.74	0.99	0.94	0.72	0.89	0.33	0.60
DALL-E 3	0.67	0.96	0.87	0.47	0.83	0.43	0.45
Flux.1-dev	0.66	0.98	0.79	0.73	0.77	0.22	0.45
Janus-Pro-7B	0.80	0.99	0.89	0.59	0.90	0.79	0.66
CogView4-6B	0.73	0.99	0.86	0.66	0.79	0.48	0.58

T2I-CompBench

Model	Color	Shape	Texture	2D-Spatial	3D-Spatial	Numeracy	Non-spatial CLIP	Complex 3-in-1
SDXL	0.5879	0.4687	0.5299	0.2133	0.3566	0.4988	0.3119	0.3237
PixArt-alpha	0.6690	0.4927	0.6477	0.2064	0.3901	0.5058	0.3197	0.3433
SD3-Medium	0.8132	0.5885	0.7334	0.3200	0.4084	0.6174	0.3140	0.3771
DALL-E 3	0.7785	0.6205	0.7036	0.2865	0.3744	0.5880	0.3003	0.3773
Flux.1-dev	0.7572	0.5066	0.6300	0.2700	0.3992	0.6165	0.3065	0.3628
Janus-Pro-7B	0.5145	0.3323	0.4069	0.1566	0.2753	0.4406	0.3137	0.3806
CogView4-6B	0.7786	0.5880	0.6983	0.3075	0.3708	0.6626	0.3056	0.3869

Chinese Text Accuracy Evaluation

Model	Precision	Recall	F1 Score	Pick@4
Kolors	0.6094	0.1886	0.2880	0.1633
CogView4-6B	0.6969	0.5532	0.6168	0.3265
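The source does not state how the F1 score was computed. Assuming it is the standard harmonic mean of precision and recall, the sketch below recomputes it from the table's precision and recall columns; the recomputed values should agree with the reported ones to within rounding. The function name `f1_score` and the `reported` dictionary are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from the table above.
reported = {
    "Kolors": (0.6094, 0.1886, 0.2880),
    "CogView4-6B": (0.6969, 0.5532, 0.6168),
}

if __name__ == "__main__":
    for model, (p, r, f1) in reported.items():
        print(f"{model}: recomputed F1 = {f1_score(p, r):.4f}, reported = {f1}")
```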

📚 Documentation

Citation

🌟 If you find our work helpful, please consider citing our paper and starring the repository.

@article{zheng2024cogview3,
  title={Cogview3: Finer and faster text-to-image generation via relay diffusion},
  author={Zheng, Wendi and Teng, Jiayan and Yang, Zhuoyi and Wang, Weihan and Chen, Jidong and Gu, Xiaotao and Dong, Yuxiao and Ding, Ming and Tang, Jie},
  journal={arXiv preprint arXiv:2403.05121},
  year={2024}
}

📄 License

This model is released under the Apache 2.0 License.
