CogView4-6B: Open-Source Text-to-Image Model with Chinese and English Support for Free, High-Quality Image Generation

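CogView4-6B

Developed by THUDM

CogView4-6B is a text-to-image model based on the GLM-4-9B base model. It supports Chinese and English and generates high-quality images.

Text-to-image | Multilingual | Open source | License: Apache-2.0 | #HighResolutionTextToImage #AccurateChineseTextGeneration #MultiDimensionalEvaluationLeader

Downloads: 333.85k

Release date: 3/3/2025

Model Overview

CogView4-6B is a high-performance text-to-image model that generates high-quality images from text prompts and supports a range of resolutions and complex text descriptions.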

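Model Features

High-resolution support

Supports widths and heights from 512px to 2048px, with a total pixel count of at most 2^21.

Low-VRAM optimization

Model CPU offload and 4-bit quantization of the text encoder substantially reduce VRAM usage.

High accuracy for Chinese text

Achieves an F1 score of 0.6168 on the Chinese text accuracy evaluation, well above the 0.2880 scored by Kolors, the model it is compared against.

Model Capabilities

Text-to-image generation

High-resolution image generation

Multilingual support (Chinese, English)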

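Use Cases

Creative design

Sports car design

Generates high-quality sports car images from detailed text descriptions.

The generated images are highly detailed and realistic, and accurately reproduce the attributes given in the text description.

Advertising and marketing

Product showcase

Generates attractive product showcase images from product descriptions.

The generated images can effectively highlight a product's key features and selling points.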

🚀 CogView4-6B

CogView4-6B is a model that generates images from text. It can generate high-resolution images and achieves high scores on multiple benchmarks.

🤗 Space | 🌐 Github | 📜 CogView3 Paper

🚀 Quick Start

First, install the diffusers library from source.

# Installs diffusers directly from the GitHub source; no local clone is needed
pip install git+https://github.com/huggingface/diffusers.git

Then run the following code.

import torch
from diffusers import CogView4Pipeline

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16)

# Enable these options to reduce GPU memory usage
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

prompt = "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background."
image = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    num_images_per_prompt=1,
    num_inference_steps=50,
    width=1024,
    height=1024,
).images[0]

image.save("cogview4.png")

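✨ Key Features

Inference Requirements and Model Introduction

Resolution: width and height must each be between 512px and 2048px and divisible by 32, and the total pixel count must not exceed 2^21.
Precision: BF16 / FP32 (FP16 is not supported because it overflows and produces completely black images)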
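
The resolution rules above can be checked before calling the pipeline. The following is a minimal sketch, not part of the CogView4 or diffusers API: the function name `check_resolution` and the constant names are hypothetical, and the limits are simply the ones stated above.

```python
# Limits taken from the resolution requirement stated above.
MIN_SIDE = 512          # minimum width/height in pixels
MAX_SIDE = 2048         # maximum width/height in pixels
SIDE_MULTIPLE = 32      # each side must be divisible by this
MAX_PIXELS = 2 ** 21    # upper bound on width * height


def check_resolution(width: int, height: int) -> list[str]:
    """Return the violated constraints (hypothetical helper; empty list means OK)."""
    problems = []
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            problems.append(f"{name}={side} is outside [{MIN_SIDE}, {MAX_SIDE}]")
        if side % SIDE_MULTIPLE != 0:
            problems.append(f"{name}={side} is not divisible by {SIDE_MULTIPLE}")
    if width * height > MAX_PIXELS:
        problems.append(f"{width}x{height} exceeds {MAX_PIXELS} total pixels")
    return problems


if __name__ == "__main__":
    for size in [(1024, 1024), (2048, 1024), (2048, 2048), (1000, 1000)]:
        print(size, check_resolution(*size) or "ok")
```

Returning a list of messages rather than a single boolean makes it easier to report every violated rule at once.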

Memory usage measured at BF16 precision with batchsize=4 is shown in the table below.

Resolution	enable_model_cpu_offload OFF	enable_model_cpu_offload ON	enable_model_cpu_offload ON + Text Encoder 4bit
512 * 512	33GB	20GB	13GB
1280 * 720	35GB	20GB	13GB
1024 * 1024	35GB	20GB	13GB
1920 * 1280	39GB	20GB	14GB
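
As a rough aid for choosing a configuration, the table can be turned into a lookup. This is only a sketch: `MEMORY_GB`, `CONFIGS`, and `configs_that_fit` are hypothetical names, the figures are copied from the table (BF16, batch size 4), and actual usage on other hardware or settings may differ.

```python
# Reported memory usage in GB, copied from the table above (BF16, batch size 4).
# Tuple order: offload OFF, offload ON, offload ON + 4-bit text encoder.
MEMORY_GB = {
    (512, 512): (33, 20, 13),
    (1280, 720): (35, 20, 13),
    (1024, 1024): (35, 20, 13),
    (1920, 1280): (39, 20, 14),
}
CONFIGS = (
    "enable_model_cpu_offload OFF",
    "enable_model_cpu_offload ON",
    "enable_model_cpu_offload ON + Text Encoder 4bit",
)


def configs_that_fit(resolution: tuple[int, int], vram_gb: float) -> list[str]:
    """Return the configurations whose reported usage is within vram_gb.

    Hypothetical helper: it only consults the reported figures and raises
    KeyError for resolutions that were not measured.
    """
    usage = MEMORY_GB[resolution]
    return [name for name, gb in zip(CONFIGS, usage) if gb <= vram_gb]


if __name__ == "__main__":
    # Which reported configurations fit a 24 GB card at 1024 x 1024?
    print(configs_that_fit((1024, 1024), 24))
```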

Evaluation Metrics

The model was tested on multiple benchmarks and achieved the following scores.

DPG-Bench

Model	Overall	Global	Entity	Attribute	Relation	Other
SDXL	74.65	83.27	82.43	80.91	86.76	80.41
PixArt-alpha	71.11	74.97	79.32	78.60	82.57	76.96
SD3-Medium	84.08	87.90	91.01	88.83	80.70	88.68
DALL-E 3	83.50	90.97	89.61	88.39	90.58	89.83
Flux.1-dev	83.79	85.80	86.79	89.98	90.04	89.90
Janus-Pro-7B	84.19	86.90	88.90	89.40	89.32	89.48
CogView4-6B	85.13	83.85	90.35	91.17	91.14	87.29

GenEval

Model	Overall	Single Object	Two Objects	Counting	Colors	Position	Color Attribution
SDXL	0.55	0.98	0.74	0.39	0.85	0.15	0.23
PixArt-alpha	0.48	0.98	0.50	0.44	0.80	0.08	0.07
SD3-Medium	0.74	0.99	0.94	0.72	0.89	0.33	0.60
DALL-E 3	0.67	0.96	0.87	0.47	0.83	0.43	0.45
Flux.1-dev	0.66	0.98	0.79	0.73	0.77	0.22	0.45
Janus-Pro-7B	0.80	0.99	0.89	0.59	0.90	0.79	0.66
CogView4-6B	0.73	0.99	0.86	0.66	0.79	0.48	0.58

T2I-CompBench

Model	Color	Shape	Texture	2D-Spatial	3D-Spatial	Numeracy	Non-spatial Clip	Complex 3-in-1
SDXL	0.5879	0.4687	0.5299	0.2133	0.3566	0.4988	0.3119	0.3237
PixArt-alpha	0.6690	0.4927	0.6477	0.2064	0.3901	0.5058	0.3197	0.3433
SD3-Medium	0.8132	0.5885	0.7334	0.3200	0.4084	0.6174	0.3140	0.3771
DALL-E 3	0.7785	0.6205	0.7036	0.2865	0.3744	0.5880	0.3003	0.3773
Flux.1-dev	0.7572	0.5066	0.6300	0.2700	0.3992	0.6165	0.3065	0.3628
Janus-Pro-7B	0.5145	0.3323	0.4069	0.1566	0.2753	0.4406	0.3137	0.3806
CogView4-6B	0.7786	0.5880	0.6983	0.3075	0.3708	0.6626	0.3056	0.3869

Chinese Text Accuracy Evaluation

Model	Precision	Recall	F1 Score	Pick@4
Kolors	0.6094	0.1886	0.2880	0.1633
CogView4-6B	0.6969	0.5532	0.6168	0.3265
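
The F1 column can be related to the precision and recall columns. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall, which the source does not state explicitly. The function name `f1_score` and the `REPORTED` dictionary are illustrative. Because the table values are rounded, a recomputed score may differ in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
REPORTED = {
    "Kolors": (0.6094, 0.1886, 0.2880),
    "CogView4-6B": (0.6969, 0.5532, 0.6168),
}

if __name__ == "__main__":
    for model, (p, r, reported_f1) in REPORTED.items():
        print(f"{model}: recomputed F1 = {f1_score(p, r):.4f}, reported = {reported_f1:.4f}")
```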

📚 Documentation

Citation

🌟 If you find our work helpful, please cite our paper and give the repository a star.

@article{zheng2024cogview3,
  title={Cogview3: Finer and faster text-to-image generation via relay diffusion},
  author={Zheng, Wendi and Teng, Jiayan and Yang, Zhuoyi and Wang, Weihan and Chen, Jidong and Gu, Xiaotao and Dong, Yuxiao and Ding, Ming and Tang, Jie},
  journal={arXiv preprint arXiv:2403.05121},
  year={2024}
}