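🚀 LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
LeX-Art is a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. The project builds a high-quality data synthesis pipeline on Deepseek-R1 and uses it to curate LeX-10K, a dataset of 10,000 high-resolution, aesthetically refined 1024×1024 images. It also provides LeX-Enhancer, a prompt enrichment model, and two text-to-image models, LeX-FLUX and LeX-Lumina, which achieve state-of-the-art text rendering performance.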
🚀 Quick Start
This repository contains the models presented in the paper LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis.
Paper Abstract
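We introduce LeX-Art, a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm: we construct a high-quality data synthesis pipeline based on Deepseek-R1 and use it to curate LeX-10K, a dataset of 10,000 high-resolution, aesthetically refined 1024×1024 images. Beyond dataset construction, we develop LeX-Enhancer, a robust prompt enrichment model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, which achieve state-of-the-art text rendering performance. To evaluate visual text generation systematically, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, complemented by Pairwise Normalized Edit Distance (PNED), a novel metric for robust text accuracy evaluation. Experiments show significant improvements: LeX-Lumina achieves a 22.16% PNED gain, and LeX-FLUX outperforms baselines in color (+10.32%), positional (+5.60%), and font accuracy (+5.63%). The code, models, datasets, and demo are all publicly available.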
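To make the idea behind an edit-distance-based text accuracy score more concrete, here is a minimal Python sketch. It is not the paper's PNED definition: the padding of unmatched words with empty strings, the brute-force search over pairings, and the function names `edit_distance`, `normalized_edit_distance`, and `pairwise_ned` are all assumptions made for illustration. It only shows the general pattern of pairing the words requested in a prompt with the words read back from an image and summing normalized edit distances.

```python
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def pairwise_ned(targets, detected):
    """Hypothetical aggregation (an assumption, not the paper's formula).

    Pads the shorter list with empty strings so every word has a partner,
    then returns the smallest total normalized edit distance over all
    one-to-one pairings. Brute force, so only suitable for short lists.
    """
    n = max(len(targets), len(detected))
    t = list(targets) + [""] * (n - len(targets))
    d = list(detected) + [""] * (n - len(detected))
    return min(
        sum(normalized_edit_distance(x, y) for x, y in zip(t, perm))
        for perm in permutations(d)
    )
```

Under this sketch a lower score means the detected words are closer to the requested ones, and a word with no counterpart costs a full 1.0 because it is paired with an empty string. For the exact metric used in LeX-Bench, refer to the paper.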

💻 Usage Example
Basic Usage
import torch
from diffusers import Lumina2Pipeline
pipe = Lumina2Pipeline.from_pretrained("X-ART/LeX-Lumina", torch_dtype=torch.bfloat16)
pipe.to("cuda")
prompt = "The image features a bold, dramatic design centered around the text elements \"THE,\" \"RA,\" and \"SA4GONEARAz,\" arranged to form the title of *The Boulet Brothers Dragula Season Three*. The background is a textured, dark slate-gray surface with faint grunge patterns, adding a gritty, industrial vibe. The word \"THE\" is positioned at the top in large, jagged, blood-red letters with a glossy finish and slight drop shadows, evoking a horror-inspired aesthetic. Below it, \"RA\" appears in the middle-left section, rendered in metallic silver with a fragmented, cracked texture, while \"SA4GONEARAz\" curves dynamically to the right, its letters styled in neon-green and black gradients with angular, cyberpunk-inspired edges. The number \"4\" in \"SA4GONEARAz\" replaces an \"A,\" blending seamlessly into the stylized typography. Thin, glowing purple outlines highlight the text, contrasting against the dark backdrop. Subtle rays of violet and crimson light streak diagonally across the composition, casting faint glows around the letters. The overall layout balances asymmetry and cohesion, with sharp angles and a mix of organic and mechanical design elements, creating a visually intense yet polished aesthetic that merges gothic horror with futuristic edge."
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=50,
    cfg_trunc_ratio=1,
    cfg_normalization=True,
    generator=torch.Generator("cpu").manual_seed(0),
    system_prompt="You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts.",
).images[0]
image.save("lex_lumina_demo.png")
📚 Documentation
Related Links
BibTeX Citation
@article{zhao2025lexart,
title={LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis},
author={Zhao, Shitian and Wu, Qilong and Li, Xinyue and Zhang, Bo and Li, Ming and Qin, Qi and Liu, Dongyang and Zhang, Kaipeng and Li, Hongsheng and Qiao, Yu and Gao, Peng and Fu, Bin and Li, Zhen},
journal={arXiv preprint arXiv:2503.21749},
year={2025}
}
📦 Model Information
| Attribute | Details |
|---|---|
| Base model | Alpha-VLLM/Lumina-Image-2.0 |
| Dataset | X-ART/LeX-10K |
| Library | diffusers |
| License | mit |
| Pipeline tag | text-to-image |
| Tags | art, text-rendering |