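🚀 LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
LeX-Art is a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. The project builds a high-quality data synthesis pipeline on DeepSeek-R1 and uses it to curate LeX-10K, a dataset of 10,000 high-resolution, aesthetically refined 1024×1024 images. It also provides LeX-Enhancer, a prompt enrichment model, and two trained text-to-image models, LeX-FLUX and LeX-Lumina, which reach state-of-the-art text rendering performance.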
🚀 Quick Start
This repository contains the models presented in the paper LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis.
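Paper Abstract
We introduce LeX-Art, a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm: we construct a high-quality data synthesis pipeline based on DeepSeek-R1 and use it to curate LeX-10K, a dataset of 10,000 high-resolution, aesthetically refined 1024×1024 images. Beyond dataset construction, we develop LeX-Enhancer, a robust prompt enrichment model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, which achieve state-of-the-art text rendering performance. To evaluate visual text generation systematically, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, complemented by Pairwise Normalized Edit Distance (PNED), a novel metric for robust evaluation of text accuracy. Experiments show significant improvements: LeX-Lumina achieves a 22.16% PNED gain, and LeX-FLUX outperforms baselines in color (+10.32%), positional (+5.60%), and font accuracy (+5.63%). Our code, models, datasets, and demo are publicly available.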
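The abstract names PNED without defining it here. As a rough illustration of the idea behind an edit-distance-based text accuracy score, the sketch below computes a plain normalized Levenshtein distance between a target string and the string read back from an image. It is not the paper's PNED: the pairwise matching step is not reproduced, and the function names and the choice to normalize by the longer string's length are assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(target: str, rendered: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string (assumed normalization)."""
    longest = max(len(target), len(rendered))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return edit_distance(target, rendered) / longest
```

For instance, `normalized_edit_distance("DRAGULA", "DRAGUIA")` is one substitution over seven characters, so lower values mean the rendered text is closer to the requested text.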

💻 Usage Example
Basic Usage
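import torch
from diffusers import Lumina2Pipeline

# Load the LeX-Lumina weights in bfloat16 and move the pipeline to the GPU.
pipe = Lumina2Pipeline.from_pretrained("X-ART/LeX-Lumina", torch_dtype=torch.bfloat16)
pipe.to("cuda")
# Detailed prompt; the strings to be rendered in the image appear in quotes.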
prompt = "The image features a bold, dramatic design centered around the text elements \"THE,\" \"RA,\" and \"SA4GONEARAz,\" arranged to form the title of *The Boulet Brothers Dragula Season Three*. The background is a textured, dark slate-gray surface with faint grunge patterns, adding a gritty, industrial vibe. The word \"THE\" is positioned at the top in large, jagged, blood-red letters with a glossy finish and slight drop shadows, evoking a horror-inspired aesthetic. Below it, \"RA\" appears in the middle-left section, rendered in metallic silver with a fragmented, cracked texture, while \"SA4GONEARAz\" curves dynamically to the right, its letters styled in neon-green and black gradients with angular, cyberpunk-inspired edges. The number \"4\" in \"SA4GONEARAz\" replaces an \"A,\" blending seamlessly into the stylized typography. Thin, glowing purple outlines highlight the text, contrasting against the dark backdrop. Subtle rays of violet and crimson light streak diagonally across the composition, casting faint glows around the letters. The overall layout balances asymmetry and cohesion, with sharp angles and a mix of organic and mechanical design elements, creating a visually intense yet polished aesthetic that merges gothic horror with futuristic edge."
# Generate one 1024x1024 image; the seeded CPU generator keeps the run reproducible.
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=50,
    cfg_trunc_ratio=1,
    cfg_normalization=True,
    generator=torch.Generator("cpu").manual_seed(0),
    system_prompt="You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts.",
).images[0]
image.save("lex_lumina_demo.png")
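Text rendering can vary noticeably from seed to seed, so it is often worth generating a few candidates per prompt. The sketch below only prepares the per-seed arguments, reusing the sampling settings from the example above; `DEFAULT_SETTINGS`, `build_runs`, and the output filename pattern are hypothetical names introduced for illustration, not part of the released code.

```python
from typing import Any, Dict, Iterable, List

# Sampling settings copied from the basic usage example above.
DEFAULT_SETTINGS: Dict[str, Any] = {
    "height": 1024,
    "width": 1024,
    "guidance_scale": 4.0,
    "num_inference_steps": 50,
    "cfg_trunc_ratio": 1,
    "cfg_normalization": True,
    "system_prompt": (
        "You are an assistant designed to generate superior images with the "
        "superior degree of image-text alignment based on textual prompts or "
        "user prompts."
    ),
}


def build_runs(prompt: str, seeds: Iterable[int]) -> List[Dict[str, Any]]:
    """Return one record per seed: the seed, an output filename, and pipeline kwargs."""
    runs = []
    for seed in seeds:
        runs.append({
            "seed": seed,
            "filename": f"lex_lumina_seed{seed}.png",  # hypothetical naming scheme
            # The generator is left out so this helper does not depend on torch.
            "kwargs": {"prompt": prompt, **DEFAULT_SETTINGS},
        })
    return runs
```

With `pipe` loaded as above, each record could then be rendered with `pipe(**run["kwargs"], generator=torch.Generator("cpu").manual_seed(run["seed"])).images[0].save(run["filename"])`.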
📚 Detailed Documentation
Related Links
BibTeX Citation
@article{zhao2025lexart,
title={LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis},
author={Zhao, Shitian and Wu, Qilong and Li, Xinyue and Zhang, Bo and Li, Ming and Qin, Qi and Liu, Dongyang and Zhang, Kaipeng and Li, Hongsheng and Qiao, Yu and Gao, Peng and Fu, Bin and Li, Zhen},
journal={arXiv preprint arXiv:2503.21749},
year={2025}
}
📦 Model Information
| Property | Details |
| --- | --- |
| Base model | Alpha-VLLM/Lumina-Image-2.0 |
| Dataset | X-ART/LeX-10K |
| Library name | diffusers |
| License | mit |
| Pipeline tag | text-to-image |
| Tags | art, text-rendering |