🚀 LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
This repository presents a model for high-quality text-image synthesis. It bridges the gap between prompt expressiveness and text rendering fidelity, offering state-of-the-art performance in text rendering tasks.
✨ Features
- High-quality Dataset: Constructed LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024$\times$1024 images based on Deepseek-R1.
- Prompt Enrichment Model: Developed LeX-Enhancer, a robust prompt enrichment model.
- Text-to-Image Models: Trained two text-to-image models, LeX-FLUX and LeX-Lumina, achieving excellent text rendering performance.
- Evaluation Benchmark and Metric: Introduced LeX-Bench, a benchmark for evaluating fidelity, aesthetics, and alignment, along with the Pairwise Normalized Edit Distance (PNED) metric for text accuracy evaluation.
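As a rough illustration of the idea behind an edit-distance-based text accuracy score, the sketch below computes a plain normalized Levenshtein distance between a target string and the text read back from a generated image. This is not the PNED definition from the paper, which also covers how rendered and target words are paired; the function names `levenshtein` and `normalized_edit_distance` are hypothetical and exist only for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(target: str, rendered: str) -> float:
    """Edit distance divided by the longer length, so the score lies in [0, 1]."""
    longest = max(len(target), len(rendered))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(target, rendered) / longest
```

A score of 0 means the rendered text matches the target exactly, and 1 means nothing matches. For the exact metric used in LeX-Bench, refer to the paper.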
📦 Installation
The code examples assume you have torch and diffusers installed. You can install them using the following commands:
pip install torch
pip install diffusers
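To confirm that both packages are visible to the interpreter you plan to use, a small check like the following may help. It is a minimal sketch built on the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical and not part of this repository.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["torch", "diffusers"]).items():
        print(f"{name}: {version or 'not installed'}")
```

If importing `Lumina2Pipeline` fails even though diffusers is listed, an outdated diffusers release is one likely cause; that is an assumption rather than something this repository states, and upgrading with pip is a reasonable first step.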
💻 Usage Examples
Basic Usage
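import torch
from diffusers import Lumina2Pipeline

# Load the LeX-Lumina weights in bfloat16 and move the pipeline to the GPU.
pipe = Lumina2Pipeline.from_pretrained("X-ART/LeX-Lumina", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Detailed prompt describing the layout, the styling, and the exact text to render.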
prompt = "The image features a bold, dramatic design centered around the text elements \"THE,\" \"RA,\" and \"SA4GONEARAz,\" arranged to form the title of *The Boulet Brothers Dragula Season Three*. The background is a textured, dark slate-gray surface with faint grunge patterns, adding a gritty, industrial vibe. The word \"THE\" is positioned at the top in large, jagged, blood-red letters with a glossy finish and slight drop shadows, evoking a horror-inspired aesthetic. Below it, \"RA\" appears in the middle-left section, rendered in metallic silver with a fragmented, cracked texture, while \"SA4GONEARAz\" curves dynamically to the right, its letters styled in neon-green and black gradients with angular, cyberpunk-inspired edges. The number \"4\" in \"SA4GONEARAz\" replaces an \"A,\" blending seamlessly into the stylized typography. Thin, glowing purple outlines highlight the text, contrasting against the dark backdrop. Subtle rays of violet and crimson light streak diagonally across the composition, casting faint glows around the letters. The overall layout balances asymmetry and cohesion, with sharp angles and a mix of organic and mechanical design elements, creating a visually intense yet polished aesthetic that merges gothic horror with futuristic edge."
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=50,
    cfg_trunc_ratio=1,
    cfg_normalization=True,
    generator=torch.Generator("cpu").manual_seed(0),
    system_prompt="You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts.",
).images[0]
image.save("lex_lumina_demo.png")
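Text rendering quality can vary from seed to seed, so it may be useful to generate several candidates for one prompt and keep the best. The sketch below wraps the call from the basic usage example in a hypothetical helper, `generate_with_seeds`, which is not part of this repository. It reuses the sampling settings shown above and accepts any pipeline-like callable plus a `make_generator` function, so it does not import torch itself.

```python
from typing import Any, Callable, Dict, Iterable, List

# Sampling settings copied from the basic usage example.
DEFAULT_KWARGS: Dict[str, Any] = {
    "height": 1024,
    "width": 1024,
    "guidance_scale": 4.0,
    "num_inference_steps": 50,
    "cfg_trunc_ratio": 1,
    "cfg_normalization": True,
    "system_prompt": (
        "You are an assistant designed to generate superior images with the "
        "superior degree of image-text alignment based on textual prompts or "
        "user prompts."
    ),
}


def generate_with_seeds(
    pipe: Callable[..., Any],
    prompt: str,
    seeds: Iterable[int],
    make_generator: Callable[[int], Any],
    **overrides: Any,
) -> List[Any]:
    """Call the pipeline once per seed and return the first image of each run."""
    kwargs = {**DEFAULT_KWARGS, **overrides}  # caller overrides win over defaults
    images = []
    for seed in seeds:
        result = pipe(prompt, generator=make_generator(seed), **kwargs)
        images.append(result.images[0])
    return images
```

With the pipeline loaded as above, this could be called as `generate_with_seeds(pipe, prompt, [0, 1, 2], lambda s: torch.Generator("cpu").manual_seed(s))`, and each returned image saved under its own filename.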
📚 Documentation
The abstract of the paper is as follows:
We introduce LeX-Art, a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm, constructing a high-quality data synthesis pipeline based on Deepseek-R1 to curate LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024$\times$1024 images. Beyond dataset construction, we develop LeX-Enhancer, a robust prompt enrichment model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, achieving state-of-the-art text rendering performance. To systematically evaluate visual text generation, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, complemented by Pairwise Normalized Edit Distance (PNED), a novel metric for robust text accuracy evaluation. Experiments demonstrate significant improvements, with LeX-Lumina achieving a 22.16% PNED gain, and LeX-FLUX outperforming baselines in color (+10.32%), positional (+5.60%), and font accuracy (+5.63%). The codes, models, datasets, and demo are publicly available.

📄 License
This project is licensed under the MIT license.
Additional Information
Property | Details
Base Model | Alpha-VLLM/Lumina-Image-2.0
Datasets | X-ART/LeX-10K
Library Name | diffusers
Pipeline Tag | text-to-image
Tags | art, text-rendering
References
@article{zhao2025lexart,
title={LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis},
author={Zhao, Shitian and Wu, Qilong and Li, Xinyue and Zhang, Bo and Li, Ming and Qin, Qi and Liu, Dongyang and Zhang, Kaipeng and Li, Hongsheng and Qiao, Yu and Gao, Peng and Fu, Bin and Li, Zhen},
journal={arXiv preprint arXiv:2503.21749},
year={2025}
}
Links