LeX-Lumina: Open-Source Image Generation Model for Improved Text Rendering Fidelity and Aesthetics

LeX-Lumina

Developed by X-ART

LeX-Lumina is a high-quality text-to-image generation model focused on enhancing text rendering fidelity and aesthetic effects.

Text-to-Image | Other | Open Source | License: MIT | #High-fidelity text rendering #Aesthetically optimized image generation #Industrial-grade design support

Downloads 137

Release Time : 3/25/2025

Model Overview

LeX-Lumina is a high-quality text-to-image synthesis model built on Lumina-Image-2.0 and trained on LeX-10K, a dataset curated with a Deepseek-R1-based data synthesis pipeline. It generates high-resolution, aesthetically optimized images from text prompts and is particularly strong at complex text rendering.

Model Features

High-quality text rendering

Trained on the LeX-10K dataset, achieving a 22.16% PNED improvement, significantly enhancing text rendering accuracy.

Aesthetic optimization

Generates high-resolution (1024×1024) images with carefully optimized aesthetic effects.

Powerful prompt enhancement

Pairs with LeX-Enhancer, a companion prompt enrichment model, for better understanding and execution of complex text prompts.

Model Capabilities

Text-to-image generation

High-resolution image generation

Complex text rendering

Aesthetic optimization

Use Cases

Artistic creation

Poster design

Generate poster images with complex typography and artistic effects.

Accurately renders text elements while maintaining overall aesthetic balance.

Brand identity

Generate visual designs incorporating brand names and iconic elements.

In the paper's experiments, the companion LeX-FLUX model outperforms baselines in color (+10.32%), positional (+5.60%), and font accuracy (+5.63%).

Advertising design

Ad banners

Generate eye-catching ad banners with promotional text and visual elements.

Combines sharp edges with organic and mechanical design elements to create visually striking effects.

🚀 LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis

This repository presents a model for high-quality text-image synthesis. It bridges the gap between prompt expressiveness and text rendering fidelity, offering state-of-the-art performance in text rendering tasks.

✨ Features

High-quality Dataset: Constructed LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024×1024 images, using a data synthesis pipeline based on Deepseek-R1.
Prompt Enrichment Model: Developed LeX-Enhancer, a robust prompt enrichment model.
Text-to-Image Models: Trained two text-to-image models, LeX-FLUX and LeX-Lumina, achieving excellent text rendering performance.
Evaluation Benchmark and Metric: Introduced LeX-Bench, a benchmark for evaluating fidelity, aesthetics, and alignment, along with the Pairwise Normalized Edit Distance (PNED) metric for text accuracy evaluation.
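
To give a feel for what an edit-distance-based text accuracy metric measures, here is a minimal Python sketch. It is not the paper's PNED definition, which is not reproduced on this page. It assumes a simplified scheme: each target word is greedily paired with the closest word read back from the image (for example by an OCR step, which is not shown), the pair is scored by Levenshtein distance divided by the longer word's length, and unpaired words on either side cost 1.0. All function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else edit_distance(a, b) / longest


def pairwise_text_score(targets: list[str], detected: list[str]) -> float:
    """Illustrative pairwise score (lower is better).

    Assumption: greedy pairing, not necessarily the matching used by PNED.
    """
    remaining = list(detected)
    total = 0.0
    for word in targets:
        if not remaining:
            total += 1.0  # target word has nothing left to pair with
            continue
        best = min(remaining, key=lambda cand: normalized_edit_distance(word, cand))
        total += normalized_edit_distance(word, best)
        remaining.remove(best)
    total += len(remaining)  # extra detected words count as full errors
    return total


if __name__ == "__main__":
    print(pairwise_text_score(["THE", "RA"], ["THE", "RA"]))
    print(pairwise_text_score(["THE", "RA"], ["THF"]))
```

A perfect rendering scores 0.0 under this scheme, and every misspelled, missing, or spurious word pushes the score up.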

📦 Installation

The code examples assume you have torch and diffusers installed. You can install them using the following commands:

pip install torch
pip install diffusers
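
Since the usage example imports Lumina2Pipeline from diffusers, an installed diffusers release that predates that class will fail at import time. The following sketch checks the environment before the model is downloaded. The helper names are hypothetical, and the check only reports whether the import succeeds; it makes no claim about which diffusers version first shipped the class.

```python
import importlib.util

REQUIRED = ("torch", "diffusers")


def missing_packages(names=REQUIRED):
    """Return the top-level packages from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_lumina2_pipeline() -> bool:
    """True if `from diffusers import Lumina2Pipeline` succeeds in this environment."""
    try:
        from diffusers import Lumina2Pipeline  # noqa: F401
    except Exception:
        # Covers a missing package as well as releases without the class.
        return False
    return True


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    elif not has_lumina2_pipeline():
        print("diffusers is installed but Lumina2Pipeline is unavailable; try upgrading diffusers.")
    else:
        print("Environment looks ready.")
```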

💻 Usage Examples

Basic Usage

import torch
from diffusers import Lumina2Pipeline

pipe = Lumina2Pipeline.from_pretrained("X-ART/LeX-Lumina", torch_dtype=torch.bfloat16)
pipe.to("cuda")
# Low on VRAM? Call enable_model_cpu_offload() instead of pipe.to("cuda") to keep idle components on the CPU.
# pipe.enable_model_cpu_offload()

prompt = "The image features a bold, dramatic design centered around the text elements \"THE,\" \"RA,\" and \"SA4GONEARAz,\" arranged to form the title of *The Boulet Brothers Dragula Season Three*. The background is a textured, dark slate-gray surface with faint grunge patterns, adding a gritty, industrial vibe. The word \"THE\" is positioned at the top in large, jagged, blood-red letters with a glossy finish and slight drop shadows, evoking a horror-inspired aesthetic. Below it, \"RA\" appears in the middle-left section, rendered in metallic silver with a fragmented, cracked texture, while \"SA4GONEARAz\" curves dynamically to the right, its letters styled in neon-green and black gradients with angular, cyberpunk-inspired edges. The number \"4\" in \"SA4GONEARAz\" replaces an \"A,\" blending seamlessly into the stylized typography. Thin, glowing purple outlines highlight the text, contrasting against the dark backdrop. Subtle rays of violet and crimson light streak diagonally across the composition, casting faint glows around the letters. The overall layout balances asymmetry and cohesion, with sharp angles and a mix of organic and mechanical design elements, creating a visually intense yet polished aesthetic that merges gothic horror with futuristic edge."
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=50,
    cfg_trunc_ratio=1,
    cfg_normalization=True,
    generator=torch.Generator("cpu").manual_seed(0),
    system_prompt="You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts."
).images[0]
image.save("lex_lumina_demo.png")
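
Text rendering quality can vary from seed to seed, so it is often useful to render the same prompt several times and pick the best result. The sketch below wraps the call above in a few helpers. It reuses only the pipeline arguments shown in the basic example; the helper names, the low_vram flag, and the output file naming are assumptions made for illustration.

```python
SYSTEM_PROMPT = (
    "You are an assistant designed to generate superior images with the superior "
    "degree of image-text alignment based on textual prompts or user prompts."
)


def sampling_kwargs(steps: int = 50, guidance: float = 4.0) -> dict:
    """Sampling arguments copied from the basic example (seed handled separately)."""
    return dict(
        height=1024,
        width=1024,
        guidance_scale=guidance,
        num_inference_steps=steps,
        cfg_trunc_ratio=1,
        cfg_normalization=True,
        system_prompt=SYSTEM_PROMPT,
    )


def output_name(seed: int, stem: str = "lex_lumina") -> str:
    """Hypothetical naming scheme: one PNG per seed."""
    return f"{stem}_seed{seed:04d}.png"


def load_pipeline(model_id: str = "X-ART/LeX-Lumina", low_vram: bool = False):
    """Load the pipeline on the GPU, or with CPU offload when VRAM is tight."""
    import torch
    from diffusers import Lumina2Pipeline

    pipe = Lumina2Pipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    if low_vram:
        pipe.enable_model_cpu_offload()  # used instead of pipe.to("cuda")
    else:
        pipe.to("cuda")
    return pipe


def generate_variants(pipe, prompt: str, seeds) -> list:
    """Render `prompt` once per seed and return the saved file paths."""
    import torch

    paths = []
    for seed in seeds:
        generator = torch.Generator("cpu").manual_seed(seed)
        image = pipe(prompt, generator=generator, **sampling_kwargs()).images[0]
        path = output_name(seed)
        image.save(path)
        paths.append(path)
    return paths
```

A typical call would be generate_variants(load_pipeline(low_vram=True), prompt, seeds=range(4)), which writes four images that differ only in their random seed.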

📚 Documentation

The abstract of the paper is as follows:

We introduce LeX-Art, a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm, constructing a high-quality data synthesis pipeline based on Deepseek-R1 to curate LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024×1024 images. Beyond dataset construction, we develop LeX-Enhancer, a robust prompt enrichment model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, achieving state-of-the-art text rendering performance. To systematically evaluate visual text generation, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, complemented by Pairwise Normalized Edit Distance (PNED), a novel metric for robust text accuracy evaluation. Experiments demonstrate significant improvements, with LeX-Lumina achieving a 22.16% PNED gain, and LeX-FLUX outperforming baselines in color (+10.32%), positional (+5.60%), and font accuracy (+5.63%). The codes, models, datasets, and demo are publicly available.

📄 License

This project is licensed under the MIT license.

Additional Information

Property	Details
Base Model	Alpha-VLLM/Lumina-Image-2.0
Datasets	X-ART/LeX-10K
Library Name	diffusers
Pipeline Tag	text-to-image
Tags	art, text-rendering

References

@article{zhao2025lexart,
    title={LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis},
    author={Zhao, Shitian and Wu, Qilong and Li, Xinyue and Zhang, Bo and Li, Ming and Qin, Qi and Liu, Dongyang and Zhang, Kaipeng and Li, Hongsheng and Qiao, Yu and Gao, Peng and Fu, Bin and Li, Zhen},
    journal={arXiv preprint arXiv:2503.21749},
    year={2025}
}