LeX-Lumina open-source image generation model: free to use, with improved text rendering fidelity and aesthetics

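LeX-Lumina

Developed by X-ART

LeX-Lumina is a high-quality text-to-image model focused on improving text rendering fidelity and aesthetics.

Text-to-image | Other | License: MIT | #high-fidelity text rendering #aesthetically optimized image generation #industrial-grade design support

Downloads: 137

Released: 3/25/2025

Model overview

LeX-Lumina is a text-to-image synthesis model from the LeX-Art suite, whose training data comes from a data synthesis pipeline built on DeepSeek-R1. It generates high-resolution, aesthetically refined images from text prompts and is particularly strong at complex text rendering.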

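Model features

High-quality text rendering

Trained on the LeX-10K dataset, the model achieves a 22.16% gain in PNED (Pairwise Normalized Edit Distance), a marked improvement in text rendering accuracy.

Aesthetic optimization

Generated images are high-resolution (1024×1024) with carefully refined aesthetics.

Strong prompt enhancement

Paired with LeX-Enhancer, the suite's prompt enhancement model, to better interpret and follow complex text prompts.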

Model capabilities

Text-to-image generation

High-resolution image generation

Complex text rendering

Aesthetic optimization

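Use cases

Artistic creation

Poster design

Generates poster images with complex typography and artistic effects.

Renders text elements accurately while keeping the overall composition aesthetically balanced.

Brand identity

Generates visual designs that include brand names and signature elements.

Outperforms baseline models in color, positional, and font accuracy.

Advertising design

Ad banners

Generates eye-catching ad banners that combine promotional text with visual elements.

Combines sharp angles with a mix of organic and mechanical design elements for a visually intense result.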

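🚀 LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis

LeX-Art is a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. The project builds a high-quality data synthesis pipeline on DeepSeek-R1 and uses it to curate LeX-10K, a dataset of 10,000 high-resolution, aesthetically refined 1024×1024 images. It also provides LeX-Enhancer, a prompt enhancement model, and two trained text-to-image models, LeX-FLUX and LeX-Lumina, which reach state-of-the-art text rendering performance.

🚀 Quick start

This repository contains the model presented in the paper LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis.

Paper abstract

We introduce LeX-Art, a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm, constructing a high-quality data synthesis pipeline based on DeepSeek-R1 to curate LeX-10K, a dataset of 10,000 high-resolution, aesthetically refined 1024×1024 images. Beyond dataset construction, we develop LeX-Enhancer, a robust prompt enhancement model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, achieving state-of-the-art text rendering performance. To systematically evaluate visual text generation, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, complemented by Pairwise Normalized Edit Distance (PNED), a novel metric for robust text accuracy evaluation. Experiments demonstrate significant improvements, with LeX-Lumina achieving a 22.16% PNED gain and LeX-FLUX outperforming baselines in color (+10.32%), positional (+5.60%), and font accuracy (+5.63%). The code, models, datasets, and demo are publicly available.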
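
To make the idea behind PNED more concrete, here is a minimal sketch of a pairwise normalized edit distance between the words a prompt asks for and the words an OCR system reads back from the image. It is an illustration under stated assumptions, not the paper's reference implementation: the function names are hypothetical, unmatched words are assumed to cost 1.0, the score is assumed to be a mean over word pairs, and the best pairing is found by brute force, which is only practical for a handful of words. The paper's exact matching and normalization may differ.

```python
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def pairwise_ned(target_words, detected_words) -> float:
    """Mean normalized edit distance under the best one-to-one word pairing.

    Assumption (not from the paper): a word with no partner is paired
    with the empty string and therefore costs 1.0. Lower is better.
    """
    n = max(len(target_words), len(detected_words))
    if n == 0:
        return 0.0
    # Pad the shorter list so every word has a partner.
    targets = list(target_words) + [""] * (n - len(target_words))
    detected = list(detected_words) + [""] * (n - len(detected_words))
    # Brute-force search over pairings; fine for a few words only.
    best_total = min(
        sum(normalized_edit_distance(t, d) for t, d in zip(targets, order))
        for order in permutations(detected)
    )
    return best_total / n
```

Because the pairing is searched rather than fixed by position, the order in which OCR returns the words does not affect the score. For longer word lists, an assignment solver such as `scipy.optimize.linear_sum_assignment` would be a more practical way to find the best pairing.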

💻 Usage example

Basic usage

import torch
from diffusers import Lumina2Pipeline

pipe = Lumina2Pipeline.from_pretrained("X-ART/LeX-Lumina", torch_dtype=torch.bfloat16)
pipe.to("cuda")
# Optional: to save VRAM, call pipe.enable_model_cpu_offload() instead of pipe.to("cuda") above.

prompt = "The image features a bold, dramatic design centered around the text elements \"THE,\" \"RA,\" and \"SA4GONEARAz,\" arranged to form the title of *The Boulet Brothers Dragula Season Three*. The background is a textured, dark slate-gray surface with faint grunge patterns, adding a gritty, industrial vibe. The word \"THE\" is positioned at the top in large, jagged, blood-red letters with a glossy finish and slight drop shadows, evoking a horror-inspired aesthetic. Below it, \"RA\" appears in the middle-left section, rendered in metallic silver with a fragmented, cracked texture, while \"SA4GONEARAz\" curves dynamically to the right, its letters styled in neon-green and black gradients with angular, cyberpunk-inspired edges. The number \"4\" in \"SA4GONEARAz\" replaces an \"A,\" blending seamlessly into the stylized typography. Thin, glowing purple outlines highlight the text, contrasting against the dark backdrop. Subtle rays of violet and crimson light streak diagonally across the composition, casting faint glows around the letters. The overall layout balances asymmetry and cohesion, with sharp angles and a mix of organic and mechanical design elements, creating a visually intense yet polished aesthetic that merges gothic horror with futuristic edge."
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=50,
    cfg_trunc_ratio=1,
    cfg_normalization=True,
    generator=torch.Generator("cpu").manual_seed(0),
    system_prompt="You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts.",
).images[0]
image.save("lex_lumina_demo.png")
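
The example above fixes the seed to 0. Since the sampled image depends on the seed, it can be useful to generate a few candidates and pick the one whose text renders best. The following is a minimal sketch, assuming `pipe` is the pipeline loaded above; `generate_variants` and `make_generator` are hypothetical names introduced here for illustration, not part of diffusers.

```python
def generate_variants(pipe, prompt, seeds, make_generator=None, **pipe_kwargs):
    """Run `pipe` once per seed and return a {seed: image} dict.

    `make_generator` maps a seed to the object passed as `generator=`.
    By default it builds a CPU torch.Generator, as in the basic example.
    """
    if make_generator is None:
        # Imported lazily so the helper itself does not require torch.
        import torch

        def make_generator(seed):
            return torch.Generator("cpu").manual_seed(seed)

    images = {}
    for seed in seeds:
        result = pipe(prompt, generator=make_generator(seed), **pipe_kwargs)
        images[seed] = result.images[0]
    return images
```

For instance, calling `generate_variants(pipe, prompt, seeds=[0, 1, 2], height=1024, width=1024, guidance_scale=4.0, num_inference_steps=50)` would return three candidates keyed by seed, each of which can then be saved with `image.save(...)`.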

📚 Documentation

BibTeX citation

@article{zhao2025lexart,
    title={LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis},
    author={Zhao, Shitian and Wu, Qilong and Li, Xinyue and Zhang, Bo and Li, Ming and Qin, Qi and Liu, Dongyang and Zhang, Kaipeng and Li, Hongsheng and Qiao, Yu and Gao, Peng and Fu, Bin and Li, Zhen},
    journal={arXiv preprint arXiv:2503.21749},
    year={2025}
}