LeX-Lumina open-source image generation model: free to use, with improved text rendering fidelity and aesthetics


LeX-Lumina

Developed by X-ART

LeX-Lumina is a high-quality text-to-image model focused on improving text rendering fidelity and aesthetics.

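Task: text-to-image · Category: other · License: MIT · Tags: #high-fidelity text rendering #aesthetically optimized image generation #industrial-grade design support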

Downloads: 137

Released: March 25, 2025

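Model overview

LeX-Lumina is a high-quality text-to-image synthesis model trained on LeX-10K, a dataset produced by a data synthesis pipeline built on Deepseek-R1. It generates high-resolution, aesthetically refined images from text prompts and is particularly strong at complex text rendering.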

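Model features

High-quality text rendering

Trained on the LeX-10K dataset, LeX-Lumina achieves a 22.16% gain in PNED (Pairwise Normalized Edit Distance), a marked improvement in text rendering accuracy.

Aesthetic optimization

Generated images are high resolution (1024×1024) with carefully tuned aesthetics.

Strong prompt enhancement

The LeX-Art suite includes LeX-Enhancer, a prompt enhancement model that helps complex text prompts be understood and followed more faithfully.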

Model capabilities

Text-to-image generation

High-resolution image generation

Complex text rendering

Aesthetic optimization

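Use cases

Artistic creation

Poster design

Generate poster images with complex typography and artistic effects.

Renders text elements accurately while keeping the overall composition aesthetically balanced.

Brand identity

Generate visual designs that include a brand name and signature elements.

The paper reports gains over baselines in color, position, and font accuracy for the sibling model LeX-FLUX.

Advertising design

Advertising banners

Generate eye-catching advertising banners that combine promotional text with visual elements.

Combines sharp angles with a mix of organic and mechanical design elements to create a visually intense effect.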

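🚀 LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis

LeX-Art is a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. The project builds a high-quality data synthesis pipeline on Deepseek-R1 and uses it to curate LeX-10K, a dataset of 10,000 high-resolution, aesthetically refined 1024×1024 images. It also provides LeX-Enhancer, a prompt enhancement model, and two text-to-image models, LeX-FLUX and LeX-Lumina, which reach state-of-the-art text rendering performance.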

🚀 Quick start

This repository contains the model presented in the paper LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis.

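Paper abstract

We introduce LeX-Art, a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm, constructing a high-quality data synthesis pipeline based on Deepseek-R1 to curate LeX-10K, a dataset of 10,000 high-resolution, aesthetically refined 1024×1024 images. Beyond dataset construction, we develop LeX-Enhancer, a robust prompt enhancement model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, achieving state-of-the-art text rendering performance. To systematically evaluate visual text generation, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, complemented by Pairwise Normalized Edit Distance (PNED), a novel metric for robust text accuracy evaluation. Experiments demonstrate significant improvements, with LeX-Lumina achieving a 22.16% PNED gain and LeX-FLUX outperforming baselines in color (+10.32%), position (+5.60%), and font accuracy (+5.63%). The code, models, datasets, and demo are publicly available.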
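The abstract names PNED but this page does not define it. As a rough illustration of the idea only, the sketch below pairs each target word from the prompt with a word detected in the image and sums their normalized edit distances. The greedy pairing, the penalty of 1.0 per unmatched word, and all function names are assumptions made for this example, not the paper's definition.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def pairwise_ned(targets: list[str], detected: list[str]) -> float:
    """Greedy pairing: repeatedly match the closest (target, detected) pair.

    Unmatched words on either side cost 1.0 each. This scoring rule is an
    illustrative assumption, not the metric as defined in the paper.
    """
    targets, detected = list(targets), list(detected)
    total = 0.0
    while targets and detected:
        t, d = min(((t, d) for t in targets for d in detected),
                   key=lambda pair: normalized_edit_distance(*pair))
        total += normalized_edit_distance(t, d)
        targets.remove(t)
        detected.remove(d)
    return total + len(targets) + len(detected)
```

Under this sketch, lower is better: a perfect rendering scores 0.0 regardless of word order, and each missing or spurious word adds a full point.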

💻 Usage example

Basic usage

import torch
from diffusers import Lumina2Pipeline

pipe = Lumina2Pipeline.from_pretrained("X-ART/LeX-Lumina", torch_dtype=torch.bfloat16)
pipe.to("cuda")
# Low on VRAM? Replace the line above with pipe.enable_model_cpu_offload() to offload idle submodules to the CPU.

prompt = "The image features a bold, dramatic design centered around the text elements \"THE,\" \"RA,\" and \"SA4GONEARAz,\" arranged to form the title of *The Boulet Brothers Dragula Season Three*. The background is a textured, dark slate-gray surface with faint grunge patterns, adding a gritty, industrial vibe. The word \"THE\" is positioned at the top in large, jagged, blood-red letters with a glossy finish and slight drop shadows, evoking a horror-inspired aesthetic. Below it, \"RA\" appears in the middle-left section, rendered in metallic silver with a fragmented, cracked texture, while \"SA4GONEARAz\" curves dynamically to the right, its letters styled in neon-green and black gradients with angular, cyberpunk-inspired edges. The number \"4\" in \"SA4GONEARAz\" replaces an \"A,\" blending seamlessly into the stylized typography. Thin, glowing purple outlines highlight the text, contrasting against the dark backdrop. Subtle rays of violet and crimson light streak diagonally across the composition, casting faint glows around the letters. The overall layout balances asymmetry and cohesion, with sharp angles and a mix of organic and mechanical design elements, creating a visually intense yet polished aesthetic that merges gothic horror with futuristic edge."
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=50,
    cfg_trunc_ratio=1,
    cfg_normalization=True,
    generator=torch.Generator("cpu").manual_seed(0),
    system_prompt="You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts.",
).images[0]
image.save("lex_lumina_demo.png")
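The demo prompt wraps each string that should appear in the image in double quotes, sometimes with punctuation inside the quotes (for example "THE,"). When checking a generated image, it can help to pull those strings back out of the prompt. The helper below is a minimal sketch of that step. The name `quoted_targets` is hypothetical, and the approach assumes the quotes in the prompt are balanced.

```python
import re


def quoted_targets(prompt: str) -> list[str]:
    """Return the distinct double-quoted strings in a prompt, in first-seen order."""
    seen: list[str] = []
    for raw in re.findall(r'"([^"]+)"', prompt):
        # Drop surrounding whitespace and punctuation that was typed inside the quotes.
        text = raw.strip().rstrip(",.;:")
        if text and text not in seen:
            seen.append(text)
    return seen


sample = 'text elements "THE," "RA," and "SA4GONEARAz," arranged. The word "THE" sits on top.'
targets = quoted_targets(sample)  # the strings the image is expected to contain
```

On the full demo prompt this would also pick up short quoted fragments such as "4", so some filtering may be needed before comparing against recognized text.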

📚 Detailed documentation

BibTeX citation

@article{zhao2025lexart,
    title={LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis},
    author={Zhao, Shitian and Wu, Qilong and Li, Xinyue and Zhang, Bo and Li, Ming and Qin, Qi and Liu, Dongyang and Zhang, Kaipeng and Li, Hongsheng and Qiao, Yu and Gao, Peng and Fu, Bin and Li, Zhen},
    journal={arXiv preprint arXiv:2503.21749},
    year={2025}
}