LeX-Lumina: an open-source image generation model, free to use, that improves text rendering fidelity and aesthetic quality

LeX-Lumina

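Developed by X-ART

LeX-Lumina is a high-quality text-to-image model focused on improving text rendering fidelity and aesthetic quality.

Text-to-Image | Other | Open-source license: MIT | #high-fidelity text rendering #aesthetically optimized image generation #industrial design support

Downloads: 137

Release date: 3/25/2025

Model Overview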

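LeX-Lumina is a high-quality text-to-image synthesis model built on Alpha-VLLM/Lumina-Image-2.0 and trained on LeX-10K, a dataset produced by a Deepseek-R1-based data synthesis pipeline. It generates high-resolution, aesthetically refined images from text prompts and is particularly strong at complex text rendering.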

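Model Features

High-Quality Text Rendering

Trained on the LeX-10K dataset, the model achieves a 22.16% PNED gain, a substantial improvement in text rendering accuracy.

Aesthetic Optimization

Generated images are high resolution (1024×1024) and aesthetically refined.

Strong Prompt Enhancement

The companion prompt enhancement model, LeX-Enhancer, helps complex text prompts be understood and followed more closely.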

Model Capabilities

Text-to-image generation

High-resolution image generation

Complex text rendering

Aesthetic optimization

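Use Cases

Art Creation

Poster Design

Generates poster images with complex layouts and artistic effects.

Renders text elements accurately while keeping the overall composition aesthetically balanced.

Brand Logos

Generates visual designs that combine a brand name with symbolic elements.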

The paper reports gains over baseline models in color, position, and font accuracy for the sibling model LeX-FLUX.

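Advertising Design

Ad Banners

Generates eye-catching ad banners that combine promotional text with visual elements.

Combines sharp angles with organic and mechanical design elements for a strong visual effect.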

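🚀 LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis

This repository contains the model presented in the paper LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis. The paper proposes a comprehensive approach to high-quality text-image synthesis that aims to bridge the gap between prompt expressiveness and text rendering fidelity.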

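✨ Key Features

Builds a high-quality data synthesis pipeline and uses it to create LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024×1024 images.
Develops LeX-Enhancer, a strong prompt enhancement model, and trains two text-to-image models, LeX-FLUX and LeX-Lumina.
Introduces LeX-Bench, a benchmark for systematically evaluating visual text generation, together with a new metric, Pairwise Normalized Edit Distance (PNED).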
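
This page does not give the exact definition of PNED, so the sketch below is not that metric. It only illustrates the building block the name points to: a Levenshtein edit distance between a target string and a rendered string, normalized by the longer length. The function names and the normalization choice are assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string (assumed convention)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return edit_distance(a, b) / longest
```

With this convention, a perfectly rendered word scores 0.0 and a completely missing word scores 1.0, so lower is better.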

📦 Installation

Using this model requires the diffusers library. Key details are listed below.

Property	Details
Base model	Alpha-VLLM/Lumina-Image-2.0
Dataset	X-ART/LeX-10K
Library name	diffusers
Pipeline tag	text-to-image
Tags	art, text-rendering
License	mit
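
Before running the usage example, it can help to confirm that the required packages are importable. The following is a minimal sketch: the helper name `missing_packages` is hypothetical, and the default package list (torch and diffusers) is taken from the imports in the usage example rather than from an official requirements file.

```python
import importlib.util


def missing_packages(required=("torch", "diffusers")):
    """Return the names in `required` that cannot be found in this environment."""
    return [name for name in required if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only checks that the packages can be located. It does not verify that the installed diffusers version provides Lumina2Pipeline.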

💻 Usage Examples

Basic Usage

import torch
from diffusers import Lumina2Pipeline

pipe = Lumina2Pipeline.from_pretrained("X-ART/LeX-Lumina", torch_dtype=torch.bfloat16)
pipe.to("cuda")
# If VRAM is limited, replace the line above with pipe.enable_model_cpu_offload(), which moves idle components to the CPU.

prompt = "The image features a bold, dramatic design centered around the text elements \"THE,\" \"RA,\" and \"SA4GONEARAz,\" arranged to form the title of *The Boulet Brothers Dragula Season Three*. The background is a textured, dark slate-gray surface with faint grunge patterns, adding a gritty, industrial vibe. The word \"THE\" is positioned at the top in large, jagged, blood-red letters with a glossy finish and slight drop shadows, evoking a horror-inspired aesthetic. Below it, \"RA\" appears in the middle-left section, rendered in metallic silver with a fragmented, cracked texture, while \"SA4GONEARAz\" curves dynamically to the right, its letters styled in neon-green and black gradients with angular, cyberpunk-inspired edges. The number \"4\" in \"SA4GONEARAz\" replaces an \"A,\" blending seamlessly into the stylized typography. Thin, glowing purple outlines highlight the text, contrasting against the dark backdrop. Subtle rays of violet and crimson light streak diagonally across the composition, casting faint glows around the letters. The overall layout balances asymmetry and cohesion, with sharp angles and a mix of organic and mechanical design elements, creating a visually intense yet polished aesthetic that merges gothic horror with futuristic edge."
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=50,
    cfg_trunc_ratio=1,
    cfg_normalization=True,
    generator=torch.Generator("cpu").manual_seed(0),
    system_prompt="You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts.",
).images[0]
image.save("lex_lumina_demo.png")
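
The example above places the whole pipeline on the GPU and mentions CPU offloading as an alternative. These are normally used one or the other, not together. As a hedged sketch, the choice can be wrapped in a small loader. The function name `load_lex_lumina` and its `low_vram` flag are hypothetical; the diffusers calls are the ones already used in the example.

```python
def load_lex_lumina(low_vram: bool = False):
    """Load the LeX-Lumina pipeline, optionally with CPU offloading."""
    # Imported lazily so that defining this helper does not require a GPU setup.
    import torch
    from diffusers import Lumina2Pipeline

    pipe = Lumina2Pipeline.from_pretrained(
        "X-ART/LeX-Lumina", torch_dtype=torch.bfloat16
    )
    if low_vram:
        # Keeps components on the CPU and moves each to the GPU only while
        # it runs. Requires the accelerate package. Do not also call
        # pipe.to("cuda") in this mode.
        pipe.enable_model_cpu_offload()
    else:
        pipe.to("cuda")
    return pipe
```

Calling `load_lex_lumina(low_vram=True)` trades some speed for lower peak GPU memory. The returned pipeline is called the same way as in the example above.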

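📚 Documentation

The paper's abstract is as follows.

We introduce LeX-Art, a comprehensive toolkit for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm: we construct a high-quality data synthesis pipeline based on Deepseek-R1 and use it to create LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024×1024 images. Beyond dataset construction, we develop LeX-Enhancer, a strong prompt enhancement model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, achieving state-of-the-art text rendering performance. To systematically evaluate visual text generation, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, and that uses a new metric, Pairwise Normalized Edit Distance (PNED), for robust text accuracy evaluation. Experiments show significant improvements: LeX-Lumina achieves a 22.16% PNED gain, and LeX-FLUX outperforms baselines in color (+10.32%), position (+5.60%), and font accuracy (+5.63%). The code, models, datasets, and demo are publicly available.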

Citation

@article{zhao2025lexart,
    title={LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis},
    author={Zhao, Shitian and Wu, Qilong and Li, Xinyue and Zhang, Bo and Li, Ming and Qin, Qi and Liu, Dongyang and Zhang, Kaipeng and Li, Hongsheng and Qiao, Yu and Gao, Peng and Fu, Bin and Li, Zhen},
    journal={arXiv preprint arXiv:2503.21749},
    year={2025}
}