TextFlux: An Open-Source Multilingual Scene Text Synthesis Model for High-Fidelity Text Synthesis

TextFlux

Developed by yyyyyxie

TextFlux is a high-fidelity multilingual scene text synthesis model based on an OCR-free diffusion transformer. It uses FLUX.1-Fill-dev as the base model and focuses on the scene text synthesis task.

Tags: Image Generation, OCR-free text synthesis, Multilingual scene text, Diffusion Transformer

Downloads 284

Release date: 4/21/2025

Model Overview

TextFlux is an OCR-free framework that uses a diffusion transformer to achieve high-fidelity multilingual scene text synthesis. It simplifies the learning task by providing visual glyph guidance directly: rendered glyphs are spatially concatenated with the scene image, so the model can focus on contextual reasoning and visual fusion.

Model Features

OCR-free architecture

A simplified structure without an OCR encoder

High-fidelity and consistent with scene style

Accurate rendering to maintain scene style consistency

Multilingual and low-resource adaptation

Excellent cross-lingual performance, adaptable to new languages with a small amount of data (e.g., <1,000 samples)

Zero-shot generalization ability

Can render characters not seen during training

Controllable multi-line text

Supports flexible multi-line synthesis and line-level control

Data-efficient

Requires only about 1% of the training data used by other methods

Model Capabilities

Scene text synthesis

Multilingual text generation

Image and text fusion

Zero-shot character rendering

Multi-line text control

Use Cases

Scene text generation

Billboard text synthesis

Generate realistic billboard text in natural scenes

High-fidelity text consistent with scene style

Multilingual sign generation

Generate scene signs containing multiple languages

Supports rendering of characters not seen during training

🚀 TextFlux: An OCR-Free DiT Model for High-Fidelity Multilingual Scene Text Synthesis

TextFlux is an OCR-free framework that uses a Diffusion Transformer (DiT, based on FLUX.1-Fill-dev) for high-fidelity multilingual scene text synthesis. It simplifies the learning process by directly providing visual glyph guidance through spatial concatenation of rendered glyphs with the scene image, allowing the model to focus on contextual reasoning and visual fusion.
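
The spatial concatenation described above can be pictured with a small sketch. This is not the project's code: the top/bottom layout, the white-on-black template, the default bitmap font, and the function name build_concatenated_input are all assumptions chosen for illustration. It only shows the general idea of placing a rendered glyph template next to the scene and masking the region where text should appear.

```python
from PIL import Image, ImageDraw, ImageFont


def build_concatenated_input(scene, box, text):
    """Stack a rendered glyph template above the scene and build a matching mask.

    scene: RGB PIL image.
    box:   (left, top, right, bottom) text region inside the scene.
    text:  string to render (the default font only covers basic Latin).
    """
    width, height = scene.size
    left, top, right, bottom = box

    # Glyph template (assumed layout): black canvas the size of the scene,
    # with the target text drawn in white at the box position.
    template = Image.new("RGB", (width, height), "black")
    ImageDraw.Draw(template).text(
        (left, top), text, fill="white", font=ImageFont.load_default()
    )

    # Spatial concatenation: template in the top half, scene in the bottom half.
    combined = Image.new("RGB", (width, height * 2))
    combined.paste(template, (0, 0))
    combined.paste(scene, (0, height))

    # Inpainting mask: white only over the text region in the scene half,
    # so the template half is left untouched and serves as guidance.
    mask = Image.new("L", (width, height * 2), 0)
    ImageDraw.Draw(mask).rectangle(
        (left, top + height, right, bottom + height), fill=255
    )
    return combined, mask


# Example with a flat gray placeholder scene.
scene = Image.new("RGB", (256, 128), "gray")
combined, mask = build_concatenated_input(scene, (40, 30, 200, 90), "OPEN")
print(combined.size, mask.size)
```

Because the glyphs are already drawn in the input, a fill-style model would only need to transfer them into the masked region in the scene's style, which is the simplification the paragraph above refers to.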

✨ Features

OCR-Free: Simplified architecture without OCR encoders.
High-Fidelity & Contextual Styles: Precise rendering, stylistically consistent with scenes.
Multilingual & Low-Resource: Strong performance across languages, adapts to new languages with minimal data (e.g., <1,000 samples).
Zero-Shot Generalization: Renders characters unseen during training.
Controllable Multi-Line Text: Flexible multi-line synthesis with line-level control.
Data Efficient: Uses a fraction of data (e.g., ~1%) compared to other methods.

🚀 Quick Start

Setup

Clone/Download: Get the necessary code and model weights.
Dependencies:

conda create -n textflux python==3.11.4 -y
conda activate textflux
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

pip install -r requirements.txt
# Ensure diffusers >= 0.32.1
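
To confirm the diffusers requirement after installation, a quick check along these lines may help. It is a minimal sketch using only the standard library; the helper names version_tuple and meets_minimum are made up for this example, and the comparison looks only at the first three numeric components of the version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    # Keep the first three numeric components, e.g. "0.33.0.dev0" -> (0, 33, 0).
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def meets_minimum(installed, minimum="0.32.1"):
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = version("diffusers")
    status = "OK" if meets_minimum(installed) else "too old, need >= 0.32.1"
    print(f"diffusers {installed}: {status}")
except PackageNotFoundError:
    print("diffusers is not installed in this environment")
```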

Gradio Demo

The demo provides a "Normal Mode" (for pre-combined inputs) and a "Custom Mode" (upload a scene image, draw masks, and enter text; the template is then generated and concatenated automatically).

python demo.py
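
Outside the demo, inference on a pre-combined input could plausibly go through the FluxFillPipeline class in Diffusers, since FLUX.1-Fill-dev is the base model. The sketch below is an assumption about how that might look, not the project's own inference script: the function name run_textflux_sketch, the lora_path argument (a hypothetical local path to the released LoRA weights), and the step and guidance values are placeholders, and the prompt wording the model expects is not given here, so it is left as a parameter. Running it requires a CUDA GPU and access to the FLUX.1-Fill-dev weights.

```python
def run_textflux_sketch(combined_image, mask_image, prompt,
                        lora_path=None, steps=30, guidance=30.0, seed=0):
    """Inpaint the masked region of a pre-combined (template + scene) image.

    combined_image / mask_image: PIL images of the same size (sides assumed
    to be multiples of 16). Returns the generated PIL image.
    """
    # Imported lazily so that defining this function needs neither a GPU
    # nor the model weights.
    import torch
    from diffusers import FluxFillPipeline

    pipe = FluxFillPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
    )
    if lora_path is not None:
        # Hypothetical path; point this at wherever the LoRA weights were saved.
        pipe.load_lora_weights(lora_path)
    pipe.to("cuda")

    width, height = combined_image.size
    result = pipe(
        prompt=prompt,
        image=combined_image,
        mask_image=mask_image,
        height=height,
        width=width,
        num_inference_steps=steps,      # placeholder value
        guidance_scale=guidance,        # placeholder value
        generator=torch.Generator("cpu").manual_seed(seed),
    )
    return result.images[0]
```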

📚 Documentation

Updates

2025/05/27: Our full-parameter weights and LoRA weights are now available 🤗!
2025/05/25: Our paper is available on arXiv 🥳!

Acknowledgement

Our code is adapted from Diffusers. We adopt black-forest-labs/FLUX.1-Fill-dev as the base model. Thanks to all the contributors for the helpful discussions!

📄 License

The use of this model, TextFlux, is governed by the FLUX.1 [dev] Non-Commercial License Agreement (or the specific version applicable to FLUX.1-Fill-dev, upon which TextFlux is based).

📚 Citation

@misc{xie2025textfluxocrfreeditmodel,
      title={TextFlux: An OCR-Free DiT Model for High-Fidelity Multilingual Scene Text Synthesis}, 
      author={Yu Xie and Jielei Zhang and Pengyu Chen and Ziyue Wang and Weihang Wang and Longwen Gao and Peiyi Li and Huyang Sun and Qiang Zhang and Qian Qiao and Jiaqing Fan and Zhouhui Lian},
      year={2025},
      eprint={2505.17778},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.17778}, 
}
