🚀 StructEqTable-Deploy: A High-efficiency Open-source Toolkit for Table-to-LaTeX Transformation
StructEqTable-Deploy converts table images into LaTeX/HTML/Markdown, powered by scalable data from the DocGenome benchmark.
[ Github Repo ] [ Related Paper ] [ Website ]
[ Dataset🤗 ] [ Models🤗 ] [ Demo💬 ]
🚀 Quick Start
Welcome to the official repository of StructEqTable-Deploy. This toolkit converts table images into LaTeX/HTML/Markdown, leveraging scalable data from the DocGenome benchmark.
✨ Features
- Large-scale Benchmark: TableX, a large-scale multi-modal table benchmark extracted from the DocGenome benchmark for table pre-training, contains over 2 million high-quality image-LaTeX pairs covering 156 disciplinary classes.
- End-to-end Model: StructEqTable can precisely obtain the corresponding LaTeX description from a visual table image and perform multiple table-related reasoning tasks, including structural extraction and question answering.
- Model Updates: Regularly release new models with enhanced performance, such as improved recognition stability, inference speed, and robustness.
📦 Installation
conda create -n structeqtable "python>=3.10"
conda activate structeqtable
git clone https://github.com/UniModal4Reasoning/StructEqTable-Deploy.git
cd StructEqTable-Deploy
python setup.py develop

# Alternatively, install directly from GitHub:
pip install "git+https://github.com/UniModal4Reasoning/StructEqTable-Deploy.git"

# Or install the released package from PyPI:
pip install struct-eqtable==0.3.0
📚 Documentation
Overview
Tables are an effective way to represent structured data in various scenarios. However, extracting tabular data from table images and performing downstream reasoning tasks is challenging due to complex column and row headers with spanning cell operations. StructEqTable-Deploy addresses these challenges by providing a solution based on large-scale data and an end-to-end model.
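To make the challenge concrete, here is a small LaTeX table with the kind of spanning row and column headers the model must recover from a rendered image (requires the multirow package; the contents are made up for illustration):

```latex
% A table whose header spans both rows and columns -- the structure
% StructEqTable must reconstruct from pixels alone.
\begin{tabular}{|c|c|c|}
\hline
\multirow{2}{*}{Method} & \multicolumn{2}{c|}{Accuracy} \\ \cline{2-3}
 & Dev & Test \\ \hline
Baseline & 0.85 & 0.83 \\ \hline
\end{tabular}
```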
Changelog
- [2024/12/12] 🔥 We have released the latest model StructTable-InternVL2-1B v0.2 with enhanced recognition stability for HTML and Markdown formats!
- [2024/10/19] We have released our latest model StructTable-InternVL2-1B! Thanks to InternVL2's powerful foundational capabilities and fine-tuning on synthetic tabular data and the DocGenome dataset, StructTable can convert table images into various common table formats, including LaTeX, HTML, and Markdown. Moreover, the inference speed has been significantly improved compared to the v0.2 version.
- [2024/8/22] We have released our StructTable-base-v0.2, fine-tuned on the DocGenome dataset. This version features improved inference speed and robustness, achieved through data augmentation and a reduced number of image tokens.
- [2024/8/08] We have released the TensorRT accelerated version, which takes only about 1 second for most images on an A100 GPU. Please follow the tutorial to install the environment and compile the model weights.
- [2024/7/30] We have released the first version of StructEqTable.
TODO
- [x] Release inference code and checkpoints of StructEqTable.
- [x] Support the Chinese version of StructEqTable.
- [x] Accelerated version of StructEqTable using TensorRT-LLM.
- [x] Expand the range of table-image domains to improve the model's general capabilities.
- [x] Efficient inference of StructTable-InternVL2-1B via the LMDeploy toolkit.
- [ ] Release our table pre-training and fine-tuning code.
Model Zoo
Quick Demo
cd tools/demo
python demo.py \
--image_path ./demo.png \
--ckpt_path U4R/StructTable-InternVL2-1B \
--output_format latex
- HTML or Markdown format output (supported only by StructTable-InternVL2-1B)
python demo.py \
--image_path ./demo.png \
--ckpt_path U4R/StructTable-InternVL2-1B \
--output_format html markdown
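If you want to consume the HTML output programmatically, the stdlib sketch below flattens a `<table>` into a list of rows. This is not part of the toolkit; the sample HTML is illustrative, not actual model output, and assumes the model emits standard `<tr>`/`<td>`/`<th>` markup.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collects cell text from an HTML <table> into a list of rows."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []          # start a new row
        elif tag in ("td", "th"):
            self._cell = []         # start accumulating cell text

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Illustrative output; the real output for your image will differ.
html_output = "<table><tr><th>Model</th><th>Acc</th></tr><tr><td>v0.2</td><td>0.91</td></tr></table>"
parser = TableExtractor()
parser.feed(html_output)
print(parser.rows)  # [['Model', 'Acc'], ['v0.2', '0.91']]
```

Since spanning cells (`rowspan`/`colspan`) are common in scientific tables, a production parser would also need to expand those attributes; this sketch ignores them.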
Efficient Inference
pip install lmdeploy
cd tools/demo
python demo.py \
--image_path ./demo.png \
--ckpt_path U4R/StructTable-InternVL2-1B \
--output_format latex \
--lmdeploy
- Visualization Result
You can copy the output LaTeX code into demo.tex, then compile it (e.g., on Overleaf) to visualize the table.
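A minimal demo.tex scaffold to paste the generated code into might look like this (the exact packages needed depend on what the model emits; booktabs and multirow are common requirements for tables with rules and spanning cells):

```latex
\documentclass{article}
\usepackage{booktabs}  % needed if the output uses \toprule, \midrule, \bottomrule
\usepackage{multirow}  % needed if the output uses \multirow for spanning cells
\begin{document}

% Paste the LaTeX emitted by demo.py below. Placeholder table shown:
\begin{tabular}{ll}
Model & Acc \\
v0.2  & 0.91 \\
\end{tabular}

\end{document}
```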

🔧 Technical Details
Table extraction from images is challenging due to complex column and row headers with spanning cells. StructEqTable-Deploy addresses these challenges with a large-scale multi-modal table benchmark, TableX, and an end-to-end model, StructEqTable. TableX is extracted from the DocGenome benchmark and contains over 2 million high-quality image-LaTeX pairs covering 156 disciplinary classes. StructEqTable can precisely produce the corresponding LaTeX description from a table image and perform multiple table-related reasoning tasks, including structural extraction and question answering.
📄 License
StructEqTable is released under the Apache License 2.0.
Acknowledgements
- DocGenome. An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models.
- ChartVLM. A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning.
- Pix2Struct. Screenshot Parsing as Pretraining for Visual Language Understanding.
- InternVL Family. A Series of Powerful Foundational Vision-Language Models.
- LMDeploy. A toolkit for compressing, deploying, and serving LLM and MLLM.
- UniMERNet. A Universal Network for Real-World Mathematical Expression Recognition.
- Donut. UniMERNet's Transformer encoder-decoder architecture is adapted from Donut.
- Nougat. Data Augmentation follows Nougat.
- TensorRT-LLM. Model inference acceleration uses TensorRT-LLM.
Citation
If you find our models, code, or papers useful in your research, please consider giving us a ⭐ and a citation 📝. Thanks :)
@article{xia2024docgenome,
title={DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models},
author={Xia, Renqiu and Mao, Song and Yan, Xiangchao and Zhou, Hongbin and Zhang, Bo and Peng, Haoyang and Pi, Jiahao and Fu, Daocheng and Wu, Wenjie and Ye, Hancheng and others},
journal={arXiv preprint arXiv:2406.11633},
year={2024}
}
Contact Us
If you encounter any issues or have questions, please feel free to contact us via zhouhongbin@pjlab.org.cn.