🚀 StructEqTable-Deploy: A High-efficiency Open-source Toolkit for Table-to-LaTeX Transformation
StructEqTable-Deploy converts table images into LaTeX/HTML/Markdown, powered by scalable data from the DocGenome benchmark.
[ Github Repo ] [ Related Paper ] [ Website ]
[ Dataset🤗 ] [ Models🤗 ] [ Demo💬 ]
🚀 Quick Start
Welcome to the official repository of StructEqTable-Deploy. This toolkit converts table images into LaTeX/HTML/Markdown, leveraging scalable data from the DocGenome benchmark.
✨ Features
- Large-scale Benchmark: TableX, a large-scale multi-modal table benchmark extracted from the DocGenome benchmark for table pre-training, contains over 2 million high-quality image-LaTeX pairs covering 156 disciplinary classes.
- End-to-end Model: StructEqTable can precisely obtain the corresponding LaTeX description from a visual table image and perform multiple table-related reasoning tasks, including structural extraction and question answering.
- Model Updates: Regularly release new models with enhanced performance, such as improved recognition stability, inference speed, and robustness.
📦 Installation
conda create -n structeqtable "python>=3.10"
conda activate structeqtable
git clone https://github.com/UniModal4Reasoning/StructEqTable-Deploy.git
cd StructEqTable-Deploy
python setup.py develop

# Alternatively, install directly from GitHub:
pip install "git+https://github.com/UniModal4Reasoning/StructEqTable-Deploy.git"

# Or install the released package from PyPI:
pip install struct-eqtable==0.3.0
📚 Documentation
Overview
Tables are an effective way to represent structured data in various scenarios. However, extracting tabular data from table images and performing downstream reasoning tasks is challenging due to complex column and row headers with spanning cell operations. StructEqTable-Deploy addresses these challenges by providing a solution based on large-scale data and an end-to-end model.
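To make the challenge concrete, here is a small LaTeX table with the kind of spanning row and column headers the model must recover from a rendered image (requires the multirow package; the contents are made up for illustration):

```latex
% A table whose header spans both rows and columns -- the structure
% StructEqTable must reconstruct from pixels alone.
\begin{tabular}{|c|c|c|}
\hline
\multirow{2}{*}{Method} & \multicolumn{2}{c|}{Accuracy} \\ \cline{2-3}
 & Dev & Test \\ \hline
Baseline & 0.85 & 0.83 \\ \hline
\end{tabular}
```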
Changelog
- [2024/12/12] 🔥 We have released the latest model StructTable-InternVL2-1B v0.2 with enhanced recognition stability for HTML and Markdown formats!
- [2024/10/19] We have released our latest model StructTable-InternVL2-1B! Thanks to InternVL2's powerful foundational capabilities and fine-tuning on synthetic tabular data and the DocGenome dataset, StructTable can convert table images into various common table formats, including LaTeX, HTML, and Markdown. Moreover, the inference speed has been significantly improved compared to the v0.2 version.
- [2024/8/22] We have released our StructTable-base-v0.2, fine-tuned on the DocGenome dataset. This version features improved inference speed and robustness, achieved through data augmentation and a reduced number of image tokens.
- [2024/8/08] We have released the TensorRT accelerated version, which takes only about 1 second for most images on an A100 GPU. Please follow the tutorial to install the environment and compile the model weights.
- [2024/7/30] We have released the first version of StructEqTable.
TODO
- [x] Release inference code and checkpoints of StructEqTable.
- [x] Support the Chinese version of StructEqTable.
- [x] Accelerated version of StructEqTable using TensorRT-LLM.
- [x] Expand the range of table-image domains to improve the model's general capabilities.
- [x] Efficient inference of StructTable-InternVL2-1B via the LMDeploy toolkit.
- [ ] Release our table pre-training and fine-tuning code.
Model Zoo
Quick Demo
cd tools/demo
python demo.py \
--image_path ./demo.png \
--ckpt_path U4R/StructTable-InternVL2-1B \
--output_format latex
- HTML or Markdown format output (supported only by StructTable-InternVL2-1B)
python demo.py \
--image_path ./demo.png \
--ckpt_path U4R/StructTable-InternVL2-1B \
--output_format html markdown
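If you want to consume the HTML output programmatically, the stdlib sketch below flattens a `<table>` into a list of rows. This is not part of the toolkit; the sample HTML is illustrative, not actual model output, and assumes the model emits standard `<tr>`/`<td>`/`<th>` markup.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collects cell text from an HTML <table> into a list of rows."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []          # start a new row
        elif tag in ("td", "th"):
            self._cell = []         # start accumulating cell text

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Illustrative output; the real output for your image will differ.
html_output = "<table><tr><th>Model</th><th>Acc</th></tr><tr><td>v0.2</td><td>0.91</td></tr></table>"
parser = TableExtractor()
parser.feed(html_output)
print(parser.rows)  # [['Model', 'Acc'], ['v0.2', '0.91']]
```

Since spanning cells (`rowspan`/`colspan`) are common in scientific tables, a production parser would also need to expand those attributes; this sketch ignores them.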
Efficient Inference
pip install lmdeploy
cd tools/demo
python demo.py \
--image_path ./demo.png \
--ckpt_path U4R/StructTable-InternVL2-1B \
--output_format latex \
--lmdeploy
- Visualization Result
You can copy the output LaTeX code into demo.tex, then compile it (e.g., on Overleaf) to visualize the table.
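A minimal demo.tex scaffold to paste the generated code into might look like this (the exact packages needed depend on what the model emits; booktabs and multirow are common requirements for tables with rules and spanning cells):

```latex
\documentclass{article}
\usepackage{booktabs}  % needed if the output uses \toprule, \midrule, \bottomrule
\usepackage{multirow}  % needed if the output uses \multirow for spanning cells
\begin{document}

% Paste the LaTeX emitted by demo.py below. Placeholder table shown:
\begin{tabular}{ll}
Model & Acc \\
v0.2  & 0.91 \\
\end{tabular}

\end{document}
```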

🔧 Technical Details
Table extraction from images is challenging due to complex column and row headers with spanning cells. StructEqTable-Deploy addresses these challenges with a large-scale multi-modal table benchmark, TableX, and an end-to-end model, StructEqTable. TableX is extracted from the DocGenome benchmark and contains over 2 million high-quality image-LaTeX pairs covering 156 disciplinary classes. StructEqTable can precisely produce the corresponding LaTeX description from a table image and perform multiple table-related reasoning tasks, including structural extraction and question answering.
📄 License
StructEqTable is released under the Apache License 2.0.
Acknowledgements
- DocGenome. An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models.
- ChartVLM. A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning.
- Pix2Struct. Screenshot Parsing as Pretraining for Visual Language Understanding.
- InternVL Family. A Series of Powerful Foundational Vision-Language Models.
- LMDeploy. A toolkit for compressing, deploying, and serving LLM and MLLM.
- UniMERNet. A Universal Network for Real-World Mathematical Expression Recognition.
- Donut. UniMERNet's Transformer encoder-decoder architecture is adapted from Donut.
- Nougat. Data Augmentation follows Nougat.
- TensorRT-LLM. Model inference acceleration uses TensorRT-LLM.
Citation
If you find our models, code, or papers useful in your research, please consider giving us a ⭐ and a citation 📝. Thanks :)
@article{xia2024docgenome,
title={DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models},
author={Xia, Renqiu and Mao, Song and Yan, Xiangchao and Zhou, Hongbin and Zhang, Bo and Peng, Haoyang and Pi, Jiahao and Fu, Daocheng and Wu, Wenjie and Ye, Hancheng and others},
journal={arXiv preprint arXiv:2406.11633},
year={2024}
}
Contact Us
If you encounter any issues or have questions, please feel free to contact us via zhouhongbin@pjlab.org.cn.