# 🚀 pix2struct-base-table2html

Transform table images into HTML! This model parses a table image, performs OCR, and outputs the table as HTML.
## 🚀 Quick Start
You can try the demo app which combines both table detection and recognition!
## ✨ Features
This model takes a table image as input and outputs HTML. It parses the image, performing both optical character recognition (OCR) and structure recognition to convert the table into HTML.
- Input Requirement: The model expects an image containing only a table. If the table is embedded in a document, you should first use a table detection model (e.g., Microsoft's Table Transformer model) to extract it.
- Fine-Tuning Details: The model is fine-tuned from the Pix2Struct base model with a `max_patch_length` of 1024 and a `max_generation_length` of 1024. For inference, `max_patch_length` should generally not be changed, but the generation length can be adjusted.
- Training Datasets: The model has been trained using two datasets: MMTab and PubTabNet.
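Since the model expects an image containing only a table, a detected table region should be cropped out of the document image before recognition. A minimal sketch with Pillow, assuming you already have a bounding box from a detection model such as Table Transformer (the image and coordinates below are placeholders):

```python
from PIL import Image

# Hypothetical bounding box (left, upper, right, lower) returned by a
# table detection model -- these values are placeholders for illustration.
bbox = (40, 120, 760, 480)

# Stand-in for a scanned document page; in practice, Image.open(...) a real scan.
page = Image.new("RGB", (800, 600), "white")

# The cropped region is what the recognition model expects as input.
table_image = page.crop(bbox)
print(table_image.size)  # (720, 360)
```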
| Property | Details |
|---|---|
| Model Type | pix2struct-base-table2html |
| Training Data | MMTab and PubTabNet |
## 📦 Installation

The model is loaded via the `transformers` library. Install it with the following command (the usage example below also uses `torch`, `Pillow`, and `requests`):

```shell
pip install transformers
```
## 💻 Usage Examples

### Basic Usage

```python
import torch
from transformers import AutoProcessor, Pix2StructForConditionalGeneration
from PIL import Image
import requests
from io import BytesIO

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the processor and model
processor = AutoProcessor.from_pretrained("KennethTM/pix2struct-base-table2html")
model = Pix2StructForConditionalGeneration.from_pretrained("KennethTM/pix2struct-base-table2html")
model.to(device)
model.eval()

# Download an example table image
url = "https://huggingface.co/KennethTM/pix2struct-base-table2html/resolve/main/example_recog_1.jpg"
response = requests.get(url)
image = Image.open(BytesIO(response.content))

# Preprocess the image into flattened patches
encoding = processor(image, return_tensors="pt", max_patches=1024)

with torch.inference_mode():
    flattened_patches = encoding.pop("flattened_patches").to(device)
    attention_mask = encoding.pop("attention_mask").to(device)
    predictions = model.generate(flattened_patches=flattened_patches, attention_mask=attention_mask, max_new_tokens=1024)

# Decode the generated token ids into an HTML string
predictions_decoded = processor.tokenizer.batch_decode(predictions, skip_special_tokens=True)
print(predictions_decoded[0])
```
### Example Image

![Example table image](https://huggingface.co/KennethTM/pix2struct-base-table2html/resolve/main/example_recog_1.jpg)
### Model HTML Output for Example Image

```html
<table border="1" cellspacing="0">
  <tr><th>Rank</th><th>Lane</th><th>Name</th><th>Nationality</th><th>Time</th><th>Notes</th></tr>
  <tr><td></td><td>4</td><td>Michael Phelps</td><td>United States</td><td>51.25</td><td>OR</td></tr>
  <tr><td></td><td>3</td><td>Ian Crocker</td><td>United States</td><td>51.29</td><td></td></tr>
  <tr><td></td><td>5</td><td>Andriy Serdinov</td><td>Ukraine</td><td>51.36</td><td>EU</td></tr>
  <tr><td>4</td><td>1</td><td>Thomas Rupprath</td><td>Germany</td><td>52.27</td><td></td></tr>
  <tr><td>5</td><td>6</td><td>Igor Marchenko</td><td>Russia</td><td>52.32</td><td></td></tr>
  <tr><td>6</td><td>2</td><td>Gabriel Mangabeira</td><td>Brazil</td><td>52.34</td><td></td></tr>
  <tr><td>7</td><td>8</td><td>Duje Draganja</td><td>Croatia</td><td>52.46</td><td></td></tr>
  <tr><td>8</td><td>7</td><td>Geoff Huegill</td><td>Australia</td><td>52.56</td><td></td></tr>
</table>
```
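The generated HTML can be turned back into rows of cell text without extra dependencies, for example with the standard library's `html.parser`. This is a minimal sketch: it handles simple `<tr>`/`<th>`/`<td>` tables like the output above and ignores attributes such as `colspan`, should they appear.

```python
from html.parser import HTMLParser

class TableHTMLParser(HTMLParser):
    """Collect rows of cell text from a simple <table> HTML string."""

    def __init__(self):
        super().__init__()
        self.rows = []       # list of rows, each a list of cell strings
        self._row = None     # cells of the row currently being parsed
        self._cell = None    # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

parser = TableHTMLParser()
parser.feed('<table><tr><th>Rank</th><th>Name</th></tr>'
            '<tr><td>1</td><td>Michael Phelps</td></tr></table>')
print(parser.rows)  # [['Rank', 'Name'], ['1', 'Michael Phelps']]
```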
### Rendered HTML Table

| Rank | Lane | Name | Nationality | Time | Notes |
|---|---|---|---|---|---|
|  | 4 | Michael Phelps | United States | 51.25 | OR |
|  | 3 | Ian Crocker | United States | 51.29 |  |
|  | 5 | Andriy Serdinov | Ukraine | 51.36 | EU |
| 4 | 1 | Thomas Rupprath | Germany | 52.27 |  |
| 5 | 6 | Igor Marchenko | Russia | 52.32 |  |
| 6 | 2 | Gabriel Mangabeira | Brazil | 52.34 |  |
| 7 | 8 | Duje Draganja | Croatia | 52.46 |  |
| 8 | 7 | Geoff Huegill | Australia | 52.56 |  |
## 📄 License
This project is licensed under the MIT license.