TFT-ID-1.0 Open-source Academic Paper Detection Tool - Precise Detection of Tables, Figures, and Text Regions

TFT ID 1.0

Developed by yifeihu

TFT-ID is an object detection model, fine-tuned from Florence-2, that detects tables, figures, and text regions in academic papers.

Image-to-Text

Transformers

Open Source License: MIT

#Academic Paper Analysis #Table and Figure Detection #Text Region Recognition

Downloads 153

Release Time: 7/25/2024

Model Overview

This model identifies tables, figures, and text regions on academic paper pages and outputs a bounding box for each. Text regions can be fed directly into an OCR process.

Model Features

High-Precision Detection

Achieves a 98.84% success rate in table/figure recognition tasks

Multi-Region Recognition

Simultaneously detects tables, figures, and text regions

Manually Annotated Data

Training data includes over 36,000 manually annotated and verified bounding boxes

OCR Integration

Text regions can be directly fed into OCR processes, with the TB-OCR-preview-0.1 model recommended

Model Capabilities

Academic paper image analysis

Table detection

Figure detection

Text region detection

Bounding box output

Use Cases

Academic Research

Paper Content Analysis

Automatically identifies tables, figures, and text regions in papers

Helps researchers quickly locate and extract key information from papers

Literature Digitization

Converts paper or PDF documents into structured digital content

Improves literature processing efficiency for subsequent analysis and retrieval

Publishing Industry

Journal Layout Verification

Automatically checks if the positions of figures and tables in papers meet publishing requirements

Reduces manual inspection workload and improves publishing efficiency
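
A layout check of this kind could be built on the detected bounding boxes. The sketch below is only an illustration: the `outside_margins` helper, the 50-pixel margin, the sample boxes, and the label names `table`, `figure`, and `text` are all assumptions, not part of the model or of any publisher's actual requirements.

```python
def outside_margins(bboxes, labels, page_size, margin):
    """Return (label, box) pairs for tables/figures that cross the page margin.

    Hypothetical helper: boxes are [x1, y1, x2, y2] in pixels.
    """
    width, height = page_size
    violations = []
    for (x1, y1, x2, y2), label in zip(bboxes, labels):
        if label not in ("table", "figure"):  # assumed label names
            continue
        if x1 < margin or y1 < margin or x2 > width - margin or y2 > height - margin:
            violations.append((label, [x1, y1, x2, y2]))
    return violations


# Invented detections for an 800x1000 pixel page.
bboxes = [[60, 80, 740, 300], [10, 400, 500, 700], [0, 0, 800, 50]]
labels = ["figure", "table", "text"]

violations = outside_margins(bboxes, labels, page_size=(800, 1000), margin=50)
print(violations)
```

Only the table is reported here, because its left edge sits inside the assumed 50-pixel margin; the text box is ignored by design.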

🚀 TFT-ID: Table/Figure/Text IDentifier for academic papers

TFT-ID is an object detection model designed to extract tables, figures, and text sections from academic papers. It offers accurate identification and can be integrated with OCR workflows for text extraction.

🚀 Quick Start

Use the following code to start using the TFT-ID model. For non-CUDA environments, refer to this post for a simple patch.

import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM 

model = AutoModelForCausalLM.from_pretrained("yifeihu/TFT-ID-1.0", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("yifeihu/TFT-ID-1.0", trust_remote_code=True)

prompt = "<OD>"

url = "https://huggingface.co/yifeihu/TF-ID-base/resolve/main/arxiv_2305_10853_5.png?download=true"
# Convert to RGB so pages saved as RGBA or grayscale are handled consistently
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

inputs = processor(text=prompt, images=image, return_tensors="pt")

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    do_sample=False,
    num_beams=3
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

parsed_answer = processor.post_process_generation(generated_text, task="<OD>", image_size=(image.width, image.height))

print(parsed_answer)

To visualize the results, check this tutorial notebook for more details.

✨ Features

Object Detection: Extract tables, figures, and text sections from academic paper images.
Manual Annotation: All 36,000+ bounding boxes are manually annotated and checked.
OCR Compatibility: The text sections are suitable for downstream OCR workflows.
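
Before text sections go to an OCR model, they usually need to be put into reading order. The model card does not prescribe how to do this, so the following is a minimal sketch under stated assumptions: the `reading_order` helper is hypothetical, it assumes an evenly split multi-column page, and it does not handle full-width elements such as titles.

```python
def reading_order(text_boxes, page_width, columns=2):
    """Sort [x1, y1, x2, y2] boxes column by column, top to bottom in each column.

    Hypothetical heuristic: a box belongs to the column containing its
    horizontal center, with columns assumed to be of equal width.
    """
    column_width = page_width / columns

    def sort_key(box):
        x1, y1, x2, _ = box
        center_x = (x1 + x2) / 2
        column = min(columns - 1, int(center_x // column_width))
        return (column, y1)

    return sorted(text_boxes, key=sort_key)


# Invented text boxes for an 800-pixel-wide, two-column page.
boxes = [
    [420, 100, 780, 400],  # right column, top
    [20, 500, 380, 900],   # left column, bottom
    [20, 100, 380, 480],   # left column, top
    [420, 420, 780, 900],  # right column, bottom
]
ordered = reading_order(boxes, page_width=800)
print(ordered)
```

Passing `columns=1` reduces the heuristic to a plain top-to-bottom sort, which may be enough for single-column pages.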

📚 Documentation

Model Summary

TFT-ID (Table/Figure/Text IDentifier) is an object detection model, created by Yifei Hu, that is finetuned to extract tables, figures, and text sections from academic papers.


TFT-ID is finetuned from microsoft/Florence-2 checkpoints.

The model was finetuned with papers from Hugging Face Daily Papers. All 36,000+ bounding boxes are manually annotated and checked by Yifei Hu.
The TFT-ID model takes an image of a single paper page as input and returns bounding boxes for all tables, figures, and text sections on that page.
The text sections contain clean text content suited to downstream OCR workflows. I recommend using TB-OCR-preview-0.1 [HF] as the OCR model to convert the text sections into clean Markdown and LaTeX math output.

Object Detection results format: {'<OD>': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['label1', 'label2', ...]} }
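
A result in this format can be turned into per-label crop boxes with a few lines of plain Python. This is a sketch, not part of the model's API: the `group_regions` helper is hypothetical, the sample values are invented, and the label names `figure` and `text` are assumed.

```python
from collections import defaultdict


def group_regions(parsed_answer, image_size, task="<OD>"):
    """Group boxes by label as integer (left, top, right, bottom) tuples.

    Hypothetical helper: coordinates are rounded and clamped to the page,
    and boxes that collapse to zero width or height are dropped.
    """
    width, height = image_size
    result = parsed_answer[task]
    grouped = defaultdict(list)
    for (x1, y1, x2, y2), label in zip(result["bboxes"], result["labels"]):
        left = max(0, min(width, round(x1)))
        top = max(0, min(height, round(y1)))
        right = max(0, min(width, round(x2)))
        bottom = max(0, min(height, round(y2)))
        if right > left and bottom > top:
            grouped[label].append((left, top, right, bottom))
    return dict(grouped)


# Invented output for an 800x1000 pixel page.
example = {"<OD>": {
    "bboxes": [[50.4, 60.2, 750.6, 300.9],
               [48.0, 320.0, 760.0, 980.4],
               [100.0, 990.0, 820.0, 1010.0]],
    "labels": ["figure", "text", "text"],
}}
regions = group_regions(example, image_size=(800, 1000))
print(regions)
```

Each tuple follows the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so a region can be cut out of the page image with `image.crop(box)`.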

Training Code and Dataset

Dataset: Coming soon.
Code: github.com/ai8hyf/TF-ID

Benchmarks

The model was tested on paper pages outside the training dataset. The papers are a subset of Hugging Face Daily Papers.

Correct output: the model draws correct bounding boxes for every table, figure, and text section on the given page and does not miss any content.

Task 1: Table, Figure, and Text Section Identification

Model	Total Images	Correct Output	Success Rate
TFT-ID-1.0[HF]	373	361	96.78%

Task 2: Table and Figure Identification

Model	Total Images	Correct Output	Success Rate
TFT-ID-1.0[HF]	258	255	98.84%
TF-ID-large[HF]	258	253	98.06%
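
The success rates are page-level: a page counts as correct only if every region on it is correct. As a quick check of the arithmetic, the percentages can be recomputed from the page counts in the two tables. The `success_rate` helper below is just an illustrative name.

```python
def success_rate(correct, total):
    """Percentage of pages whose output was judged fully correct."""
    return round(100 * correct / total, 2)


# (correct pages, total pages) copied from the benchmark tables.
benchmarks = {
    "Task 1, TFT-ID-1.0": (361, 373),
    "Task 2, TFT-ID-1.0": (255, 258),
    "Task 2, TF-ID-large": (253, 258),
}

for name, (correct, total) in benchmarks.items():
    failed = total - correct
    print(f"{name}: {success_rate(correct, total)}% ({failed} pages with at least one error)")
```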

Note: Depending on the use case, some "incorrect" outputs may still be perfectly usable. For example, the model may draw two bounding boxes for one figure that has two child components.
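
If split boxes like these are a problem for a given use case, one possible post-processing step is to merge same-label boxes that overlap or sit very close together. The sketch below is an assumption-laden illustration, not something the model or its repository provides: `merge_nearby` and its 10-pixel `gap` are hypothetical, and it should be applied to the boxes of one label at a time.

```python
def boxes_close(a, b, gap):
    """True if boxes a and b overlap or lie within `gap` pixels of each other."""
    return (a[0] - gap <= b[2] and b[0] - gap <= a[2]
            and a[1] - gap <= b[3] and b[1] - gap <= a[3])


def union(a, b):
    """Smallest box that contains both a and b."""
    return [min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])]


def merge_nearby(boxes, gap=10):
    """Repeatedly merge any two close boxes until no pair qualifies."""
    merged = [list(box) for box in boxes]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if boxes_close(merged[i], merged[j], gap):
                    merged[i] = union(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


# Invented figure boxes: two side-by-side panels and one separate figure.
figures = [[100, 100, 380, 300], [385, 100, 700, 300], [100, 600, 700, 900]]
merged_figures = merge_nearby(figures, gap=10)
print(merged_figures)
```

The two panels are 5 pixels apart, so they collapse into one box, while the third figure stays separate. A larger `gap` merges more aggressively and can fuse figures that are genuinely distinct.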

BibTeX and citation info

@misc{TF-ID,
  author = {Yifei Hu},
  title = {TF-ID: Table/Figure IDentifier for academic papers},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ai8hyf/TF-ID}},
}

📄 License

This project is licensed under the MIT License.
