TF-ID: Open-Source Object Detection Model for Extracting Tables, Figures, and Caption Text from Academic Papers

TF ID Base

Developed by yifeihu

TF-ID is a series of object detection models specifically designed to extract tables and figures along with their caption texts from academic papers.

Image-to-Text

Transformers

Open Source License: MIT #Academic Paper Analysis #Table and Figure Detection #High-precision OCR

Downloads 408

Release Time: 7/10/2024

Model Overview

TF-ID is an object detection model fine-tuned from Florence-2. It recognizes tables and figures in academic papers and returns their bounding boxes, optionally including the caption text inside each box.

Model Features

High-precision Table/Figure Detection

TF-ID-base produces correct output on 97.29% of the test pages (251 of 258).

Caption Text Recognition

The caption versions draw each bounding box around a table or figure together with its caption text.

Multiple Version Options

Provides base and large model sizes, each available in a version that includes caption text and a version that excludes it.

Manually Annotated Data

Training data comes from Hugging Face Daily Papers, with all bounding boxes manually annotated and verified.

Model Capabilities

Table Detection

Figure Detection

Caption Text Recognition

Academic Paper Analysis

Use Cases

Academic Research

Paper Content Analysis

Automatically extract table and figure information from papers.

Improves literature retrieval and analysis efficiency.

Knowledge Graph Construction

Provides structured data sources for academic knowledge graphs.

Enhances the retrievability of academic information.

Publishing Industry

Journal Typesetting Assistance

Automatically identify the positions of figures and tables in papers.

Simplifies the publishing process.

🚀 TF-ID: Table/Figure IDentifier for academic papers

TF-ID is a family of object detection models designed to extract tables and figures from academic papers. It offers four versions to meet different needs, all fine-tuned from the microsoft/Florence-2 checkpoints.

✨ Features

Multiple Model Versions: Available in base and large sizes, with and without caption text extraction options.
Accurate Detection: Trained on manually annotated data from Hugging Face Daily Papers to ensure high-quality bounding box detection for tables and figures.
Flexible Output: Depending on the model version, it can return bounding boxes with or without caption text.

📦 Installation

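Installation amounts to installing the Python libraries that the examples import: requests, Pillow (PIL), and transformers, plus PyTorch, since the examples request PyTorch tensors. The model and processor can then be loaded as follows: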

import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM 

model = AutoModelForCausalLM.from_pretrained("yifeihu/TF-ID-base", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("yifeihu/TF-ID-base", trust_remote_code=True)

💻 Usage Examples

Basic Usage

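import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

# Load the model and processor; trust_remote_code=True allows the model code
# hosted in the repository to run
model = AutoModelForCausalLM.from_pretrained("yifeihu/TF-ID-base", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("yifeihu/TF-ID-base", trust_remote_code=True)

# "<OD>" is the object detection task prompt
prompt = "<OD>"

# Download a sample image of a single paper page
url = "https://huggingface.co/yifeihu/TF-ID-base/resolve/main/arxiv_2305_10853_5.png?download=true"
image = Image.open(requests.get(url, stream=True).raw)

# Tokenize the prompt and preprocess the image into PyTorch tensors
inputs = processor(text=prompt, images=image, return_tensors="pt")

# Generate deterministically (no sampling) with beam search
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    do_sample=False,
    num_beams=3
)
# Keep special tokens so that post-processing can parse the output
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

# Convert the generated text into bounding boxes and labels, scaled to the image size
parsed_answer = processor.post_process_generation(generated_text, task="<OD>", image_size=(image.width, image.height))

print(parsed_answer)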

Advanced Usage

To visualize the results, you can refer to this tutorial notebook for more details.
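
As a rough illustration of what such a visualization might look like, the sketch below outlines each detected box on a copy of the page using Pillow. It is not taken from the tutorial notebook: the helper name draw_detections, the stand-in page, and the "figure" label are assumptions made so the example runs without downloading the model.

```python
from PIL import Image, ImageDraw


def draw_detections(image, parsed_answer, task="<OD>", color=(255, 0, 0), width=3):
    """Return a copy of `image` with every detected box outlined and labelled.

    Hypothetical helper: expects the {'<OD>': {'bboxes': ..., 'labels': ...}}
    structure described in this document.
    """
    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    result = parsed_answer[task]
    for box, label in zip(result["bboxes"], result["labels"]):
        # Round to whole pixels so the outline position is unambiguous.
        x1, y1, x2, y2 = (round(v) for v in box)
        draw.rectangle([x1, y1, x2, y2], outline=color, width=width)
        draw.text((x1 + 6, y1 + 6), str(label), fill=color)
    return annotated


# Stand-in data (assumed values, not real model output).
page = Image.new("RGB", (800, 1000), "white")
sample_answer = {"<OD>": {"bboxes": [[100.0, 150.0, 700.0, 450.0]], "labels": ["figure"]}}

annotated_page = draw_detections(page, sample_answer)
# annotated_page.save("annotated_page.png")  # uncomment to write the result to disk
```

With real output, pass the image and parsed_answer from the basic usage example instead of the stand-in values.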

📚 Documentation

Model Summary

TF-ID (Table/Figure IDentifier) is a family of object detection models, created by Yifei Hu, that are fine-tuned to extract tables and figures from academic papers. They come in four versions:

Model	Model size	Model Description
TF-ID-base[HF]	0.23B	Extract tables/figures and their caption text
TF-ID-large[HF] (Recommended)	0.77B	Extract tables/figures and their caption text
TF-ID-base-no-caption[HF]	0.23B	Extract tables/figures without caption text
TF-ID-large-no-caption[HF] (Recommended)	0.77B	Extract tables/figures without caption text
All TF-ID models are finetuned from microsoft/Florence-2 checkpoints.

The models were finetuned with papers from Hugging Face Daily Papers. All bounding boxes are manually annotated and checked by humans.
TF-ID models take an image of a single paper page as the input, and return bounding boxes for all tables and figures in the given page.
TF-ID-base and TF-ID-large draw bounding boxes around tables/figures and their caption text.
TF-ID-base-no-caption and TF-ID-large-no-caption draw bounding boxes around tables/figures without their caption text.

Large models are always recommended!
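
The choice between the four versions can be expressed as a small helper. This is only a sketch: the function tf_id_repo is hypothetical, and it assumes that every variant is published under the same owner and naming pattern as "yifeihu/TF-ID-base", the one repository id that appears in the usage example.

```python
def tf_id_repo(size="large", with_caption=True, owner="yifeihu"):
    """Build a repository id for a TF-ID variant.

    Assumption: all four variants follow the naming pattern of the table
    above (TF-ID-<size>[-no-caption]) under the same owner.
    """
    if size not in ("base", "large"):
        raise ValueError("size must be 'base' or 'large'")
    suffix = "" if with_caption else "-no-caption"
    return f"{owner}/TF-ID-{size}{suffix}"


# Large models are the recommended default.
default_repo = tf_id_repo()
no_caption_repo = tf_id_repo("large", with_caption=False)
```

The returned string would be passed to from_pretrained in place of the hard-coded id in the usage example.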


Object Detection results format: {'<OD>': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['label1', 'label2', ...]} }
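
One way to consume this format is to pair each box with its label and sort the detections in reading order. The sketch below assumes only the structure shown above; the Detection type, the to_detections helper, and the sample values (including the "table" and "figure" label strings) are illustrative and not part of the model's API.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    """One detected region: a label plus corner coordinates."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def to_detections(parsed_answer, task="<OD>"):
    """Pair each bounding box with its label, sorted top to bottom."""
    result = parsed_answer[task]
    bboxes, labels = result["bboxes"], result["labels"]
    if len(bboxes) != len(labels):
        raise ValueError("bboxes and labels must have the same length")
    detections = [Detection(label, *box) for box, label in zip(bboxes, labels)]
    return sorted(detections, key=lambda d: d.y1)


# Assumed sample values, not real model output.
sample_answer = {
    "<OD>": {
        "bboxes": [[0, 300, 50, 340], [10, 20, 110, 220]],
        "labels": ["figure", "table"],
    }
}
detections = to_detections(sample_answer)
```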

Training Code and Dataset

Dataset: yifeihu/TF-ID-arxiv-papers
Code: github.com/ai8hyf/TF-ID

Benchmarks

We tested the models on paper pages outside the training dataset. The pages come from a subset of Hugging Face Daily Papers.

Correct output: the model draws correct bounding boxes for every table/figure in the given page.

Model	Total Images	Correct Output	Success Rate
TF-ID-base[HF]	258	251	97.29%
TF-ID-large[HF]	258	253	98.06%

Model	Total Images	Correct Output	Success Rate
TF-ID-base-no-caption[HF]	261	253	96.93%
TF-ID-large-no-caption[HF]	261	254	97.32%
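
The success rates follow directly from the counts in the tables: correct outputs divided by total images, as a percentage rounded to two decimals. A short check, using only the figures reported above:

```python
# (total images, correct outputs) as reported in the benchmark tables
benchmark = {
    "TF-ID-base": (258, 251),
    "TF-ID-large": (258, 253),
    "TF-ID-base-no-caption": (261, 253),
    "TF-ID-large-no-caption": (261, 254),
}


def success_rate(total, correct):
    """Share of pages with fully correct output, as a percentage."""
    return round(100 * correct / total, 2)


rates = {name: success_rate(total, correct) for name, (total, correct) in benchmark.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}%")
```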

Depending on the use case, some "incorrect" outputs may still be perfectly usable. For example, the model may draw two bounding boxes for one figure that has two child components.
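
If a downstream task needs a single box in such a case, the boxes can be merged into their enclosing rectangle. The helper below is a minimal sketch of that idea, not part of TF-ID; deciding which boxes belong to the same figure is left to the caller, and the coordinates are made-up values.

```python
def union_box(boxes):
    """Return the smallest [x1, y1, x2, y2] box that encloses all given boxes."""
    if not boxes:
        raise ValueError("at least one box is required")
    x1s, y1s, x2s, y2s = zip(*boxes)
    return [min(x1s), min(y1s), max(x2s), max(y2s)]


# Two side-by-side sub-figures (assumed coordinates) merged into one box.
left_panel = [10, 20, 100, 200]
right_panel = [110, 20, 200, 210]
merged = union_box([left_panel, right_panel])
```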

🔧 Technical Details

Model Architecture: Based on the microsoft/Florence-2 architecture, fine-tuned for table and figure extraction in academic papers.
Training Data: Papers from Hugging Face Daily Papers, with all bounding boxes manually annotated.

📄 License

This project is licensed under the MIT License. You can find the full license text here.

BibTeX and citation info

@misc{TF-ID,
  author = {Yifei Hu},
  title = {TF-ID: Table/Figure IDentifier for academic papers},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ai8hyf/TF-ID}},
}
