🚀 圖表轉表格模型
圖表轉表格模型是在論文 "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning" 中提出的,用於將圖表轉換為結構化表格。生成的表格使用 &&&
分隔行,使用 |
分隔列。該模型的底層架構是 UniChart。
🚀 快速開始
如何使用
from transformers import DonutProcessor, VisionEncoderDecoderModel
from PIL import Image
model_name = "khhuang/chart-to-table"
model = VisionEncoderDecoderModel.from_pretrained(model_name).cuda()
processor = DonutProcessor.from_pretrained(model_name)
image_path = "PATH_TO_IMAGE"
input_prompt = "<data_table_generation> <s_answer>"
img = Image.open(IMAGE_PATH)
pixel_values = processor(img.convert("RGB"), random_padding=False, return_tensors="pt").pixel_values
pixel_values = pixel_values.cuda()
decoder_input_ids = processor.tokenizer(input_prompt, add_special_tokens=False, return_tensors="pt", max_length=510).input_ids.cuda()
outputs = model.generate(
pixel_values.cuda(),
decoder_input_ids=decoder_input_ids.cuda(),
max_length=model.decoder.config.max_position_embeddings,
early_stopping=True,
pad_token_id=processor.tokenizer.pad_token_id,
eos_token_id=processor.tokenizer.eos_token_id,
use_cache=True,
num_beams=4,
bad_words_ids=[[processor.tokenizer.unk_token_id]],
return_dict_in_generate=True,
)
sequence = processor.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
extracted_table = sequence.split("<s_answer>")[1].strip()
📄 許可證
本項目採用 Apache-2.0 許可證。
📚 引用
如果您使用了本模型,請引用以下論文:
@misc{huang-etal-2023-do,
title = "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning",
author = "Huang, Kung-Hsiang and
Zhou, Mingyang and
Chan, Hou Pong and
Fung, Yi R. and
Wang, Zhenhailong and
Zhang, Lingyu and
Chang, Shih-Fu and
Ji, Heng",
year={2023},
eprint={2312.10160},
archivePrefix={arXiv},
primaryClass={cs.CL}
}