LiLT-Document-QA: an open-source, freely deployable model for question answering over English documents


Lilt Document QA

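Developed by TusharGoel

LiLT-Document-QA is a LiLT model fine-tuned for the Document Visual Question Answering (DocVQA) task, built for question answering over English documents.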

Image-to-Text

Transformers

Language: English · License: MIT · #DocumentQuestionAnswering #EnglishDocumentProcessing #OCREnhancedUnderstanding

Downloads: 80

Released: 10/15/2023

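Model Overview

By combining text with layout information, the LiLT model can interpret document structure and answer questions about it. It is particularly suited to question answering over structured documents such as forms and invoices.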

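Model Features

Multimodal understanding

Processes text content and document layout together, which improves comprehension of structured documents

Document structure awareness

Uses bounding-box information to capture the spatial relationships between document elements

Optimized for English documents

Fine-tuned specifically for question answering over English documents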
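
The bounding boxes mentioned above have to be supplied alongside the words. The LayoutLM-family tokenizers in Transformers, which LiLT checkpoints typically reuse, expect each box as integer `(x0, y0, x1, y1)` coordinates on a 0 to 1000 scale relative to the page. The following is a minimal sketch of that normalization for boxes coming from your own OCR step. The helper name `normalize_bbox`, the page size, and the sample words and boxes are all illustrative assumptions, not part of this model's released code.

```python
from typing import List, Sequence


def normalize_bbox(box: Sequence[float], page_width: float, page_height: float) -> List[int]:
    """Scale a pixel-space (x0, y0, x1, y1) box to the 0-1000 integer grid."""
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")
    x0, y0, x1, y1 = box
    scaled = [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]
    # Clamp so OCR boxes that slightly overshoot the page stay in range.
    return [min(1000, max(0, v)) for v in scaled]


# Hypothetical OCR output for a 1700 x 2200 pixel page scan (assumed values).
ocr_words = ["Invoice", "Total:", "$120.00"]
ocr_boxes = [(170, 220, 510, 286), (170, 1980, 340, 2046), (1360, 1980, 1700, 2046)]
normalized = [normalize_bbox(b, 1700, 2200) for b in ocr_boxes]
print(list(zip(ocr_words, normalized)))
```

The resulting `ocr_words` and `normalized` lists would then play the role of the `words` and `boxes` arguments passed to the tokenizer.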

Model Capabilities

Document question answering

Structured information extraction

Form understanding

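Use Cases

Document processing

Form information extraction

Extracts specific field values from structured forms

Identifies key fields in a form, such as license numbers and dates

Invoice processing

Answers specific questions about the contents of an invoice

Locates details in an invoice such as amounts and the supplier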

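🚀 LiLT Document Question Answering Model

This LiLT model performs document question answering. It was fine-tuned on the English DocVQA dataset: given a question together with a document's OCR words and their bounding boxes, it predicts the answer as a span of those words.

🚀 Quick Start

You can run document question answering with the LiLT model as follows: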

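import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_checkpoint = "TusharGoel/LiLT-Document-QA"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, add_prefix_space=True)
model_predict = AutoModelForQuestionAnswering.from_pretrained(model_checkpoint)
model_predict.eval()

# Load one FUNSD form, which provides OCR words and their bounding boxes.
dataset = load_dataset("nielsr/funsd", split="train")
example = dataset[0]
print(example)

question = "What is the Licensee Number?"
print(question)

words = example["words"]
boxes = example["bboxes"]

# Encode the question and the document words as a sequence pair, one box per word.
encoding = tokenizer(question, words, boxes=boxes, return_token_type_ids=True, return_tensors="pt")

# Map each token back to the index of the word it came from (None for special tokens).
word_ids = encoding.word_ids(0)

# Inference only, so gradients are not needed.
with torch.no_grad():
    outputs = model_predict(**encoding)

start_scores = outputs.start_logits
end_scores = outputs.end_logits

# Take the highest-scoring start and end tokens and convert them to word indices.
start, end = word_ids[start_scores.argmax(-1).item()], word_ids[end_scores.argmax(-1).item()]
if start is None or end is None or end < start:
    # The argmax landed on a special token or produced an inverted span.
    print("No answer span found.")
else:
    print(" ".join(words[start : end + 1]))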
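
Taking the argmax of the start and end scores independently, as the script above does, can select a token from the question or an end that precedes the start. A common alternative in extractive QA is to score start/end pairs jointly and restrict them to document tokens. The sketch below illustrates that idea in plain Python so it can be tried without loading the model. `decode_answer`, `max_answer_tokens`, and the toy inputs are hypothetical and not part of the released model; in practice the inputs would come from `outputs.start_logits[0].tolist()`, `outputs.end_logits[0].tolist()`, `encoding.word_ids(0)`, and `encoding.sequence_ids(0)`.

```python
from typing import Optional, Sequence


def decode_answer(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    word_ids: Sequence[Optional[int]],
    sequence_ids: Sequence[Optional[int]],
    words: Sequence[str],
    max_answer_tokens: int = 30,
) -> str:
    """Return the best-scoring answer span, restricted to document tokens."""
    # Only tokens of the second sequence (the document words) may form the answer.
    candidates = [
        i for i, seq in enumerate(sequence_ids)
        if seq == 1 and word_ids[i] is not None
    ]
    best_score, best_span = float("-inf"), None
    for i in candidates:
        for j in candidates:
            # Skip inverted spans and spans longer than the allowed token count.
            if j < i or j - i >= max_answer_tokens:
                continue
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score, best_span = score, (i, j)
    if best_span is None:
        return ""
    first_word, last_word = word_ids[best_span[0]], word_ids[best_span[1]]
    return " ".join(words[first_word : last_word + 1])


# Toy inputs (assumed values): <s> What number </s></s> License No 12 ##34 Date </s>
toy_words = ["License", "No", "1234", "Date"]
toy_word_ids = [None, 0, 1, None, None, 0, 1, 2, 2, 3, None]
toy_sequence_ids = [None, 0, 0, None, None, 1, 1, 1, 1, 1, None]
# The highest start score sits on a question token, which a plain argmax would pick.
toy_start = [0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0]
toy_end = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0]
toy_answer = decode_answer(toy_start, toy_end, toy_word_ids, toy_sequence_ids, toy_words)
print(toy_answer)  # prints "1234"
```

The double loop is quadratic in the number of document tokens, which is acceptable for a single page of a few hundred tokens; for larger batches it may be worth limiting candidates to the top-k start and end positions.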
