Open-source Document-qa-model: Document Question Answering Model - Understand Documents and Answer Questions with OCR Data

Document QA Model

Developed by lakshya-rawat

A document Q&A model fine-tuned based on LayoutLMv3-base, capable of understanding documents using OCR data and answering related questions.

Text-to-Image

Transformers

Supports Multiple Languages

Open Source License: Apache-2.0

#Document Layout Q&A #Multilingual OCR Understanding #Structured Information Extraction

Downloads 54

Release Time : 4/19/2025

Model Overview

This model is trained to utilize OCR data (via PaddleOCR) to understand documents and accurately answer questions related to structured information in document layouts.

Model Features

Multilingual Support

Supports document Q&A in English, Spanish, French, German, and Italian.

Layout Awareness

Capable of understanding document layouts and structures to improve Q&A accuracy.

OCR Integration

Enhances document comprehension by combining text and positional information extracted via PaddleOCR.

Model Capabilities

Document Image Q&A

Text Information Extraction

Structured Query Answering

Use Cases

Document Processing

Utility Bill Parsing

Extracts and answers questions about fees, dates, etc., from utility bill images.

High accuracy in extracting fee and date information.

Invoice Information Extraction

Extracts vendor, amount, and product information from invoice images.

Structured output of vendor and amount information.

🚀 Document QA Model

This is a document question-answering model fine-tuned from `layoutlmv3-base`. It uses OCR data (via PaddleOCR) to understand documents and answer questions about structured information in the document layout.

✨ Features

Fine-tuned from `layoutlmv3-base` for document question answering.
Supports multiple languages, including English, Spanish, French, German, and Italian.
Can extract answers from scanned documents using OCR and layout-aware understanding.

📚 Documentation

Model Details

Model Description

| Property | Details |
| --- | --- |
| Model Name | `document-qa-model` |
| Base Model | [`microsoft/layoutlmv3-base`](https://huggingface.co/microsoft/layoutlmv3-base) |
| Fine-tuned by | Lakshya Singh (solo contributor) |
| Languages | English, Spanish, French, German, Italian |
| License | Apache-2.0 (inherited from base model) |
| Intended Use | Extract answers to structured queries from scanned documents |
| Funding | Not funded (completed independently) |

Model Sources

Repository: [GitHub Link](https://github.com/Lakshyasinghrawat12/DocumentQA-lakshya-rawat-document-qa-model)
Trained on: Adapted version of nielsr/docvqa_1200_examples
Model metrics: See ![training_history.png](https://cdn-uploads.huggingface.co/production/uploads/66a7331438fbd9075584523f/MtMe5CZy3wb2nEG1wTRMc.png)

Uses

Direct Use

This model can be used for:

Question Answering on document images (PDFs, invoices, utility bills)
Information extraction tasks using OCR and layout-aware understanding

Out-of-Scope Use

Not suitable for conversational QA
Not suitable for images with no OCR-processed text

Training Details

Dataset

The dataset consisted of:

Images of utility bills and documents
OCR data with bounding boxes (from PaddleOCR)
Queries in English, Spanish, and Chinese
Answer spans with match scores and positions
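
The card does not describe how answer spans and match scores were produced. As a rough illustration only, the sketch below locates an answer string inside a list of OCR words by sliding a window over the words and scoring each window with `difflib.SequenceMatcher`. The function name `find_answer_span` and the `min_score` threshold are hypothetical and are not taken from the author's code.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def find_answer_span(
    words: List[str], answer: str, min_score: float = 0.8
) -> Optional[Tuple[int, int, float]]:
    """Return (start_index, end_index, score) of the best-matching word window.

    Hypothetical helper: the window has as many words as the answer, and the
    score is a character-level similarity ratio between 0 and 1.
    """
    target = answer.lower().split()
    if not target or len(words) < len(target):
        return None

    n = len(target)
    target_text = " ".join(target)
    lowered = [w.lower() for w in words]

    best: Optional[Tuple[int, int, float]] = None
    for start in range(len(lowered) - n + 1):
        window_text = " ".join(lowered[start:start + n])
        score = SequenceMatcher(None, window_text, target_text).ratio()
        # Keep the first window with the highest score.
        if best is None or score > best[2]:
            best = (start, start + n - 1, score)

    if best is None or best[2] < min_score:
        return None  # no window is similar enough to count as a match
    return best


ocr_words = ["Amount", "due:", "$45.20", "by", "03/15/2025"]
span = find_answer_span(ocr_words, "$45.20")
```

A fuzzy score rather than exact equality tolerates small OCR errors; the threshold is a judgment call and would need tuning on real data.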

Training Procedure

Preprocessing: PaddleOCR was used to extract tokens, positions, and structure
Model: LayoutLMv3-base
Epochs: 4
Learning rate schedule: Shown in image below
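
LayoutLMv3 expects each word's bounding box as `[x0, y0, x1, y1]` scaled to the 0 to 1000 range, so OCR output has to be converted before it reaches the model. The sketch below shows one way the preprocessing step might do this. It assumes the OCR lines have already been collected as `(quad, text)` pairs, where `quad` is four `(x, y)` corner points in pixels; the helper names are hypothetical, and assigning a line's box to every word in that line is a simplification, not necessarily what the author did.

```python
from typing import List, Sequence, Tuple

Quad = Sequence[Sequence[float]]  # four (x, y) corner points in pixels


def quad_to_normalized_box(quad: Quad, width: int, height: int) -> List[int]:
    """Convert a four-point polygon to an axis-aligned box on a 0-1000 scale."""
    xs = [point[0] for point in quad]
    ys = [point[1] for point in quad]

    def scale(value: float, size: int) -> int:
        # Clamp so slightly out-of-image coordinates stay inside 0..1000.
        return max(0, min(1000, int(1000 * value / size)))

    return [
        scale(min(xs), width),
        scale(min(ys), height),
        scale(max(xs), width),
        scale(max(ys), height),
    ]


def prepare_words_and_boxes(
    ocr_lines: Sequence[Tuple[Quad, str]], width: int, height: int
) -> Tuple[List[str], List[List[int]]]:
    """Split OCR lines into words; each word reuses its line's box (assumption)."""
    words: List[str] = []
    boxes: List[List[int]] = []
    for quad, text in ocr_lines:
        box = quad_to_normalized_box(quad, width, height)
        for word in text.split():
            words.append(word)
            boxes.append(box)
    return words, boxes


# Illustrative input: one detected line on a 1000 x 500 pixel image.
lines = [([[100, 50], [300, 50], [300, 100], [100, 100]], "Total due")]
words, boxes = prepare_words_and_boxes(lines, width=1000, height=500)
```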

Training Metrics

F1 Score (validation): ![training_history.png](https://cdn-uploads.huggingface.co/production/uploads/66a7331438fbd9075584523f/MtMe5CZy3wb2nEG1wTRMc.png)
Loss & Learning Rate Chart: ![training_history.png](https://cdn-uploads.huggingface.co/production/uploads/66a7331438fbd9075584523f/MtMe5CZy3wb2nEG1wTRMc.png)

Evaluation

Metrics Used

F1 score
Match score of predicted spans
Token overlap vs ground truth
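
The card does not state how F1 and token overlap were computed. A common choice for extractive QA is the SQuAD-style token-level F1, sketched below under that assumption. The tokenizer here (lowercased runs of word characters) is a simplification chosen for the example, and `token_f1` is a hypothetical name.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def token_f1(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall between two answer strings."""
    pred_tokens = tokenize(prediction)
    gold_tokens = tokenize(ground_truth)
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a perfect match; only one empty scores zero.
        return float(pred_tokens == gold_tokens)

    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


score = token_f1("$45.20 total", "45.20")
```

Here the prediction contains every ground-truth token plus one extra, so recall is 1 and precision is 2/3.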

Summary

The model performs well on document-style QA tasks, especially with:

Clearly structured OCR results
Document types similar to utility bills, invoices, and forms

How to Use

Available on my [GitHub](https://github.com/Lakshyasinghrawat12/DocumentQA-lakshya-rawat-document-qa-model)
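
For orientation, the following is a minimal inference sketch using the Hugging Face Transformers classes for LayoutLMv3 question answering. It is not the author's script. It assumes the OCR words and 0 to 1000 normalized boxes are already available, that the processor can be loaded from the base checkpoint, and that `model_id` is a placeholder the reader replaces with the actual checkpoint path or repository id. `best_span` is a hypothetical helper that picks the highest-scoring start and end pair.

```python
from typing import List, Sequence, Tuple


def best_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,
) -> Tuple[int, int]:
    """Return (start, end) maximizing start + end logits with start <= end."""
    best, best_score = (0, 0), float("-inf")
    for start, start_logit in enumerate(start_logits):
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_logit + end_logits[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best


def answer_question(image, question: str, words: List[str],
                    boxes: List[List[int]], model_id: str) -> str:
    """Run one question against one document image (PIL image, RGB)."""
    # Heavy dependencies are imported lazily so best_span works without them.
    import torch
    from transformers import LayoutLMv3ForQuestionAnswering, LayoutLMv3Processor

    # apply_ocr=False because words and boxes come from an external OCR step.
    processor = LayoutLMv3Processor.from_pretrained(
        "microsoft/layoutlmv3-base", apply_ocr=False
    )
    model = LayoutLMv3ForQuestionAnswering.from_pretrained(model_id)
    model.eval()

    encoding = processor(
        image, question, words, boxes=boxes,
        return_tensors="pt", truncation=True,
    )
    with torch.no_grad():
        outputs = model(**encoding)

    # Simplification: the search runs over all tokens, including the question.
    start, end = best_span(
        outputs.start_logits[0].tolist(), outputs.end_logits[0].tolist()
    )
    answer_ids = encoding["input_ids"][0][start:end + 1]
    return processor.tokenizer.decode(answer_ids, skip_special_tokens=True).strip()


span = best_span([0.1, 2.0, 0.3], [0.0, 0.5, 3.0])
```

In practice the processor and model would be loaded once and reused across questions; they are loaded inside the function here only to keep the sketch short.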

📄 License

The model is licensed under the Apache-2.0 license, inherited from the base model.
