Document QA Model
This is a fine-tuned document question-answering model based on layoutlmv3-base. It uses OCR data (via PaddleOCR) to understand documents and accurately answer questions about structured information in the document layout.
⨠Features
- Fine-tuned on layoutlmv3-base for document question-answering.
- Supports multiple languages including English, Spanish, French, German, and Italian.
- Can extract answers from scanned documents using OCR and layout-aware understanding.
Documentation
Model Details
Model Description
| Property | Details |
| --- | --- |
| Model Name | document-qa-model |
| Base Model | [microsoft/layoutlmv3-base](https://huggingface.co/microsoft/layoutlmv3-base) |
| Fine-tuned by | Lakshya Singh (solo contributor) |
| Languages | English, Spanish, French, German, Italian |
| License | Apache-2.0 (inherited from base model) |
| Intended Use | Extract answers to structured queries from scanned documents |
| Funding | Not funded (completed independently) |
Model Sources
- Repository: [GitHub Link](https://github.com/Lakshyasinghrawat12/DocumentQA-lakshya-rawat-document-qa-model)
- Trained on: Adapted version of nielsr/docvqa_1200_examples
- Model metrics: See 
Uses
Direct Use
This model can be used for:
- Question Answering on document images (PDFs, invoices, utility bills)
- Information extraction tasks using OCR and layout-aware understanding
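In extractive question answering of this kind, the model's QA head produces one start score and one end score per token, and the answer is the highest-scoring span whose start does not come after its end. The sketch below shows that decoding step in plain Python. The function name `best_span`, the `max_len` cap, and the example tokens and scores are illustrative assumptions, not code from this repository.

```python
from typing import Sequence, Tuple


def best_span(
    start_scores: Sequence[float],
    end_scores: Sequence[float],
    max_len: int = 30,
) -> Tuple[int, int, float]:
    """Return (start, end, score) of the best span with start <= end.

    The span score is start_scores[start] + end_scores[end]. Spans longer
    than max_len tokens are skipped. Both indices are inclusive.
    """
    if len(start_scores) != len(end_scores) or len(start_scores) == 0:
        raise ValueError("score lists must be non-empty and equally long")

    best = (0, 0, float("-inf"))
    for i, s in enumerate(start_scores):
        # Only consider end positions at or after the start position.
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best[2]:
                best = (i, j, score)
    return best


# Hypothetical OCR tokens and per-token scores for "What is the total?"
tokens = ["Total", "due", ":", "$", "42.10", "by", "May"]
start = [0.1, 0.0, 0.0, 2.5, 1.0, 0.0, 0.0]
end = [0.0, 0.1, 0.0, 0.2, 3.0, 0.0, 0.4]

i, j, score = best_span(start, end)
answer = " ".join(tokens[i : j + 1])  # "$ 42.10"
```

With real model output, the scores would be the start and end logits restricted to the document tokens, so that the question tokens cannot be selected as the answer.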
Out-of-Scope Use
- Not suitable for conversational QA
- Not suitable for images with no OCR-processed text
Training Details
Dataset
The dataset consisted of:
- Images of utility bills and documents
- OCR data with bounding boxes (from PaddleOCR)
- Queries in English, Spanish, and Chinese
- Answer spans with match scores and positions
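The card does not describe how answer spans and match scores were derived. One plausible approach, shown here only as a sketch, is to slide a window over the OCR words and keep the window most similar to the answer text. The function name `find_answer_span`, the `min_score` threshold, and the use of `difflib.SequenceMatcher` are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def find_answer_span(
    words: List[str], answer: str, min_score: float = 0.8
) -> Optional[Tuple[int, int, float]]:
    """Find the word window that best matches `answer`.

    Returns (start, end, score) with inclusive word indices, or None when
    no window reaches min_score. Comparison is case-insensitive.
    """
    target_words = answer.lower().split()
    if not target_words or not words:
        return None
    target = " ".join(target_words)

    best: Optional[Tuple[int, int, float]] = None
    n = len(target_words)
    # Try windows one word shorter or longer than the answer, since OCR
    # may merge or split tokens.
    for size in range(max(1, n - 1), n + 2):
        for start in range(0, len(words) - size + 1):
            window = " ".join(words[start : start + size]).lower()
            score = SequenceMatcher(None, window, target).ratio()
            if best is None or score > best[2]:
                best = (start, start + size - 1, score)

    if best is None or best[2] < min_score:
        return None
    return best


# Hypothetical OCR words from a utility bill.
ocr_words = ["Account", "No.", "12345", "Amount", "Due", "$42.10"]
span = find_answer_span(ocr_words, "amount due")  # (3, 4, 1.0)
```

A threshold such as `min_score` lets noisy OCR text still match while rejecting answers that do not appear in the document at all.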
Training Procedure
- Preprocessing: PaddleOCR was used to extract tokens, positions, and structure
- Model: LayoutLMv3-base
- Epochs: 4
- Learning rate schedule: Shown in image below
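For the preprocessing step, LayoutLMv3 expects each token's bounding box as `[x0, y0, x1, y1]` scaled to a 0 to 1000 range, independent of image size. The sketch below converts one OCR detection into that form. It assumes each detection is a four-point quadrilateral in pixel coordinates, which is the shape PaddleOCR typically returns; the function name `quad_to_layout_box` and the sample values are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def quad_to_layout_box(
    quad: Sequence[Point], width: int, height: int
) -> List[int]:
    """Convert a four-point quadrilateral to [x0, y0, x1, y1] on a 0-1000 scale."""
    if width <= 0 or height <= 0:
        raise ValueError("image size must be positive")
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]

    def scale(value: float, size: int) -> int:
        # Clamp to the valid range so slightly out-of-image points stay legal.
        return max(0, min(1000, int(1000 * value / size)))

    # Use the axis-aligned box that encloses the quadrilateral.
    return [
        scale(min(xs), width),
        scale(min(ys), height),
        scale(max(xs), width),
        scale(max(ys), height),
    ]


# Hypothetical detection on an 800x600 image.
quad = [[80, 60], [400, 60], [400, 90], [80, 90]]
box = quad_to_layout_box(quad, width=800, height=600)  # [100, 100, 500, 150]
```

Taking the enclosing axis-aligned box discards any rotation in the detection, which is usually acceptable for upright scans.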
Training Metrics
- F1 Score (validation): 
- Loss & Learning Rate Chart: 
Evaluation
Metrics Used
- F1 score
- Match score of predicted spans
- Token overlap vs ground truth
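The card does not state exactly how F1 and token overlap were computed. A common choice for extractive QA is the SQuAD-style token-level F1, sketched below under that assumption; the function name `token_f1` and the simple lowercase-and-split normalization are illustrative.

```python
from collections import Counter


def token_f1(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between two strings after lowercasing and whitespace split."""
    pred = prediction.lower().split()
    gold = ground_truth.lower().split()
    if not pred or not gold:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == gold)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


score = token_f1("total amount due", "amount due")  # about 0.8
```

Here the prediction contains both ground-truth tokens plus one extra, so precision is 2/3 and recall is 1, giving an F1 of 0.8.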
Summary
The model performs well on document-style QA tasks, especially with:
- Clearly structured OCR results
- Document types similar to utility bills, invoices, and forms
How to Use
- Available on my [GitHub](https://github.com/Lakshyasinghrawat12/DocumentQA-lakshya-rawat-document-qa-model)
License
The model is licensed under the Apache-2.0 license, inherited from the base model.