# Donut KYC Document Reading Model
Donut is an end-to-end visual document understanding (VDU) model for general document image understanding; this checkpoint is trained specifically for Indian KYC documents.
## Quick Start
To get started, load the checkpoint as shown in the snippet below; the full inference example is given under Usage Examples.
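A minimal first step (assuming the `transformers` library is installed) is to load the processor and model from the Hub:

```python
from transformers import DonutProcessor, VisionEncoderDecoderModel

# Load the KYC checkpoint from the Hugging Face Hub
processor = DonutProcessor.from_pretrained("sourinkarmakar/kyc_v1-donut-demo")
model = VisionEncoderDecoderModel.from_pretrained("sourinkarmakar/kyc_v1-donut-demo")
```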
## Features
- End-to-end design: Donut is a self-contained VDU model that does not rely on OCR-related modules.
- Transformer-based architecture: composed of a visual encoder and a textual decoder, both based on the Transformer architecture, enabling easy end-to-end training.
- Multi-function for KYC: can classify and read the contents of Aadhar, PAN, and Voter documents, detect orientation, and distinguish between coloured and black-and-white documents.
## Documentation

### Model description
Donut is an end-to-end (i.e., self-contained) VDU model for the general understanding of document images. Its architecture is quite simple, consisting of a Transformer-based visual encoder and a Transformer-based textual decoder.
Donut does not rely on any OCR-related modules; instead, the visual encoder extracts features directly from the given document image.
The textual decoder then maps the derived features into a sequence of subword tokens to construct the desired structured output (e.g., JSON). Because each component is Transformer-based, the model is easily trained in an end-to-end manner.
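As a small illustration of the decoder-to-JSON step, `DonutProcessor.token2json` converts the flat token sequence produced by the decoder into nested JSON. The field tags below are hypothetical and only meant to show the mechanism, not the model's actual output schema:

```python
from transformers import DonutProcessor

processor = DonutProcessor.from_pretrained("sourinkarmakar/kyc_v1-donut-demo")

# Hypothetical decoder output using Donut-style field tags
token_sequence = "<s_doc_type>PAN</s_doc_type><s_name>JOHN DOE</s_name>"
print(processor.token2json(token_sequence))
# Roughly: {'doc_type': 'PAN', 'name': 'JOHN DOE'}
```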

### Intended uses and limitations
This model is trained for reading the contents of Indian KYC documents. It can classify and read the contents of Aadhar, PAN, and Voter documents, detect the document's orientation, and determine whether it is coloured or black-and-white. The input document may be oriented in any direction.
The model should be provided with a fair-quality image so that the contents are readable; a simple pre-check is sketched below.
It has been trained on limited data, so performance may be modest. Future versions are planned to use more training images and to cover additional types of KYC documents.
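One possible pre-check (not part of the model) is the variance-of-Laplacian blur measure from OpenCV; the threshold below is an arbitrary assumption and should be tuned for your data:

```python
import cv2

def is_readable(image_path: str, blur_threshold: float = 100.0) -> bool:
    """Rough readability check: reject images that fail to load or look blurry."""
    img = cv2.imread(image_path)
    if img is None:
        return False
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # A low variance of the Laplacian indicates a blurry image
    return cv2.Laplacian(gray, cv2.CV_64F).var() >= blur_threshold
```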
### Training data
For v1, a custom dataset of around 283 images was used: 199 for training, 42 for validation, and 42 for testing.
The 199 training images comprised 57 Aadhar samples, 57 PAN samples, and 85 Voter samples.
### Performance
The current performance is as follows:
- Overall accuracy = 74%
- Aadhar = 49% (the cause of the lower accuracy is still being investigated)
- PAN = 94%
- Voter = 76%
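A minimal sketch of how such per-document accuracy could be computed, assuming the `JSONParseEvaluator` from the original Donut package (it scores a predicted JSON tree against the ground truth via tree edit distance); the prediction and label below are hypothetical:

```python
from donut import JSONParseEvaluator

evaluator = JSONParseEvaluator()
prediction = {"doc_type": "PAN", "name": "JOHN DOE"}      # hypothetical model output
ground_truth = {"doc_type": "PAN", "name": "JOHN M DOE"}  # hypothetical label

score = evaluator.cal_acc(prediction, ground_truth)  # value in [0, 1]
print(f"accuracy: {score:.2f}")
```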
## Usage Examples

### Basic Usage
```python
from transformers import DonutProcessor, VisionEncoderDecoderModel
import re
import os
import glob
import cv2
import json
import torch
from tqdm.auto import tqdm
import numpy as np
from donut import JSONParseEvaluator  # optional, useful for evaluating predictions

# Load the processor and model from the Hub
processor = DonutProcessor.from_pretrained("sourinkarmakar/kyc_v1-donut-demo")
model = VisionEncoderDecoderModel.from_pretrained("sourinkarmakar/kyc_v1-donut-demo")

# Run on GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Collect the images to process; set basepath to your data directory
basepath = "."
dataset = glob.glob(os.path.join(basepath, "unseen_samples/*"))

output_list = []
for idx, sample in tqdm(enumerate(dataset), total=len(dataset)):
    # Read the image and convert from BGR (OpenCV default) to RGB
    img = cv2.imread(sample)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    pixel_values = processor(img, return_tensors="pt").pixel_values
    pixel_values = pixel_values.to(device)

    # Start the decoder with the task prompt used during training
    task_prompt = "<s_cord-v2>"
    decoder_input_ids = processor.tokenizer(task_prompt, add_special_tokens=False, return_tensors="pt").input_ids
    decoder_input_ids = decoder_input_ids.to(device)

    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        early_stopping=True,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        num_beams=1,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )

    # Strip special tokens and the task prompt, then convert to JSON
    seq = processor.batch_decode(outputs.sequences)[0]
    seq = seq.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
    seq = re.sub(r"<.*?>", "", seq, count=1).strip()  # remove the first task start token
    seq = processor.token2json(seq)
    output_list.append(seq)

print(output_list)
```
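Note: the example above assumes the `transformers`, `torch`, `opencv-python`, `tqdm`, and `numpy` packages are installed; `JSONParseEvaluator` comes from the original Donut package (published on PyPI as `donut-python`).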
## License

No license information has been provided for this model.