Handwritten text recognition for Finnish 19th century court records
This model performs handwritten text recognition from text line images. It was fine-tuned from Microsoft's TrOCR model using digitized 19th-century court record documents in Finnish and Swedish.
Quick Start
The model predicts the text content of text line images. Running inference on a GPU is recommended if one is available.
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_checkpoint = "Kansallisarkisto/court-records-htr"
line_image_path = "/path/to/textline_image.jpg"

# Load the processor (image preprocessing + tokenizer) and the fine-tuned model.
processor = TrOCRProcessor.from_pretrained(model_checkpoint)
model = VisionEncoderDecoderModel.from_pretrained(model_checkpoint).to(device)

# Read a single text line image and convert it into model input tensors.
image = Image.open(line_image_path).convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate token IDs and decode them into the recognized text.
generated_ids = model.generate(pixel_values.to(device))
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
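When several text line images need to be transcribed, it can be convenient to pass them to the model in small batches. The following is a minimal sketch of that idea, not part of the original model card: the helper names `chunked` and `recognize_lines` and the default `batch_size=8` are hypothetical, and `processor`, `model`, and `device` are assumed to be the objects created in the Quick Start code.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence, size: int) -> Iterator[List]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def recognize_lines(images, processor, model, device, batch_size: int = 8) -> List[str]:
    """Transcribe a list of PIL text line images in batches (hypothetical helper).

    `processor`, `model`, and `device` are assumed to be created as in the
    Quick Start code; `batch_size` is an illustrative value, not a tuned one.
    """
    texts: List[str] = []
    for batch in chunked(images, batch_size):
        # The processor accepts a list of images and returns one stacked tensor.
        pixel_values = processor(images=batch, return_tensors="pt").pixel_values
        generated_ids = model.generate(pixel_values.to(device))
        texts.extend(processor.batch_decode(generated_ids, skip_special_tokens=True))
    return texts
```

A call such as `recognize_lines(images, processor, model, device)` would return one string per input image, in the same order as the input list. A suitable batch size depends on the available GPU memory.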
The model downloaded from the Hugging Face Hub is saved locally to ~/.cache/huggingface/hub/.
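The cache location can differ from the default when certain environment variables are set. The sketch below is a simplified illustration of how the `HF_HUB_CACHE`, `HF_HOME`, and `XDG_CACHE_HOME` variables are commonly documented to affect that path; the function `resolve_hub_cache` is a hypothetical helper written for this example, not a function of the Hugging Face libraries, and it does not cover every variable those libraries may consult.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_hub_cache(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory where Hub downloads are expected to be cached.

    Simplified sketch: an explicit HF_HUB_CACHE wins, then HF_HOME/hub,
    then XDG_CACHE_HOME/huggingface/hub, then ~/.cache/huggingface/hub.
    """
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    xdg = env.get("XDG_CACHE_HOME")
    base = Path(xdg).expanduser() if xdg else Path.home() / ".cache"
    return base / "huggingface" / "hub"


if __name__ == "__main__":
    print(resolve_hub_cache())
```

With none of these variables set, the function returns the default path mentioned above.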
Features
- Performs handwritten text recognition from text line images.
- Fine-tuned with 19th-century court record documents in Finnish and Swedish.
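Installation
No specific installation steps are provided. The Quick Start code imports the transformers, torch, and PIL (Pillow) Python packages, so these need to be available in the environment.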
Documentation
Intended uses & limitations
The model has been trained to recognize handwritten text from a specific type of 19th-century data and may generalize poorly to other datasets. The model takes text line images as input, and the use of other types of input is not recommended.
Training data
The model was trained using 314,228 text line images from 19th-century court records, while the validation dataset contained 39,042 text line images.
Training procedure
This model was trained using an NVIDIA RTX A6000 GPU with the following hyperparameters:
- train batch size: 24
- epochs: 13
- optimizer: AdamW
- maximum length of text sequence: 64
For other parameters, the default values were used (find more information here). The training code is available in the train_trocr.py code file.
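To give a sense of the scale of the training run, the figures above can be combined into a rough step count. This is a back-of-the-envelope sketch under stated assumptions, not a description of what train_trocr.py does: it assumes one optimizer step per batch (no gradient accumulation), a single GPU, and that the last partial batch of each epoch is kept.

```python
import math

# Figures taken from the training data and hyperparameter sections.
TRAIN_LINES = 314_228
VAL_LINES = 39_042
BATCH_SIZE = 24
EPOCHS = 13


def steps_per_epoch(num_examples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches per epoch; the last partial batch is kept unless drop_last."""
    if drop_last:
        return num_examples // batch_size
    return math.ceil(num_examples / batch_size)


steps = steps_per_epoch(TRAIN_LINES, BATCH_SIZE)       # 13093
total_steps = steps * EPOCHS                           # 170209
val_share = VAL_LINES / (TRAIN_LINES + VAL_LINES)      # roughly 11 % of all lines

print(f"steps per epoch: {steps}")
print(f"total steps over {EPOCHS} epochs: {total_steps}")
print(f"validation share of all line images: {val_share:.1%}")
```

If the actual training used gradient accumulation or dropped the last batch, the real number of optimizer steps would be lower than this estimate.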
Evaluation results
Evaluation results using the validation dataset are listed below:
| Validation loss | Validation CER | Validation WER |
| --- | --- | --- |
| 0.248 | 0.024 | 0.113 |
The metrics were calculated using the Evaluate library.
More information on the CER metric can be found here.
More information on the WER metric can be found here.
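For readers unfamiliar with these metrics, the sketch below shows how CER and WER can be computed for a single reference/prediction pair using the Levenshtein edit distance. It is a standalone illustration and not the Evaluate library's implementation; the function names `edit_distance`, `cer`, and `wer` and the example strings are made up for this sketch, and results aggregated over a whole dataset may be computed differently.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Illustrative strings only: one missing character in the second word.
reference = "käräjät pidettiin"
prediction = "käräjät pidetiin"
print(f"CER: {cer(reference, prediction):.3f}")
print(f"WER: {wer(reference, prediction):.3f}")
```

The example also illustrates why WER is typically higher than CER, as in the table above: a single wrong character counts as one error among many characters, but it makes the whole word count as wrong.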
Technical Details
The model was produced by fine-tuning Microsoft's TrOCR model. Training was run on an NVIDIA RTX A6000 GPU with the hyperparameters listed above.
License
The model is licensed under the MIT license.
| Property | Details |
| --- | --- |
| Base Model | microsoft/trocr-base-handwritten |
| Pipeline Tag | image-to-text |
| Metrics | cer, wer |
| License | mit |