trocr-base-handwritten_nj_biergarten_captcha_v2
A model for CAPTCHA OCR, fine-tuned from microsoft/trocr-base-handwritten.
Quick Start
Use the code below to get started with the model:
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Use the GPU when one is available, otherwise fall back to the CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

# Load the processor and the fine-tuned model from the Hugging Face Hub
hub_dir = "phunc20/trocr-base-handwritten_nj_biergarten_captcha_v2"
processor = TrOCRProcessor.from_pretrained(hub_dir)
model = VisionEncoderDecoderModel.from_pretrained(hub_dir)
model = model.to(device)

# TrOCR expects a 3-channel image, so convert to RGB after opening
image = Image.open("/path/to/image").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values
pixel_values = pixel_values.to(device)

# Generate token ids and decode them into the predicted text
outputs = model.generate(pixel_values)
pred_str = processor.batch_decode(outputs, skip_special_tokens=True)[0]
Features
This is a simple model fine-tuned from microsoft/trocr-base-handwritten on the dataset phunc20/nj_biergarten_captcha_v2.
Documentation
Model Details
This is a model for CAPTCHA OCR, fine-tuned from microsoft/trocr-base-handwritten on the dataset phunc20/nj_biergarten_captcha_v2.
Uses
Direct Use
The provided Python code demonstrates how to use the model for CAPTCHA OCR.
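For more than one image, the same calls can be applied to a whole batch. The helper below is a minimal sketch: `read_captchas` is a hypothetical name that is not part of the model repository, and it assumes `processor` and `model` are the objects loaded in the Quick Start and that every image is already an RGB PIL image.

```python
from typing import Any, List, Sequence


def read_captchas(processor: Any, model: Any, images: Sequence[Any]) -> List[str]:
    """Decode several CAPTCHA images as one batch.

    Hypothetical helper (an assumption, not the author's code). `processor`
    is assumed to be a TrOCRProcessor, `model` a VisionEncoderDecoderModel,
    and `images` a sequence of RGB PIL images.
    """
    if not images:
        return []
    # The processor resizes and normalizes each image and stacks them into one tensor
    pixel_values = processor(images=list(images), return_tensors="pt").pixel_values
    # Move the batch to whichever device the model already lives on
    pixel_values = pixel_values.to(model.device)
    # Generate token ids for the whole batch, then decode one string per image
    outputs = model.generate(pixel_values)
    return processor.batch_decode(outputs, skip_special_tokens=True)
```

With the objects from the Quick Start, `read_captchas(processor, model, [image])[0]` should return the same string as `pred_str`.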
Bias, Risks, and Limitations
Although the model performs well on the dataset phunc20/nj_biergarten_captcha_v2, it does not perform as well on CAPTCHA images from other sources. In this respect, the model still falls short of human performance.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
Training Details
Training Data
The model was trained on the train split of phunc20/nj_biergarten_captcha_v2 and evaluated on the validation split, without using the test split.
Training Procedure
Please refer to https://gitlab.com/phunc20/captchew/-/blob/main/colab_notebooks/train_from_pretrained_Seq2SeqTrainer_torchDataset.ipynb?ref_type=heads, which is adapted from https://github.com/NielsRogge/Transformers-Tutorials/blob/master/TrOCR/Fine_tune_TrOCR_on_IAM_Handwriting_Database_using_Seq2SeqTrainer.ipynb.
Evaluation
Testing Data, Factors & Metrics
Testing Data
- The test split of phunc20/nj_biergarten_captcha_v2.
- The Kaggle dataset https://www.kaggle.com/datasets/fournierp/captcha-version-2-images/data (referred to as kaggle_test_set in this model card).
Factors
[More Information Needed]
Metrics
Character error rate (CER), exact match, and average length difference. The first two are described in Hugging Face's documentation. The last one is a custom metric, explained at https://gitlab.com/phunc20/captchew/-/blob/v0.1/average_length_difference.py.
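To make the three metrics concrete, here is a minimal pure-Python sketch. CER is computed in the usual way, as total character edits divided by total reference characters. The `average_length_difference` function is an assumption: it takes the mean absolute difference between predicted and reference lengths, which may differ from the definition in the linked script. All function names are illustrative.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(preds: List[str], refs: List[str]) -> float:
    """Character error rate: total edits over total reference characters."""
    total_edits = sum(edit_distance(p, r) for p, r in zip(preds, refs))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


def exact_match(preds: List[str], refs: List[str]) -> int:
    """Number of predictions identical to their reference."""
    return sum(p == r for p, r in zip(preds, refs))


def average_length_difference(preds: List[str], refs: List[str]) -> float:
    """ASSUMED definition: mean absolute difference in string length."""
    return sum(abs(len(p) - len(r)) for p, r in zip(preds, refs)) / len(refs)


preds = ["ab12", "xyz", "hello"]
refs = ["ab12", "xyz9", "hallo"]
print(cer(preds, refs))                        # 2 edits over 13 reference characters
print(exact_match(preds, refs))                # 1
print(average_length_difference(preds, refs))  # (0 + 1 + 0) / 3
```

Note that CER can exceed 1 when predictions are much longer than the references, since insertions count as edits.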
Results
On the test split of phunc20/nj_biergarten_captcha_v2:

| Model | CER | exact match | avg len diff |
|---|---|---|---|
| phunc20/trocr-base-handwritten_nj_biergarten_captcha_v2 | 0.001333 | 496/500 | 1/500 |
| microsoft/trocr-base-handwritten | 0.9 | 5/500 | 2.4 |
On kaggle_test_set:

| Model | CER | exact match | avg len diff |
|---|---|---|---|
| phunc20/trocr-base-handwritten_nj_biergarten_captcha_v2 | 0.4381 | 69/1070 | 0.1289 |
| microsoft/trocr-base-handwritten | 1.0112 | 17/1070 | 2.4439 |
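The exact-match counts are easier to compare as percentages. The short sketch below only restates the counts reported above; the dictionary layout and the `exact_match_rate` name are illustrative.

```python
def exact_match_rate(correct: int, total: int) -> float:
    """Exact-match count as a percentage of the evaluated images."""
    return 100.0 * correct / total


# (correct, total) pairs copied from the results reported in this model card
reported = {
    "fine-tuned, nj_biergarten_captcha_v2 test": (496, 500),
    "base, nj_biergarten_captcha_v2 test": (5, 500),
    "fine-tuned, kaggle_test_set": (69, 1070),
    "base, kaggle_test_set": (17, 1070),
}

for name, (correct, total) in reported.items():
    print(f"{name}: {exact_match_rate(correct, total):.2f}%")
# fine-tuned, nj_biergarten_captcha_v2 test: 99.20%
# base, nj_biergarten_captcha_v2 test: 1.00%
# fine-tuned, kaggle_test_set: 6.45%
# base, kaggle_test_set: 1.59%
```

The fine-tuned model reads almost every in-distribution CAPTCHA exactly, while on kaggle_test_set its exact-match rate drops to roughly 6%, which matches the limitation described earlier.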
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
License
This project is licensed under the GPL-3.0 license.