Model Card for qwen-for-jawi-v1
This model card provides detailed information about the qwen-for-jawi-v1
model, which is specialized for Optical Character Recognition (OCR) of historical Malay texts written in Jawi script.
Features
- Specialized for OCR of historical Malay manuscripts in Jawi script.
- Based on Qwen2-VL-2B-Instruct, a vision-language model.
- Enables digital preservation of Malay cultural heritage and computational analysis of historical texts.
Installation
The README doesn't provide specific installation steps. The usage example below assumes that transformers, torch, qwen_vl_utils (distributed on PyPI as qwen-vl-utils), and Pillow are installed.
Usage Examples
Basic Usage
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
import torch
from qwen_vl_utils import process_vision_info
from PIL import Image

# Load the fine-tuned model and a matching Qwen2-VL processor.
model_name = 'mevsg/qwen-for-jawi-v1'
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map='auto'
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

# Load the manuscript page image to transcribe.
image_path = 'path/to/image'
image = Image.open(image_path).convert('RGB')

# Build a single-turn chat message that pairs the image with the OCR instruction.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image,
            },
            {"type": "text", "text": "Convert this image to text"},
        ],
    }
]

# Apply the chat template and prepare the multimodal inputs.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Generate the transcription and strip the prompt tokens from the output.
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
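The same pipeline can be wrapped in a small helper to transcribe several pages in one run. The sketch below reuses the model, processor, and imports loaded above; the ocr_page helper and the pages/ directory are illustrative assumptions, not part of the original card.

import glob

def ocr_page(image_path, prompt="Convert this image to text"):
    # Run the same single-image chat pipeline as in the basic example above.
    image = Image.open(image_path).convert('RGB')
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text], images=image_inputs, videos=video_inputs,
        padding=True, return_tensors="pt",
    ).to(model.device)
    generated_ids = model.generate(**inputs, max_new_tokens=128)
    trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
    return processor.batch_decode(
        trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )[0]

# Transcribe every JPEG page in a hypothetical folder of scans.
for path in sorted(glob.glob('pages/*.jpg')):
    print(path, ocr_page(path))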
Documentation
Model Description
This model is a fine-tuned version of Qwen/Qwen2-VL-7B-Instruct specialized for Optical Character Recognition (OCR) of historical Malay texts written in Jawi script (Arabic script adapted for the Malay language).
| Property | Details |
|----------|---------|
| Model Type | Vision-Language Model |
| Base Model | Qwen2-VL-2B-Instruct |
| Parameters | 2 billion |
| Language(s) | Malay (Jawi script) |
Intended Use
Primary Intended Uses
- OCR for historical Malay manuscripts written in Jawi script.
- Digital preservation of Malay cultural heritage.
- Enabling computational analysis of historical Malay texts.
Out-of-Scope Uses
- General Arabic text recognition.
- Modern Malay text processing.
- Real-time OCR applications.
Training Data
Dataset Description
This model was trained and evaluated using
Training Procedure
- Hardware used: 1 x H100
- Training time: 6 hours
Performance and Limitations
Performance Metrics
- Character Error Rate (CER): 8.66%
- Word Error Rate (WER): 25.50%
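To evaluate transcriptions in the same terms, CER and WER can be computed with the jiwer library. The snippet below is a minimal sketch with illustrative strings; jiwer is an assumption of this example, not a tool named by the original card.

import jiwer

# Illustrative ground-truth transcription and model output for one line of text.
reference = "ini adalah contoh transkripsi"
hypothesis = "ini adaleh contoh transkripsi"

cer = jiwer.cer(reference, hypothesis)  # fraction of character-level edits needed
wer = jiwer.wer(reference, hypothesis)  # fraction of word-level edits needed
print(f"CER: {cer:.2%}  WER: {wer:.2%}")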
Comparison with Other Models
We compared this model with Surya (https://github.com/VikParuchuri/surya), which reports high accuracy for Arabic but performs poorly on our Jawi data:
- Character Error Rate (CER): 70.89%
- Word Error Rate (WER): 91.73%
Technical Details
The README doesn't provide in-depth technical details, so this section is skipped.
License
The README doesn't provide license information, so this section is skipped.
Citation
@misc{qwen-for-jawi-v1,
  title     = {Qwen for Jawi v1: a model for Jawi OCR},
  author    = {Miguel Escobar Varela},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/mevsg/qwen-for-Jawi-v1},
  note      = {Model created at National University of Singapore}
}
Acknowledgements
Special thanks to William Mattingly, whose finetuning script served as the base for our finetuning approach: https://github.com/wjbmattingly/qwen2-vl-finetune-huggingface