tablecell-htr open-source model: free handwritten text recognition for table cell images, suited to old Finnish records

Tablecell Htr

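Developed by Kansallisarkisto

This model recognizes handwritten text in text line images taken from table cells. It is aimed in particular at handwriting in Finnish death records and census records from the 1930s.

Text recognition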

Safetensors

License: MIT #TableHandwritingRecognition #HistoricalDocumentProcessing #FinnishOCR

Downloads: 39

Released: 9/12/2024

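Model overview

The model was trained by fine-tuning the National Archives of Finland's multi-century handwritten text recognition model and Microsoft's TrOCR model. It is built specifically to recognize handwritten text in particular types of table cells.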

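Model features

Domain-specific optimization

Tuned specifically for handwritten text in table cells of historical documents; the reported validation CER is 0.107 and WER is 0.237

Based on the TrOCR architecture

Builds on Microsoft's TrOCR model, fine-tuned with domain-specific data

GPU acceleration support

Supports GPU-accelerated inference for faster processing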

Model capabilities

Handwritten text recognition

Table cell content extraction

Historical document digitization

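Use cases

Archive digitization

Transcription of historical population records

Converts handwritten content in Finnish census records from the 1930s into searchable text

Validation CER (character error rate) is 0.107 and WER (word error rate) is 0.237

Death record digitization

Automatically recognizes and transcribes handwritten information in historical death record tables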
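
The CER and WER figures quoted above are edit-distance-based error rates. The following is a minimal sketch of how such metrics are commonly computed for one reference/prediction pair; it is not taken from the model's own evaluation code, and how the reported figures were aggregated over the validation set is not stated here. The function names `edit_distance`, `cer`, and `wer` are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # substitute r with h (or match)
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Under this definition, a CER of 0.107 corresponds to roughly one character edit per nine to ten reference characters.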

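🚀 Handwritten text recognition model for table cell images

This model performs handwritten text recognition on text line images. It was trained by fine-tuning the National Archives' multi-century handwritten text recognition model and Microsoft's TrOCR model on text line images from tables in Finnish death records and census records from the 1930s, and it is intended for recognizing handwritten text in table cell images.

🚀 Quick start

The following code predicts the text content of an image. If a GPU is available, using it for inference is recommended.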

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import torch

# Use GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Model location in Huggingface Hub
model_checkpoint = "Kansallisarkisto/tablecell-htr"
# Path to textline image
line_image_path = "/path/to/textline_image.jpg"

# Initialize processor and model
processor = TrOCRProcessor.from_pretrained(model_checkpoint)
model = VisionEncoderDecoderModel.from_pretrained(model_checkpoint).to(device)

# Open image file and extract pixel values
image = Image.open(line_image_path).convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Use the model to generate predictions 
generated_ids = model.generate(pixel_values.to(device))
# Use the processor to decode ids to text
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
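
The snippet above transcribes a single text line image. A table usually yields many cell images, so the following sketch shows one way to run them through the model in batches. It assumes the `processor`, `model`, and `device` objects created above are passed in; the helper names `chunked` and `recognize_lines` and the default `batch_size` of 8 are illustrative choices, not part of the model card.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def recognize_lines(images, processor, model, device, batch_size=8):
    """Transcribe a list of RGB PIL images, one text line per image.

    `processor`, `model`, and `device` are the objects created in the
    quick start code; `batch_size` is an assumed value to tune per GPU.
    """
    texts = []
    for batch in chunked(images, batch_size):
        # The processor accepts a list of images and returns one stacked tensor
        pixel_values = processor(images=batch, return_tensors="pt").pixel_values
        generated_ids = model.generate(pixel_values.to(device))
        # batch_decode returns one string per image in the batch
        texts.extend(processor.batch_decode(generated_ids, skip_special_tokens=True))
    return texts
```

A call might look like `recognize_lines([Image.open(p).convert("RGB") for p in paths], processor, model, device)`, where `paths` is a hypothetical list of cell image files. The returned list keeps the order of the input images.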

Models downloaded from the Hugging Face Hub are saved locally in ~/.cache/huggingface/hub/.
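
The cache location can differ from this default when certain environment variables are set. The sketch below mirrors the documented precedence of `HF_HUB_CACHE`, `HF_HOME`, and `XDG_CACHE_HOME` and the `models--<org>--<name>` folder naming used by the Hub cache. It is a simplified approximation that ignores legacy variables, and the function names `hub_cache_dir` and `model_cache_dir` are hypothetical helpers, not library functions. The `cache_dir` argument of `from_pretrained` is another way to redirect downloads.

```python
import os
from pathlib import Path


def hub_cache_dir(env=None):
    """Approximate the directory where Hub downloads are cached."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    if env.get("XDG_CACHE_HOME"):
        return Path(env["XDG_CACHE_HOME"]).expanduser() / "huggingface" / "hub"
    # Default location mentioned in the text above
    return Path.home() / ".cache" / "huggingface" / "hub"


def model_cache_dir(repo_id, env=None):
    """Folder that holds the cached files of one model repository."""
    return hub_cache_dir(env) / ("models--" + repo_id.replace("/", "--"))


print(model_cache_dir("Kansallisarkisto/tablecell-htr"))
```

Deleting that folder forces a fresh download the next time the model is loaded.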

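✨ Main features

The model was trained on data from specific types of table cells and is designed to recognize the handwritten text they contain.
It builds on the National Archives' multi-century handwritten text recognition model and Microsoft's TrOCR model through fine-tuning.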
