
Historical Newspaper NER

Developed by dell-research-harvard
A named entity recognition model fine-tuned from RoBERTa-large, designed for historical newspaper text that may contain OCR errors.
Downloads 209
Release date: 9/14/2023

Model Overview

This model identifies four entity types: Location (LOC), Organization (ORG), Person (PER), and Miscellaneous (MISC). It is intended for analysis of historical news text.
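As an illustration of how the four entity types might be consumed, here is a minimal sketch using the Hugging Face transformers token-classification pipeline. The model identifier, the helper names load_tagger and group_by_type, and the assumption that aggregated labels come out as PER, LOC, ORG, and MISC are not stated on this page and should be checked against the published model card.

```python
from collections import defaultdict


def load_tagger(model_id="dell-research-harvard/historical_newspaper_ner"):
    """Build a token-classification pipeline (downloads weights on first call).

    The default model_id is an assumption; verify it on the model hub.
    """
    # Imported lazily so group_by_type can be used without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges sub-word pieces into whole entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def group_by_type(entities):
    """Group aggregated pipeline output into {entity_type: [surface strings]}."""
    grouped = defaultdict(list)
    for ent in entities:
        # Aggregated results carry "entity_group" and "word" keys.
        grouped[ent["entity_group"]].append(ent["word"].strip())
    return dict(grouped)
```

Typical use would be group_by_type(load_tagger()(text)), where text is a string of OCR output from a newspaper scan.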

Model Features

High-quality Annotation
Training labels were double-entered and manually verified by Harvard undergraduates to keep annotation quality high.
OCR Error Tolerance
Optimized for text that may contain OCR errors, which makes it suitable for noisy sources such as scanned historical newspapers.
Entity Boundary Differentiation
Distinguishes the beginning of an entity from its continuation, so consecutive entities of the same type are kept separate.
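The begin/continuation distinction is what lets two adjacent names be split correctly. The sketch below decodes a tag sequence into entity spans, assuming the common B-/I-/O labeling convention (for example B-PER, I-PER); the page does not spell out the exact label strings, and decode_bio is a hypothetical helper, not part of the model.

```python
def decode_bio(tokens, tags):
    """Turn parallel token and tag lists into (entity_type, text) spans."""
    spans = []
    current = None  # [entity_type, [tokens]] for the span being built

    for token, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")

        # A continuation tag of the same type extends the open span.
        if prefix == "I" and current is not None and current[0] == etype:
            current[1].append(token)
            continue

        # Anything else closes the open span, if there is one.
        if current is not None:
            spans.append((current[0], " ".join(current[1])))
            current = None

        # A begin tag (or a stray continuation tag) opens a new span.
        if prefix in ("B", "I"):
            current = [etype, [token]]

    if current is not None:
        spans.append((current[0], " ".join(current[1])))
    return spans
```

With tags O, B-PER, I-PER, B-PER, I-PER for the tokens "Mayor John Smith Mary Jones", the second B-PER closes the first span, so the two people are returned as separate PER entities rather than one four-token name.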

Model Capabilities

Named Entity Recognition
Historical Text Analysis
OCR Error Text Processing

Use Cases

Historical Research
Historical Figure Identification
Identify important historical figures and related information from historical newspapers.
PER entity F1 score reached 94.3
Historical Location Analysis
Identify locations of historical events for geospatial analysis.
LOC entity F1 score reached 90.8
Archive Digitization
Newspaper Content Structuring
Convert OCR text from scanned newspapers into structured data for easy retrieval and analysis.
Overall strict-match F1 score: 86.5
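For readers unfamiliar with strict matching: under the usual definition, a predicted entity counts as correct only if its start, end, and type all equal those of a gold entity. The following sketch computes F1 that way over (start, end, type) tuples. It is an illustration of the general metric, not the evaluation script behind the score reported here, and strict_f1 is a hypothetical name.

```python
def strict_f1(gold, predicted):
    """Strict-match F1 over spans given as (start, end, entity_type) tuples."""
    gold_set, pred_set = set(gold), set(predicted)

    # A prediction is a true positive only on an exact boundary and type match.
    true_pos = len(gold_set & pred_set)

    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A prediction that gets the type right but clips one token off the span earns no credit under this definition, which is why strict-match scores tend to be lower than partial-match ones.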