đ Typo Detector
A model designed to detect typos in text, leveraging a specific dataset and providing evaluation metrics.
đ Quick Start
đĻ Installation
First, install the necessary requirements:
pip install transformers
đģ Usage Examples
đ Basic Usage
import torch
from transformers import AutoConfig, AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
model_name_or_path = "m3hrdadfi/typo-detector-distilbert-en"
config = AutoConfig.from_pretrained(model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForTokenClassification.from_pretrained(model_name_or_path, config=config)
nlp = pipeline('token-classification', model=model, tokenizer=tokenizer, aggregation_strategy="average")
đ Advanced Usage
sentences = [
"He had also stgruggled with addiction during his time in Congress .",
"The review thoroughla assessed all aspects of JLENS SuR and CPG esign maturit and confidence .",
"Letterma also apologized two his staff for the satyation .",
"Vincent Jay had earlier won France 's first gold in gthe 10km biathlon sprint .",
"It is left to the directors to figure out hpw to bring the stry across to tye audience .",
]
for sentence in sentences:
typos = [sentence[r["start"]: r["end"]] for r in nlp(sentence)]
detected = sentence
for typo in typos:
detected = detected.replace(typo, f'<i>{typo}</i>')
print(" [Input]: ", sentence)
print("[Detected]: ", detected)
print("-" * 130)
Output:
[Input]: He had also stgruggled with addiction during his time in Congress .
[Detected]: He had also <i>stgruggled</i> with addiction during his time in Congress .
----------------------------------------------------------------------------------------------------------------------------------
[Input]: The review thoroughla assessed all aspects of JLENS SuR and CPG esign maturit and confidence .
[Detected]: The review <i>thoroughla</i> assessed all aspects of JLENS SuR and CPG <i>esign</i> <i>maturit</i> and confidence .
----------------------------------------------------------------------------------------------------------------------------------
[Input]: Letterma also apologized two his staff for the satyation .
[Detected]: <i>Letterma</i> also apologized <i>two</i> his staff for the <i>satyation</i> .
----------------------------------------------------------------------------------------------------------------------------------
[Input]: Vincent Jay had earlier won France 's first gold in gthe 10km biathlon sprint .
[Detected]: Vincent Jay had earlier won France 's first gold in <i>gthe</i> 10km biathlon sprint .
----------------------------------------------------------------------------------------------------------------------------------
[Input]: It is left to the directors to figure out hpw to bring the stry across to tye audience .
[Detected]: It is left to the directors to figure out <i>hpw</i> to bring the <i>stry</i> across to <i>tye</i> audience .
----------------------------------------------------------------------------------------------------------------------------------
đ Documentation
đĻ Dataset Information
For this specific task, I used NeuSpell corpus as my raw data.
đ Evaluation
The following tables summarize the scores obtained by model overall and per each class.
Property |
Details |
Precision |
0.992332 |
Recall |
0.985997 |
F1-Score |
0.989154 |
Support |
416054.0 |
# |
precision |
recall |
f1-score |
support |
TYPO |
0.992332 |
0.985997 |
0.989154 |
416054.0 |
micro avg |
0.992332 |
0.985997 |
0.989154 |
416054.0 |
macro avg |
0.992332 |
0.985997 |
0.989154 |
416054.0 |
weighted avg |
0.992332 |
0.985997 |
0.989154 |
416054.0 |
â Questions?
Post a Github issue on the TypoDetector Issues repo.