typo-detector-distilbert-en: an open-source spelling-error detection model

Typo Detector Distilbert En

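Developed by m3hrdadfi

A spelling-error detection model based on the DistilBERT architecture, used to identify misspellings in text

Sequence labeling

Transformers

English #spelling-error-detection #high-F1 #english-text-processing

Downloads: 25.05k

Release date: 3/2/2022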

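Model Overview

This is a DistilBERT-based token-classification model (the same sequence-labeling setup used for named entity recognition, NER) designed specifically to detect spelling errors in text. It was trained on the NeuSpell corpus and reaches an F1 score of 0.989 on typo detection, as reported in the evaluation table.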

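Model Features

High accuracy

Achieves an F1 score of 0.989 on the spelling-error detection task

DistilBERT-based

Uses the lightweight DistilBERT architecture, reducing compute requirements while maintaining performance

Easy to use

Can be integrated into applications through the Transformers pipeline API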

Model Capabilities

Spelling-error detection in text

Token classification (NER-style sequence labeling)

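Use Cases

Text editing and proofreading

Document proofreading

Automatically detects spelling errors in documents

Improves document quality and professionalism

Content moderation

Identifies spelling problems in user-generated content

Improves content quality on the platform

Education

Language-learning support

Helps identify spelling errors in language learners' writing

Improves learning efficiency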

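🚀 Typo Detector

This model detects typos in text. It uses a Transformer-based approach and reaches an F1 score of 0.989 in the evaluation reported on this page.

🚀 Quick Start

This section explains the basic usage of the typo detector.

✨ Key Features

Trained on the NeuSpell corpus for high-accuracy typo detection.
Predictions can be made easily with a Transformers pipeline.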

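📦 Installation

To install the required libraries, run the following command. PyTorch is included because the usage examples import torch.

pip install transformers torch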

💻 Usage Examples

Basic Usage

import torch  # not used directly; fails early if the PyTorch backend is missing
from transformers import AutoConfig, AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load the config, tokenizer, and weights from the Hugging Face Hub.
model_name_or_path = "m3hrdadfi/typo-detector-distilbert-en"
config = AutoConfig.from_pretrained(model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForTokenClassification.from_pretrained(model_name_or_path, config=config)

# aggregation_strategy="average" groups sub-word tokens into words and averages their scores.
nlp = pipeline('token-classification', model=model, tokenizer=tokenizer, aggregation_strategy="average")
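
As a hedged sketch of what the pipeline call returns: with an aggregation strategy set, a token-classification pipeline yields a list of dictionaries with the keys `entity_group`, `score`, `word`, `start`, and `end`. The helper below, `extract_typos` (a hypothetical name, not part of the model card), slices the flagged substrings out of the input and drops low-confidence hits. The sample predictions, the `TYPO` label, and the 0.5 threshold are illustrative assumptions, not real model output.

```python
from typing import Dict, List


def extract_typos(sentence: str, predictions: List[Dict], min_score: float = 0.5) -> List[str]:
    """Return the substrings of `sentence` flagged by the pipeline.

    `predictions` is expected to look like aggregated token-classification
    output: dicts carrying "start", "end", and "score".
    `min_score` is an illustrative threshold, not a tuned value.
    """
    return [
        sentence[p["start"]:p["end"]]
        for p in predictions
        if p["score"] >= min_score
    ]


# Hand-written stand-in for pipeline output (assumed shape, made-up scores).
sentence = "He had also stgruggled with addiction ."
fake_predictions = [
    {"entity_group": "TYPO", "score": 0.99, "word": "stgruggled", "start": 12, "end": 22},
    {"entity_group": "TYPO", "score": 0.20, "word": "with", "start": 23, "end": 27},
]
print(extract_typos(sentence, fake_predictions))  # ['stgruggled']
```

With the pipeline built in the snippet above, the equivalent call would be `extract_typos(text, nlp(text))`.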

Advanced Usage

The following code detects typos in several sentences.

sentences = [
    "He had also stgruggled with addiction during his time in Congress .",
    "The review thoroughla assessed all aspects of JLENS SuR and CPG esign maturit and confidence .",
    "Letterma also apologized two his staff for the satyation .",
    "Vincent Jay had earlier won France 's first gold in gthe 10km biathlon sprint .",
    "It is left to the directors to figure out hpw to bring the stry across to tye audience .",
]

for sentence in sentences:
    # Slice each flagged span out of the sentence using its character offsets.
    typos = [sentence[r["start"]: r["end"]] for r in nlp(sentence)]

    # Wrap every detected typo in <i>...</i> for display.
    detected = sentence
    for typo in typos:
        detected = detected.replace(typo, f'<i>{typo}</i>')

    print("   [Input]: ", sentence)
    print("[Detected]: ", detected)
    print("-" * 130)
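
The loop above marks typos with `str.replace`, which wraps every occurrence of the flagged substring, including correct uses of the same word elsewhere in the sentence. A minimal sketch of an offset-based alternative follows. `highlight_spans` is a hypothetical helper, and the example sentence and prediction are made up to show the difference; they are not model output.

```python
from typing import Dict, List


def highlight_spans(sentence: str, predictions: List[Dict], tag: str = "i") -> str:
    """Wrap each predicted span in <tag>...</tag> using character offsets."""
    pieces = []
    cursor = 0
    for p in sorted(predictions, key=lambda p: p["start"]):
        start, end = p["start"], p["end"]
        if start < cursor:
            continue  # skip spans that overlap one already emitted
        pieces.append(sentence[cursor:start])
        pieces.append(f"<{tag}>{sentence[start:end]}</{tag}>")
        cursor = end
    pieces.append(sentence[cursor:])
    return "".join(pieces)


# Made-up input: only the first "two" (offsets 8-11) is flagged as a typo.
sentence = "He said two his two friends ."
fake_predictions = [{"entity_group": "TYPO", "score": 0.98, "word": "two", "start": 8, "end": 11}]

print(highlight_spans(sentence, fake_predictions))
# He said <i>two</i> his two friends .
print(sentence.replace("two", "<i>two</i>"))
# He said <i>two</i> his <i>two</i> friends .
```

Working from offsets keeps the highlighting tied to the exact positions the model reported, at the cost of a few more lines than `str.replace`.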

Output:

   [Input]:  He had also stgruggled with addiction during his time in Congress .
[Detected]:  He had also <i>stgruggled</i> with addiction during his time in Congress .
----------------------------------------------------------------------------------------------------------------------------------
   [Input]:  The review thoroughla assessed all aspects of JLENS SuR and CPG esign maturit and confidence .
[Detected]:  The review <i>thoroughla</i> assessed all aspects of JLENS SuR and CPG <i>esign</i> <i>maturit</i> and confidence .
----------------------------------------------------------------------------------------------------------------------------------
   [Input]:  Letterma also apologized two his staff for the satyation .
[Detected]:  <i>Letterma</i> also apologized <i>two</i> his staff for the <i>satyation</i> .
----------------------------------------------------------------------------------------------------------------------------------
   [Input]:  Vincent Jay had earlier won France 's first gold in gthe 10km biathlon sprint .
[Detected]:  Vincent Jay had earlier won France 's first gold in <i>gthe</i> 10km biathlon sprint .
----------------------------------------------------------------------------------------------------------------------------------
   [Input]:  It is left to the directors to figure out hpw to bring the stry across to tye audience .
[Detected]:  It is left to the directors to figure out <i>hpw</i> to bring the <i>stry</i> across to <i>tye</i> audience .
----------------------------------------------------------------------------------------------------------------------------------

📚 Documentation

Dataset

For this task, the NeuSpell corpus was used as the raw data.

Evaluation

The tables below give the model metadata, followed by the overall and per-class scores.

Attribute	Details
Model type	Typo detector
Training data	NeuSpell corpus

Label	Precision	Recall	F1 score	Support
Typo	0.992332	0.985997	0.989154	416054
Micro avg	0.992332	0.985997	0.989154	416054
Macro avg	0.992332	0.985997	0.989154	416054
Weighted avg	0.992332	0.985997	0.989154	416054
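
The F1 column can be checked from the other two: F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the table values; `f1_score` here is a local helper written for this check, not a library function. Because the table has a single class, the micro, macro, and weighted averages all coincide with the per-class row.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision, recall = 0.992332, 0.985997  # values from the evaluation table
print(round(f1_score(precision, recall), 6))  # 0.989154
```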