T5-large-spell English spelling correction model: open source and free, automatically corrects spelling and typing errors in text

T5 Large Spell

Developed by ai-forever

An English spelling correction model trained from T5-large that automatically fixes spelling mistakes and typos in text

Large language model

Transformers

English | Open-source license: MIT | #English spelling correction #T5-large model optimization #Multi-type error correction

Downloads: 2,241

Release date: 7/29/2023

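Model Overview

This model corrects spelling mistakes and typos by converting every word in the text to its standard English form. It was trained from the T5-large model on an augmented dataset containing artificially created errors.

Model Features

High-accuracy spelling correction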

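Competitive on the BEA60K and JFLEG datasets: the highest F1 among the compared models on JFLEG (83.8), and on BEA60K an F1 (73.9) above both NeuSpell baselines but below the ChatGPT-family models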

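Built on the T5-large architecture

Trained on top of the T5-large model, inheriting its text-to-text language capabilities

Synthetic-error training data

Uses an augmented dataset in which errors were injected automatically with the SAGE library, covering a variety of error types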

Model Capabilities

Spelling error detection

Typo correction

Text normalization

Natural language generation

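Use Cases

Text processing

Document proofreading

Automatically detects and corrects spelling mistakes in documents

Improves document quality and professionalism

Content creation assistance

Helps fix spelling mistakes while writing

Improves writing efficiency and accuracy

Education

Language learning support

Helps English learners identify and correct spelling mistakes

Improves learning efficiency and accuracy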

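🚀 T5-large-spell Model

This model corrects spelling mistakes and typos by converting every word in the text to standard English. The proofreader was trained from the T5-large model on a large dataset containing "artificial" errors. The corpus was built from English Wikipedia and news blogs, and typos and spelling errors were then introduced automatically using the SAGE library.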
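To make the idea of "artificial" errors concrete, here is a minimal sketch of character-level typo injection. It is not the SAGE library's API and makes no claim about how SAGE chooses its errors. The function name `inject_typos`, the four edit operations, and the `rate` parameter are all assumptions made for illustration.

```python
import random
from typing import Optional


def inject_typos(text: str, rate: float = 0.1, seed: Optional[int] = None) -> str:
    """Return a copy of `text` with random character-level typos.

    Hypothetical helper for illustration only; SAGE's own corruption
    methods are not reproduced here.
    """
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        # Only letters are corrupted, each with probability `rate`.
        if ch.isalpha() and rng.random() < rate:
            op = rng.choice(["delete", "insert", "substitute", "swap"])
            if op == "delete":
                pass  # drop the character
            elif op == "insert":
                out.extend([ch, rng.choice(alphabet)])
            elif op == "substitute":
                out.append(rng.choice(alphabet))
            elif i + 1 < len(chars):
                # swap with the next character and skip over it
                out.extend([chars[i + 1], ch])
                i += 1
            else:
                out.append(ch)  # nothing to swap with at the end
        else:
            out.append(ch)
        i += 1
    return "".join(out)


clean = "The festival was excellent in many ways."
# A (noisy input, clean target) pair of the kind a corrector is trained on.
pair = (inject_typos(clean, rate=0.15, seed=0), clean)
print(pair)
```

Pairing each corrupted sentence with its clean original gives supervised training data without manual annotation, which is the general approach the paragraph above describes.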

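🚀 Quick Start

With this model you can easily correct spelling mistakes and typos in English text. The sections below cover the model details, usage examples, and evaluation metrics.

✨ Key Features

Converts every word in the text to standard English, correcting spelling mistakes and typos.
Trained from the T5-large model.
Trained on a large dataset containing "artificial" errors.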

📚 Documentation

Public references

Examples

Input	Output
Th festeivаl was excelzecnt in many ways, and in particular it beinganinternational festjival sss a chаllenging, bet brilli an t ea.	The festival was excellent in many ways, and in particular it beinganinternational festival is a challenging, but brilliant one to see.
That 's why I believe in the solution which is the closest to human nature and can help us to avoid boredome. I am sure that eventually we will take off our clothes and in the future we will be undressed and free. There wo n't be any problem with being up - do - date .	That's why I believe in the solution which is the closest to human nature and can help us to avoid boredom. I am sure that eventually we will take off our clothes and in the future we will be undressed and free. There won't be any problem with being up - do - date.
If you bought something goregous, you well be very happy.	If you bought something gorgeous, you will be very happy.
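To see which words the model changed in a pair like the ones above, a word-level diff is one option. The sketch below uses Python's standard `difflib`. The helper name `changed_words` is hypothetical and not part of the model or of SAGE.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def changed_words(original: str, corrected: str) -> List[Tuple[str, str]]:
    """List (before, after) word spans that differ between two sentences."""
    src, dst = original.split(), corrected.split()
    matcher = SequenceMatcher(a=src, b=dst, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Inserted or deleted spans show up with an empty string on one side.
            changes.append((" ".join(src[i1:i2]), " ".join(dst[j1:j2])))
    return changes


before = "If you bought something goregous, you well be very happy."
after = "If you bought something gorgeous, you will be very happy."
print(changed_words(before, after))
# [('goregous,', 'gorgeous,'), ('well', 'will')]
```

Splitting on whitespace keeps punctuation attached to words, which is why the comma appears in the first pair. A real tokenizer could be substituted if cleaner spans are needed.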

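🔧 Technical Details

Metrics

Quality

The following automatic metrics are used to judge the correctness of the spell checker. We compare our solution with publicly available automatic spell checkers and with models from the ChatGPT family on two available datasets:

BEA60K: English spelling errors collected from several domains.
JFLEG: 1601 English sentences containing about 2,000 spelling errors.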

BEA60K

Model	Precision	Recall	F1
T5-large-spell	66.5	83.1	73.9
ChatGPT gpt-3.5-turbo-0301	66.9	84.1	74.5
ChatGPT gpt-4-0314	68.6	85.2	76.0
ChatGPT text-davinci-003	67.8	83.9	75.0
Bert (https://github.com/neuspell/neuspell)	65.8	79.6	72.0
SC-LSTM (https://github.com/neuspell/neuspell)	62.2	80.3	72.0

JFLEG

Model	Precision	Recall	F1
T5-large-spell	83.4	84.3	83.8
ChatGPT gpt-3.5-turbo-0301	77.8	88.6	82.9
ChatGPT gpt-4-0314	77.9	88.3	82.8
ChatGPT text-davinci-003	76.8	88.5	82.2
Bert (https://github.com/neuspell/neuspell)	78.5	85.4	81.8
SC-LSTM (https://github.com/neuspell/neuspell)	80.6	86.1	83.2
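As a reading aid for the tables above, the sketch below recomputes F1 from precision and recall, assuming the F1 column is the usual harmonic mean of the two. The T5-large-spell rows are consistent with that assumption. Because the published values are rounded, recomputed figures for other rows may not match exactly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall of T5-large-spell, taken from the tables above.
bea60k_f1 = f1_score(66.5, 83.1)
jfleg_f1 = f1_score(83.4, 84.3)
print(round(bea60k_f1, 1), round(jfleg_f1, 1))
# 73.9 83.8
```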

Usage

from transformers import T5ForConditionalGeneration, AutoTokenizer

# Hugging Face Hub ID of the model
path_to_model = "ai-forever/T5-large-spell"

model = T5ForConditionalGeneration.from_pretrained(path_to_model)
tokenizer = AutoTokenizer.from_pretrained(path_to_model)
# Task prefix prepended to every input sentence
prefix = "grammar: "

sentence = "If you bought something goregous, you well be very happy."
sentence = prefix + sentence

# Tokenize, generate the corrected text, and decode it back to a string
encodings = tokenizer(sentence, return_tensors="pt")
generated_tokens = model.generate(**encodings)
answer = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(answer)

# ["If you bought something gorgeous, you will be very happy."]
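The snippet above corrects a single sentence. A possible extension to several sentences at once is sketched below. The helper names (`add_prefix`, `correct_batch`, `load_spell_checker`), the use of padding and truncation, and the `max_new_tokens=128` limit are assumptions for illustration, not settings taken from the model card. Only the `"grammar: "` prefix and the model ID come from the usage code above.

```python
from typing import List

PREFIX = "grammar: "  # task prefix from the usage example above


def add_prefix(sentences: List[str]) -> List[str]:
    """Prepend the task prefix to every sentence."""
    return [PREFIX + s for s in sentences]


def correct_batch(sentences: List[str], model, tokenizer,
                  max_new_tokens: int = 128) -> List[str]:
    """Correct several sentences in one generate() call."""
    if not sentences:
        return []
    # Padding lets sentences of different lengths share one tensor.
    encodings = tokenizer(add_prefix(sentences), return_tensors="pt",
                          padding=True, truncation=True)
    # An explicit limit guards against long outputs being cut short by
    # a small default generation length (assumed value, tune as needed).
    generated = model.generate(**encodings, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


def load_spell_checker(path_to_model: str = "ai-forever/T5-large-spell"):
    """Load model and tokenizer; imported lazily because it is heavy."""
    from transformers import AutoTokenizer, T5ForConditionalGeneration
    model = T5ForConditionalGeneration.from_pretrained(path_to_model)
    tokenizer = AutoTokenizer.from_pretrained(path_to_model)
    return model, tokenizer


# Usage (downloads the model weights on first call):
# model, tokenizer = load_spell_checker()
# print(correct_batch(["If you bought something goregous, you well be very happy."],
#                     model, tokenizer))
```

Keeping model loading separate from `correct_batch` means the weights are loaded once and reused across calls.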