🚀 T5-large-spell Model
The model corrects spelling errors and typos by bringing all words in a text into line with standard English. The proofreader was trained on the basis of the T5-large model. An extensive dataset with "artificial" errors served as the training corpus: the corpus was assembled from English-language Wikipedia and news blogs, after which spelling errors and typos were introduced automatically using the SAGE library.
✨ Key Features
- Effectively corrects spelling errors and typos in English text.
- Built on the powerful T5-large model.
- Trained on a large dataset with "artificial" errors, which strengthens its correction ability.
📦 Installation
The original documentation does not give explicit installation steps.
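As a minimal sketch, the usage example below only requires the Hugging Face transformers library and PyTorch; the package names here are the standard ones implied by that example, not taken from the card:

```bash
pip install transformers torch
```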
💻 Usage Examples
Basic Usage
```python
from transformers import T5ForConditionalGeneration, AutoTokenizer

# Load the spell checker and its tokenizer from the Hugging Face Hub
path_to_model = "ai-forever/T5-large-spell"
model = T5ForConditionalGeneration.from_pretrained(path_to_model)
tokenizer = AutoTokenizer.from_pretrained(path_to_model)

# The model expects the "grammar: " prefix in front of the input sentence
prefix = "grammar: "
sentence = "If you bought something goregous, you well be very happy."
sentence = prefix + sentence

# Tokenize, generate the correction, and decode it back to text
encodings = tokenizer(sentence, return_tensors="pt")
generated_tokens = model.generate(**encodings)
answer = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(answer)
# ["If you bought something gorgeous, you will be very happy."]
```
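The card only shows single-sentence usage. Below is a hedged sketch of batch correction using standard transformers options; `padding=True` and `max_new_tokens` are assumptions of this sketch, not settings taken from the card:

```python
# Sketch only: correct several prefixed sentences in one batch
sentences = [
    "grammar: Th festeivаl was excelzecnt in many ways.",
    "grammar: If you bought something goregous, you well be very happy.",
]
encodings = tokenizer(sentences, return_tensors="pt", padding=True)
generated_tokens = model.generate(**encodings, max_new_tokens=128)
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))
```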
📚 Documentation
Public References
Examples
| Input | Output |
| --- | --- |
| Th festeivаl was excelzecnt in many ways, and in particular it beinganinternational festjival sss a chаllenging, bet brilli an t ea. | The festival was excellent in many ways, and in particular it beinganinternational festival is a challenging, but brilliant one to see. |
| That 's why I believe in the solution which is the closest to human nature and can help us to avoid boredome. I am sure that eventually we will take off our clothes and in the future we will be undressed and free. There wo n't be any problem with being up - do - date . | That's why I believe in the solution which is the closest to human nature and can help us to avoid boredom. I am sure that eventually we will take off our clothes and in the future we will be undressed and free. There won't be any problem with being up - do - date. |
| If you bought something goregous, you well be very happy. | If you bought something gorgeous, you will be very happy. |
Metrics
Quality
Below are the automatic metrics used to assess the correctness of the spell checkers. We compared our solution with open-source automatic spell checkers and models of the ChatGPT family on two available datasets (an illustrative scoring sketch follows the list):
- BEA60K: English spelling errors collected from several domains;
- JFLEG: 1,601 English sentences containing roughly 2,000 spelling errors.
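The card reports precision, recall, and F1 but does not describe the evaluation protocol. Purely as an illustration (the word-level alignment and the function below are assumptions of this sketch, not the evaluation actually used), such scores can be computed by comparing each hypothesis word against the source and a gold reference:

```python
# Illustrative word-level scoring; assumes source, hypothesis, and reference
# have the same number of tokens, which real evaluations handle via alignment.
def word_level_prf1(source: str, hypothesis: str, reference: str):
    triples = list(zip(source.split(), hypothesis.split(), reference.split()))
    tp = sum(h == r and r != s for s, h, r in triples)  # correct fixes
    fp = sum(h != s and h != r for s, h, r in triples)  # wrong edits
    fn = sum(r != s and h != r for s, h, r in triples)  # missed fixes
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(word_level_prf1(
    "you well be happy",
    "you will be happy",
    "you will be happy",
))  # (1.0, 1.0, 1.0)
```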
BEA60K

| Model | Precision | Recall | F1 |
| --- | --- | --- | --- |
| T5-large-spell | 66.5 | 83.1 | 73.9 |
| ChatGPT gpt-3.5-turbo-0301 | 66.9 | 84.1 | 74.5 |
| ChatGPT gpt-4-0314 | 68.6 | 85.2 | 76.0 |
| ChatGPT text-davinci-003 | 67.8 | 83.9 | 75.0 |
| Bert (https://github.com/neuspell/neuspell) | 65.8 | 79.6 | 72.0 |
| SC-LSTM (https://github.com/neuspell/neuspell) | 62.2 | 80.3 | 72.0 |
JFLEG

| Model | Precision | Recall | F1 |
| --- | --- | --- | --- |
| T5-large-spell | 83.4 | 84.3 | 83.8 |
| ChatGPT gpt-3.5-turbo-0301 | 77.8 | 88.6 | 82.9 |
| ChatGPT gpt-4-0314 | 77.9 | 88.3 | 82.8 |
| ChatGPT text-davinci-003 | 76.8 | 88.5 | 82.2 |
| Bert (https://github.com/neuspell/neuspell) | 78.5 | 85.4 | 81.8 |
| SC-LSTM (https://github.com/neuspell/neuspell) | 80.6 | 86.1 | 83.2 |
Resources
🔧 Technical Details
The model was trained on the basis of T5-large using a large dataset with "artificial" errors. The dataset was built from English-language Wikipedia and news blogs; spelling errors and typos were then introduced automatically with the SAGE library, which strengthens the model's correction ability.
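The card does not reproduce SAGE's corruption API. Purely as a hypothetical illustration of what "artificial" typo injection looks like (the function, operations, and probability below are invented for this sketch and are not SAGE's), a character-level corruptor might read:

```python
import random

def corrupt_word(word: str, rng: random.Random, p: float = 0.15) -> str:
    """Randomly apply one of three character-level typos with probability p."""
    if len(word) < 3 or rng.random() > p:
        return word
    i = rng.randrange(len(word) - 1)
    op = rng.choice(["swap", "drop", "double"])
    if op == "swap":                        # transpose adjacent characters
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "drop":                        # delete a character
        return word[:i] + word[i + 1:]
    return word[:i] + word[i] + word[i:]    # duplicate a character

rng = random.Random(42)
clean = "The festival was excellent in many ways"
noisy = " ".join(corrupt_word(w, rng) for w in clean.split())
print(noisy)  # the clean sentence with a few injected typos
```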
📄 License
The T5-large model on which our solution is based, together with its source code, is distributed under the Apache-2.0 license. Our solution itself is distributed under the MIT license.
📋 Specifications

| Property | Details |
| --- | --- |
| File size | 3 GB |
| Framework | PyTorch |
| Format | AI service |
| Version | v1.0 |
| Developers | SberDevices, AGI NLP |
📞 Contact
nikita.martynov.98@list.ru