t5-finetuned-test: Open-Source Text Summarization Model for WikiHow Articles

T5 Finetuned Test

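Developed by osanseviero

A text summarization model based on the T5-small architecture, trained on the WikiHow dataset

Text Generation · English · #WikiHow summarization #T5-small architecture #Sequence-to-sequence model

Downloads: 24

Released: 3/2/2022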

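Model Overview

This model generates summaries of WikiHow-style articles. It uses a sequence-to-sequence architecture and is intended for automatic summarization of English text.

Model Features

Efficient summary generation

Summarization tuned for WikiHow-style content

Lightweight architecture

Uses the T5-small architecture to reduce compute requirements while maintaining performance

Domain adaptation

Fine-tuned specifically on the WikiHow dataset, so it is suited to how-to text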

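Model Capabilities

Text summarization

English text processing

Sequence-to-sequence transformation

Use Cases

Content summarization

WikiHow article summarization

Automatically generates concise summaries of WikiHow articles

ROUGE-1 score 31.2, ROUGE-L score 24.5

Content simplification

How-to guide simplification

Condenses complex procedures into their key points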

🚀 WikiHow T5-small Model

This is a T5-small model trained on the full WikiHow dataset. It summarizes text so that readers can get the key information quickly.

✨ Key Features

Tags: the model is tagged wikihow, t5-small, pytorch, lm-head, seq2seq, t5, pipeline:summarization, and summarization, which places it among text summarization models.
Dataset: the model was trained on the WikiHow dataset, which covers a wide range of topics and text types.
Evaluation: the model scores 31.2 on ROUGE-1 and 24.5 on ROUGE-L.
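To make the two reported metrics concrete, the following is a minimal sketch of how ROUGE-1 and ROUGE-L F1 scores can be computed for a single summary. It assumes lowercase whitespace tokenization and is not the evaluation code used for this model; the function names are illustrative, and whether the reported 31.2 and 24.5 are F1 values is not stated in the model card.

```python
from collections import Counter


def _tokens(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization. Standard ROUGE tooling
    # typically also handles punctuation and optional stemming.
    return text.lower().split()


def _f1(overlap: int, n_candidate: int, n_reference: int) -> float:
    # Harmonic mean of precision and recall; 0.0 when nothing overlaps.
    if overlap == 0:
        return 0.0
    precision = overlap / n_candidate
    recall = overlap / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge1_f1(candidate: str, reference: str) -> float:
    # ROUGE-1 counts overlapping unigrams, clipped by their reference counts.
    cand, ref = _tokens(candidate), _tokens(reference)
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return _f1(overlap, len(cand), len(ref))


def lcs_length(a: list[str], b: list[str]) -> int:
    # Length of the longest common subsequence, via row-by-row dynamic programming.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rougeL_f1(candidate: str, reference: str) -> float:
    # ROUGE-L replaces the unigram overlap with the LCS length, so word order matters.
    cand, ref = _tokens(candidate), _tokens(reference)
    return _f1(lcs_length(cand, ref), len(cand), len(ref))
```

ROUGE-L is never higher than ROUGE-1 under this definition, because every word in a common subsequence is also an overlapping unigram. That is consistent with the lower ROUGE-L figure reported above.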

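📦 Installation

No installation steps are provided. The usage examples rely on the transformers library and PyTorch, so a standard installation of those two packages should be enough.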
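Since no installation steps are given, a quick check that the needed packages are importable can save a failed run later. The sketch below uses only the standard library. The package lists are assumptions drawn from the usage examples, and listing sentencepiece as optional is a guess based on how T5 tokenizers are commonly packaged.

```python
from importlib.util import find_spec

# Assumption: these are the packages the usage examples import directly.
REQUIRED = ("torch", "transformers")
# Assumption: T5 tokenizers may need sentencepiece in some setups.
OPTIONAL = ("sentencepiece",)


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if find_spec(name) is None]
```

Calling `missing_packages(REQUIRED)` returns an empty list when both packages are installed. Any names it returns can then be installed with your usual package manager.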

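💻 Usage Examples

Basic Usage

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the tokenizer and model weights from the Hugging Face Hub.
# AutoModelForSeq2SeqLM replaces the deprecated AutoModelWithLMHead for T5 models.
tokenizer = AutoTokenizer.from_pretrained("deep-learning-analytics/wikihow-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("deep-learning-analytics/wikihow-t5-small")

# Run on the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)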

text = """
Lack of fluids can lead to dry mouth, which is a leading cause of bad breath. Water
can also dilute any chemicals in your mouth or gut that are causing bad breath., Studies show that
eating 6 ounces of yogurt a day reduces the level of odor-causing compounds in the mouth. In
particular, look for yogurt containing the active bacteria Streptococcus thermophilus or
Lactobacillus bulgaricus., The abrasive nature of fibrous fruits and vegetables helps to clean
teeth, while the vitamins, antioxidants, and acids they contain improve dental health.Foods that can
be particularly helpful include:Apples — Apples contain vitamin C, which is necessary for health
gums, as well as malic acid, which helps to whiten teeth.Carrots — Carrots are rich in vitamin A,
which strengthens tooth enamel.Celery — Chewing celery produces a lot of saliva, which helps to
neutralize bacteria that cause bad breath.Pineapples — Pineapples contain bromelain, an enzyme that
cleans the mouth., These teas have been shown to kill the bacteria that cause bad breath and
plaque., An upset stomach can lead to burping, which contributes to bad breath. Don’t eat foods that
upset your stomach, or if you do, use antacids. If you are lactose intolerant, try lactase tablets.,
They can all cause bad breath. If you do eat them, bring sugar-free gum or a toothbrush and
toothpaste to freshen your mouth afterwards., Diets low in carbohydrates lead to ketosis — a state
in which the body burns primarily fat instead of carbohydrates for energy. This may be good for your
waistline, but it also produces chemicals called ketones, which contribute to bad breath.To stop the
problem, you must change your diet. Or, you can combat the smell in one of these ways:Drink lots of
water to dilute the ketones.Chew sugarless gum or suck on sugarless mints.Chew mint leaves.
"""

# Collapse line breaks into spaces so the wrapped sample text becomes a single line.
preprocess_text = text.strip().replace("\n", " ")
tokenized_text = tokenizer.encode(preprocess_text, return_tensors="pt").to(device)

summary_ids = model.generate(
    tokenized_text,
    max_length=150, 
    num_beams=2,
    repetition_penalty=2.5, 
    length_penalty=1.0, 
    early_stopping=True
)

output = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print("Summarized text: ", output)

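Advanced Usage

# The generation parameters, such as max_length and num_beams, can be adjusted
# to trade off summary length against quality.
# For example, a larger max_length allows longer summaries, and a larger
# num_beams widens the beam search.
# The example below uses adjusted parameters:

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# AutoModelForSeq2SeqLM replaces the deprecated AutoModelWithLMHead for T5 models.
tokenizer = AutoTokenizer.from_pretrained("deep-learning-analytics/wikihow-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("deep-learning-analytics/wikihow-t5-small")

# Run on the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)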

text = """
Bring 1/2 cup water to the boil.Add the fresh or dried rosemary to the water.Remove
from the heat. Set aside for 1/2 an hour to infuse. Added flavour can be released by pressing down
on the rosemary leaves with a spoon. Add the pieces to the blender or food processor with the
elderflower cordial. Blend or process to a purée.,, Add the lemon or lime juice and stir to
combine., Add a cover and place in the freezer.After 2 hours, remove from the freezer and break up
with a fork. This helps the ice crystals to form properly.Continue doing this every hour until the
granita freezes properly. Scoop the granita into dessert bowls and serve. Garnish with a cucumber
curl or a small sprig of rosemary.
"""

# Collapse line breaks into spaces so the wrapped sample text becomes a single line.
preprocess_text = text.strip().replace("\n", " ")
tokenized_text = tokenizer.encode(preprocess_text, return_tensors="pt").to(device)

# Adjusted generation parameters
summary_ids = model.generate(
    tokenized_text,
    max_length=200,  # raise the maximum summary length
    num_beams=4,     # use more beams in the beam search
    repetition_penalty=3.0,
    length_penalty=1.2,
    early_stopping=True
)

output = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print("Summarized text: ", output)
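The basic and advanced examples differ only in the arguments passed to `model.generate`, so the two settings can be kept as named presets behind one helper. This is a sketch, not part of the model's own API: `GENERATION_PRESETS`, `generation_kwargs`, and `summarize` are hypothetical names, and the helper expects a `model` and `tokenizer` that were already loaded as shown above.

```python
# Preset values are copied from the basic and advanced examples in this section.
GENERATION_PRESETS = {
    "basic": {
        "max_length": 150,
        "num_beams": 2,
        "repetition_penalty": 2.5,
        "length_penalty": 1.0,
        "early_stopping": True,
    },
    "advanced": {
        "max_length": 200,
        "num_beams": 4,
        "repetition_penalty": 3.0,
        "length_penalty": 1.2,
        "early_stopping": True,
    },
}


def generation_kwargs(preset="basic", **overrides):
    """Return a copy of the chosen preset with any keyword overrides applied."""
    if preset not in GENERATION_PRESETS:
        raise KeyError(f"unknown preset: {preset!r}")
    kwargs = dict(GENERATION_PRESETS[preset])  # copy so the preset stays unchanged
    kwargs.update(overrides)
    return kwargs


def summarize(text, model, tokenizer, device="cpu", preset="basic", **overrides):
    """Summarize `text` with an already-loaded seq2seq model and tokenizer."""
    # Collapse all whitespace runs (including line breaks) into single spaces.
    cleaned = " ".join(text.split())
    input_ids = tokenizer.encode(cleaned, return_tensors="pt").to(device)
    summary_ids = model.generate(input_ids, **generation_kwargs(preset, **overrides))
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

For example, `summarize(text, model, tokenizer, device, preset="advanced", max_length=120)` would reuse the advanced settings with a shorter length cap.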

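📚 Documentation

This is a T5-small model trained on the full WikiHow dataset. It was trained for 3 epochs with a batch size of 16 and a learning rate of 3e-4. The maximum input length was set to 512 and the maximum output length to 150. See the accompanying blog post for details of the training process.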
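The hyperparameters above can be collected in a small configuration object, which also makes it easy to estimate how many optimizer steps a run involves. The sketch below is illustrative only: `TrainingConfig` is a hypothetical name, and the step count assumes one device, no gradient accumulation, and that the final partial batch is kept. The dataset size is a caller-supplied value, not a figure from the model card.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values as stated in the model card.
    epochs: int = 3
    batch_size: int = 16
    learning_rate: float = 3e-4
    max_input_length: int = 512
    max_output_length: int = 150

    def steps_per_epoch(self, num_examples: int) -> int:
        # Assumption: the last, smaller batch is still used.
        return math.ceil(num_examples / self.batch_size)

    def total_steps(self, num_examples: int) -> int:
        # Assumption: one optimizer step per batch (no gradient accumulation).
        return self.epochs * self.steps_per_epoch(num_examples)
```

With a hypothetical 1,000 training examples, `TrainingConfig().total_steps(1000)` gives 3 × ceil(1000 / 16) = 189 steps.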

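🔧 Technical Details

The model is based on the T5-small architecture and fine-tuned on the WikiHow dataset with a batch size of 16 and a learning rate of 3e-4. Inputs are capped at 512 tokens and generated summaries at 150 tokens, which bounds both the text the model reads and the length of the summary it produces. Summary quality was evaluated with ROUGE-1 and ROUGE-L; the final model scores 31.2 on ROUGE-1 and 24.5 on ROUGE-L.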