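🚀 T5-small for Paraphrase Generation
This project fine-tunes Google's T5-small model on the TaPaCo dataset for paraphrase generation. Given an input sentence, the model produces several differently worded sentences with similar meaning, giving downstream text-processing tasks alternative phrasings to work with.
🚀 Quick Start
Usage Example
The following code shows how to generate paraphrases with the model: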
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("hetpandya/t5-small-tapaco")
model = T5ForConditionalGeneration.from_pretrained("hetpandya/t5-small-tapaco")


def get_paraphrases(sentence, prefix="paraphrase: ", n_predictions=5, top_k=120, max_length=256, device="cpu"):
    # Keep the model on the same device as the input tensors.
    model.to(device)
    text = prefix + sentence + " </s>"
    # A single sentence needs no padding (replaces the deprecated pad_to_max_length=True).
    encoding = tokenizer(text, return_tensors="pt")
    input_ids = encoding["input_ids"].to(device)
    attention_mask = encoding["attention_mask"].to(device)

    # Sample several candidates with top-k / top-p sampling.
    model_output = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        do_sample=True,
        max_length=max_length,
        top_k=top_k,
        top_p=0.98,
        num_return_sequences=n_predictions,
    )

    # Decode each candidate, skipping copies of the input and repeated candidates.
    outputs = []
    for output in model_output:
        generated_sent = tokenizer.decode(
            output, skip_special_tokens=True, clean_up_tokenization_spaces=True
        )
        if (
            generated_sent.lower() != sentence.lower()
            and generated_sent not in outputs
        ):
            outputs.append(generated_sent)
    return outputs


paraphrases = get_paraphrases("The house will be cleaned by me every Saturday.")
for sent in paraphrases:
    print(sent)
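Example Output
Because generation uses sampling, results vary between runs. Running the code above may produce output such as: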
The house is cleaned every Saturday by me.
The house will be cleaned on Saturday.
I will clean the house every Saturday.
I get the house cleaned every Saturday.
I will clean this house every Saturday.
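Because the candidates are sampled, some of them may repeat the input or each other, which is why get_paraphrases filters its results. The sketch below isolates that filtering step as a standalone function so it can be tried without loading the model. The name filter_paraphrases is hypothetical, and unlike the original loop it also ignores case and surrounding whitespace when comparing candidates with each other; treat it as one possible variant, not the author's implementation.

```python
def filter_paraphrases(source, candidates):
    """Drop candidates that repeat the source or each other, ignoring case."""
    # Seed the set of seen sentences with the source so copies of it are rejected.
    seen = {source.strip().lower()}
    kept = []
    for candidate in candidates:
        key = candidate.strip().lower()
        # Skip empty strings and anything already seen.
        if key and key not in seen:
            seen.add(key)
            kept.append(candidate.strip())
    return kept


source = "The house will be cleaned by me every Saturday."
candidates = [
    "I will clean the house every Saturday.",
    "i will clean the house every saturday.",           # case-only duplicate
    "The house will be cleaned by me every Saturday.",  # copy of the input
    "The house is cleaned every Saturday by me.",
]

unique = filter_paraphrases(source, candidates)
for sent in unique:
    print(sent)
```

Fewer than n_predictions sentences may remain after filtering, so callers that need a fixed number of paraphrases may want to request more candidates than they intend to keep.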
📚 Documentation
Fine-Tuning
To learn how to fine-tune this model, see the following guide:
https://towardsdatascience.com/training-t5-for-paraphrase-generation-ab3b5be151a2
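As a rough illustration of the data preparation that paraphrase fine-tuning typically involves, the sketch below turns groups of mutually paraphrased sentences into (input, target) pairs using the same "paraphrase: " prefix as the usage example. The record layout (a set ID plus a sentence), the toy sentences, and the function name build_training_pairs are assumptions made for this example; the guide linked above remains the reference for the actual procedure.

```python
from collections import defaultdict
from itertools import permutations


def build_training_pairs(records, prefix="paraphrase: "):
    """Turn (set_id, sentence) records into (model_input, target) pairs."""
    # Group sentences that belong to the same paraphrase set.
    groups = defaultdict(list)
    for set_id, sentence in records:
        groups[set_id].append(sentence)

    pairs = []
    for sentences in groups.values():
        # Every ordered pair within a set is one training example;
        # sets with a single sentence yield no pairs.
        for source, target in permutations(sentences, 2):
            pairs.append((prefix + source, target))
    return pairs


# Hypothetical toy records, not taken from the real dataset.
records = [
    (1, "I will clean the house every Saturday."),
    (1, "The house is cleaned every Saturday by me."),
    (2, "Where is the station?"),
]

pairs = build_training_pairs(records)
for model_input, target in pairs:
    print(model_input, "->", target)
```

Using ordered pairs means each sentence serves as both an input and a target, which grows the number of examples quadratically with the size of a set; sampling a subset of pairs is a common way to keep that in check.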
Author
This project was created by Het Pandya (@hetpandya). You can also reach the author on LinkedIn.
Made with ❤️ in India