🚀 LoRA Fine-tuned T5 Text Summarizer for XSum
This model is a text summarization tool based on t5-small, fine-tuned with LoRA (Low-Rank Adaptation) on the XSum dataset to efficiently generate high-quality abstractive summaries.
🚀 Quick Start
This model is a fine-tuned version of t5-small on the XSum dataset.
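The snippet below is a minimal loading-and-inference sketch, assuming the published adapter repository `Lakshan2003/finetuned-t5-xsum`; the article string is a placeholder and the generation settings are illustrative defaults rather than values prescribed by this card.

```python
from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the t5-small base model and attach the LoRA adapter weights
base = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
model = PeftModel.from_pretrained(base, "Lakshan2003/finetuned-t5-xsum")
tokenizer = AutoTokenizer.from_pretrained("Lakshan2003/finetuned-t5-xsum")

# T5 expects the "summarize: " task prefix for summarization
article = "Your article text here."  # placeholder input
inputs = tokenizer("summarize: " + article, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

A more complete example, including beam-search settings, is given in the Usage Examples section below.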
✨ Key Features
This is a LoRA (Low-Rank Adaptation) fine-tuned version of T5-small, optimized for text summarization. The model was trained for abstractive summarization on the XSum dataset.
💻 Usage Examples
Basic usage
```python
from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

# Load the base model and attach the LoRA adapter weights
base_model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
my_model = PeftModel.from_pretrained(base_model, "Lakshan2003/finetuned-t5-xsum")


def test_peft_summarizer(text, model, max_length=128, min_length=30):
    """
    Test the PEFT-loaded summarization model

    Args:
        text (str): Input text to summarize
        model: The loaded PEFT model
        max_length (int): Maximum length of the summary
        min_length (int): Minimum length of the summary
    """
    tokenizer = AutoTokenizer.from_pretrained("Lakshan2003/finetuned-t5-xsum")
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)

    # T5 uses a task prefix for summarization
    prefix = "summarize: "
    input_text = prefix + text

    inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Generate the summary with beam search
    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=max_length,
            min_length=min_length,
            num_beams=4,
            length_penalty=2.0,
            early_stopping=True,
            no_repeat_ngram_size=3,
        )

    summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return summary


test_text = """
The United Nations has warned that climate change poses an unprecedented threat to human civilization. In a landmark report, scientists detailed how rising temperatures are affecting everything from weather patterns to food production. The report emphasizes that without immediate and substantial action to reduce greenhouse gas emissions, the world faces severe consequences including rising sea levels, more frequent extreme weather events, and widespread ecosystem collapse. Many countries have pledged to reduce their carbon emissions, but experts say current commitments fall short of what's needed to prevent the worst impacts of climate change. The report also highlights the disproportionate effect of climate change on developing nations, which often lack the resources to adapt to changing conditions.
"""

summary = test_peft_summarizer(test_text, my_model)
print("Original Text:")
print(test_text)
print("\nGenerated Summary:")
print(summary)
```
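If you prefer to deploy a single standalone checkpoint without a PEFT dependency at inference time, the adapter can be merged into the base model. This is a minimal sketch using PEFT's `merge_and_unload`; the output directory name is an arbitrary assumption, not part of the published model.

```python
# Merge the LoRA weights into the base model and save a standalone checkpoint
merged_model = my_model.merge_and_unload()
merged_model.save_pretrained("t5-small-xsum-merged")  # hypothetical output path

tokenizer = AutoTokenizer.from_pretrained("Lakshan2003/finetuned-t5-xsum")
tokenizer.save_pretrained("t5-small-xsum-merged")
```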
🔧 Technical Details
Training hyperparameters
The following hyperparameters were used during training; a sketch showing how they map to training code follows the list:
- Learning rate: 2e-05
- Training batch size: 4
- Evaluation batch size: 4
- Random seed: 42
- Gradient accumulation steps: 2
- Total training batch size: 8
- Optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- LR scheduler type: linear
- Number of epochs: 1
- Mixed precision training: Native AMP
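For reference, the sketch below shows one way these hyperparameters could be expressed with `Seq2SeqTrainingArguments` and a PEFT `LoraConfig`. The LoRA rank, alpha, dropout, and target modules are illustrative assumptions; the card does not state the adapter configuration, and the output directory is hypothetical.

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments

# LoRA settings (r, lora_alpha, lora_dropout, target_modules) are assumptions
# for illustration; they are not listed in this model card.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q", "v"],  # T5 attention projections
)

base_model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
model = get_peft_model(base_model, lora_config)

# Training arguments mirroring the hyperparameters listed above
training_args = Seq2SeqTrainingArguments(
    output_dir="finetuned-t5-xsum",     # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,      # effective total batch size of 8
    num_train_epochs=1,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    seed=42,
    fp16=True,                          # native AMP mixed precision
)
```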
Framework versions
- PEFT 0.14.0
- Transformers 4.47.0
- Pytorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
📄 License
This project is released under the Apache-2.0 license.