🚀 Text Rewriter Paraphraser
This repository contains a text-rewriting (paraphrasing) model fine-tuned from T5-Base, with 223 million parameters. The model rewrites input text effectively and produces high-quality paraphrases.
✨ Key Features
- Fine-tuned from T5-Base: builds on the pre-trained text-to-text transformer for effective paraphrasing.
- Large dataset (430k examples): trained on a comprehensive dataset combining three open-source sources, cleaned with several techniques to ensure the best performance.
- High-quality paraphrases: generates rewrites that substantially change sentence structure while staying accurate and factually correct.
- Harder to flag as AI-generated: aims to produce natural-sounding paraphrases that are difficult to distinguish from human-written text.
📦 Installation
No dedicated installation steps are documented. The usage example below only requires the `transformers` and `torch` packages (e.g. installed via pip).
💻 Usage Examples
Basic Usage
T5 models expect a task-specific prefix. Since this is a paraphrasing task, the prefix `paraphraser: ` is prepended to the input text.
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("Ateeqq/Text-Rewriter-Paraphraser")
model = AutoModelForSeq2SeqLM.from_pretrained("Ateeqq/Text-Rewriter-Paraphraser").to(device)

def generate_paraphrase(text):
    # Prepend the task prefix and tokenize the input.
    input_ids = tokenizer(
        f"paraphraser: {text}",
        return_tensors="pt",
        padding="longest",
        truncation=True,
        max_length=64,
    ).input_ids.to(device)
    # Diverse beam search: 4 beams in 4 groups, returning 4 candidate rewrites.
    outputs = model.generate(
        input_ids,
        num_beams=4,
        num_beam_groups=4,
        num_return_sequences=4,
        repetition_penalty=10.0,
        diversity_penalty=3.0,
        no_repeat_ngram_size=2,
        temperature=0.8,
        max_length=64,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

text = 'By leveraging prior model training through transfer learning, fine-tuning can reduce the amount of expensive computing power and labeled data needed to obtain large models tailored to niche use cases and business needs.'
generate_paraphrase(text)
```
Example output:
```
['The fine-tuning can reduce the amount of expensive computing power and labeled data required to obtain large models adapted for niche use cases and business needs by using prior model training through transfer learning.',
 'fine-tuning, by utilizing prior model training through transfer learning, can reduce the amount of expensive computing power and labeled data required to obtain large models tailored for niche use cases and business needs.',
 'Fine-tunering by using prior model training through transfer learning can reduce the amount of expensive computing power and labeled data required to obtain large models adapted for niche use cases and business needs.',
 'Using transfer learning to use prior model training, fine-tuning can reduce the amount of expensive computing power and labeled data required for large models that are suitable in niche usage cases or businesses.']
```
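To rewrite several sentences at once, the inputs can be batched before generation. The helper below is an illustrative sketch rather than part of the original model card: `paraphrase_batch` is a hypothetical name, and it reuses the `tokenizer`, `model`, and `device` objects loaded above.
```python
def paraphrase_batch(texts, num_return_sequences=4):
    # Prepend the task prefix to every input and tokenize as one padded batch.
    prompts = [f"paraphraser: {t}" for t in texts]
    inputs = tokenizer(prompts, return_tensors="pt", padding=True,
                       truncation=True, max_length=64).to(device)
    outputs = model.generate(
        **inputs,
        num_beams=4,
        num_beam_groups=4,
        num_return_sequences=num_return_sequences,
        repetition_penalty=10.0,
        diversity_penalty=3.0,
        no_repeat_ngram_size=2,
        max_length=64,
    )
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    # The decoded list is ordered batch-first; regroup candidates per input.
    return [decoded[i:i + num_return_sequences]
            for i in range(0, len(decoded), num_return_sequences)]
```
Calling `paraphrase_batch(["sentence one", "sentence two"])` returns one list of candidate rewrites per input sentence.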
📚 Documentation
Inference Parameters
| Parameter | Value |
|------|------|
| Number of beams (`num_beams`) | 3 |
| Number of beam groups (`num_beam_groups`) | 3 |
| Number of returned sequences (`num_return_sequences`) | 1 |
| Repetition penalty (`repetition_penalty`) | 3 |
| Diversity penalty (`diversity_penalty`) | 3.01 |
| No-repeat n-gram size (`no_repeat_ngram_size`) | 2 |
| Temperature (`temperature`) | 0.8 |
| Maximum length (`max_length`) | 64 |
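The sketch below shows how these settings map onto a `model.generate` call. It is not from the original card: the function name `paraphrase_with_defaults` is illustrative, and it assumes the `tokenizer`, `model`, and `device` objects from the usage example above. Note that these values differ slightly from those used in the basic usage example.
```python
def paraphrase_with_defaults(text):
    # Tokenize the prefixed input.
    input_ids = tokenizer(f"paraphraser: {text}", return_tensors="pt",
                          truncation=True, max_length=64).input_ids.to(device)
    outputs = model.generate(
        input_ids,
        num_beams=3,
        num_beam_groups=3,
        num_return_sequences=1,
        repetition_penalty=3.0,
        diversity_penalty=3.01,
        no_repeat_ngram_size=2,
        temperature=0.8,  # only takes effect when sampling is enabled; listed to mirror the table
        max_length=64,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
```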
Example Texts

| Example | Input text |
|------|------|
| AWS course | paraphraser: Learn to build generative AI applications with an expert AWS instructor with the 2-day Developing Generative AI Applications on AWS course. |
| Generative AI | paraphraser: In healthcare, Generative AI can help generate synthetic medical data to train machine learning models, develop new drug candidates, and design clinical trials. |
| Fine-tuning | paraphraser: By leveraging prior model training through transfer learning, fine-tuning can reduce the amount of expensive computing power and labeled data needed to obtain large models tailored to niche use cases and business needs. |
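These prompts already include the `paraphraser: ` prefix, so they can be tokenized directly instead of being passed through a helper that adds the prefix again. A minimal, illustrative sketch (again reusing `tokenizer`, `model`, and `device` from the usage example):
```python
example_prompts = [
    "paraphraser: Learn to build generative AI applications with an expert AWS instructor with the 2-day Developing Generative AI Applications on AWS course.",
    "paraphraser: In healthcare, Generative AI can help generate synthetic medical data to train machine learning models, develop new drug candidates, and design clinical trials.",
]

for prompt in example_prompts:
    # The prompts are already prefixed, so tokenize them as-is.
    input_ids = tokenizer(prompt, return_tensors="pt", truncation=True,
                          max_length=64).input_ids.to(device)
    outputs = model.generate(input_ids, num_beams=3, num_beam_groups=3,
                             num_return_sequences=1, repetition_penalty=3.0,
                             diversity_penalty=3.01, no_repeat_ngram_size=2,
                             max_length=64)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```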
📄 License
This project is released under the Apache-2.0 license.
🔧 Technical Details
The original documentation does not provide further implementation details.
🔜 Future Development
(See the Discussions section for any ongoing development or planned improvements.)