bart-paraphrase: an open-source text generation model for sentence paraphrasing

Bart Paraphrase

Developed by eugenesiow

A large BART sequence-to-sequence (text generation) model fine-tuned on 3 paraphrase datasets for sentence paraphrasing.

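Task: Text generation

Library: Transformers

Language: English | License: Apache-2.0 | Tags: #text-paraphrasing #multi-dataset-fine-tuning #sequence-to-sequence

Downloads: 2,334

Released: 3/2/2022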

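Model Overview

This is a sequence-to-sequence model based on the BART architecture and built specifically for paraphrasing. It was fine-tuned on the Quora, PAWS, and MSR paraphrase corpora, and generates sentences that keep the meaning of the input while changing the wording.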

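Model Features

Based on the BART architecture

Uses a standard sequence-to-sequence architecture that combines a bidirectional encoder with an autoregressive decoder.

Fine-tuned on multiple datasets

Fine-tuned on three datasets (the Quora, PAWS, and MSR paraphrase corpora) to improve paraphrasing quality.

Suited to text generation

BART is particularly effective when fine-tuned for text generation, which makes it a good fit for paraphrasing.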

Model Capabilities

Text paraphrasing

Sentence rephrasing

Meaning-preserving text generation

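Use Cases

Text processing

Sentence paraphrasing

Rewrites an input sentence into one with the same meaning but different wording.

Produces grammatical paraphrases that are semantically similar to the input.

Content diversification

Generates several ways of expressing the same content to increase textual variety.

Offers alternative phrasings so that content is not repeated verbatim.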

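🚀 BART Paraphrase Model (Large)

A large BART sequence-to-sequence (text generation) model fine-tuned on 3 paraphrase datasets. It can be used to generate paraphrases of input text.

🚀 Quick Start

You can use the pretrained model to paraphrase an input sentence. A usage example follows: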

import torch
from transformers import BartForConditionalGeneration, BartTokenizer

input_sentence = "They were there to enjoy us and they were there to pray for us."

# Load the fine-tuned model and move it to the GPU when one is available.
model = BartForConditionalGeneration.from_pretrained('eugenesiow/bart-paraphrase')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
tokenizer = BartTokenizer.from_pretrained('eugenesiow/bart-paraphrase')

# Tokenize the input and move the tensors to the same device as the model.
batch = tokenizer(input_sentence, return_tensors='pt').to(device)
generated_ids = model.generate(batch['input_ids'], attention_mask=batch['attention_mask'])
generated_sentence = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

print(generated_sentence)

Output

['They were there to enjoy us and to pray for us.']
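The quick start returns a single paraphrase. When several alternatives are wanted, one option is beam search with `num_return_sequences`. The sketch below is an illustration rather than part of the original model card: the helper names `generate_candidates` and `unique_paraphrases`, the beam width, and the `max_length` value are all assumptions chosen for the example.

```python
def generate_candidates(model, tokenizer, sentence, num_candidates=3, max_length=64):
    """Return `num_candidates` beam-search outputs for one input sentence.

    `model` and `tokenizer` are expected to be the BartForConditionalGeneration
    and BartTokenizer objects loaded as in the quick start.
    """
    # Keep the input tensors on the same device as the model.
    batch = tokenizer(sentence, return_tensors="pt").to(model.device)
    generated_ids = model.generate(
        **batch,
        num_beams=max(4, num_candidates),  # beam width is an assumed setting
        num_return_sequences=num_candidates,
        max_length=max_length,
    )
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)


def unique_paraphrases(source, candidates):
    """Drop candidates that repeat the source or each other.

    Comparison ignores letter case and whitespace differences.
    """
    def normalize(text):
        return " ".join(text.lower().split())

    seen = {normalize(source)}
    kept = []
    for candidate in candidates:
        key = normalize(candidate)
        if key and key not in seen:
            seen.add(key)
            kept.append(candidate)
    return kept
```

With the `model`, `tokenizer`, and `input_sentence` from the quick start, `unique_paraphrases(input_sentence, generate_candidates(model, tokenizer, input_sentence))` would return the distinct candidates. The actual sentences depend on the model and are not shown here.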

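✨ Key Features

Model architecture: BART uses a standard sequence-to-sequence/machine translation architecture with a bidirectional encoder (like BERT) and a left-to-right decoder (like GPT).
Pretraining task: Pretraining involves randomly shuffling the order of the original sentences and a novel in-filling scheme in which spans of text are replaced with a single mask token.
Fine-tuning: BART is particularly effective when fine-tuned for text generation. This model was fine-tuned on 3 paraphrase datasets (the Quora, PAWS, and MSR paraphrase corpora).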
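The two pretraining noising steps described above (sentence shuffling and span in-filling) can be sketched in a few lines of plain Python. This is a simplified toy version, not the original pretraining code: it works on whitespace tokens, picks span lengths uniformly (the BART paper samples them from a Poisson distribution), and the names `permute_sentences`, `infill_spans`, `mask_ratio`, and `max_span` are hypothetical.

```python
import random

MASK = "<mask>"


def permute_sentences(sentences, rng):
    """Return the sentences in a random order (sentence permutation)."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


def infill_spans(tokens, rng, mask_ratio=0.3, max_span=3):
    """Replace random token spans with a single MASK token each (text in-filling)."""
    budget = int(len(tokens) * mask_ratio)  # upper bound on how many tokens to hide
    out, i = [], 0
    while i < len(tokens):
        if budget > 0 and rng.random() < mask_ratio:
            # Uniform span length is a simplification made for this sketch.
            span = min(rng.randint(1, max_span), budget, len(tokens) - i)
            out.append(MASK)  # the whole span collapses to one mask token
            i += span
            budget -= span
        else:
            out.append(tokens[i])
            i += 1
    return out
```

During pretraining the model receives the corrupted text as encoder input and learns to reconstruct the original text in the decoder. Because a span of any length becomes one mask token, the model also has to infer how many tokens are missing.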

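📚 Documentation

Model Description

The BART model was proposed by Lewis et al. (2019) in the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension.

The original BART code comes from the fairseq repository.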

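Intended Uses and Limitations

The model can be used to paraphrase an input sentence.

Training Data

The model was fine-tuned from the pretrained facebook/bart-large checkpoint on the Quora, PAWS, and MSR paraphrase corpora.

Training Procedure

We follow the training procedure provided in the simpletransformers seq2seq example.

Citation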
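The model card does not list the preprocessing or hyperparameters that were used, so the following is only a rough sketch of what a simpletransformers seq2seq fine-tuning run could look like. It assumes labelled sentence pairs with the hypothetical fields `sentence1`, `sentence2`, and `label`, keeps only the pairs marked as paraphrases, and uses placeholder hyperparameter values that are not taken from the original training run.

```python
def to_seq2seq_rows(pairs):
    """Keep pairs labelled as paraphrases and map them to the
    input_text/target_text columns that simpletransformers expects."""
    return [
        {"input_text": p["sentence1"], "target_text": p["sentence2"]}
        for p in pairs
        if p["label"] == 1
    ]


def fine_tune(rows, output_dir="outputs/"):
    """Fine-tune facebook/bart-large on the prepared rows.

    Requires pandas, torch, and simpletransformers to be installed.
    """
    import pandas as pd
    import torch
    from simpletransformers.seq2seq import Seq2SeqArgs, Seq2SeqModel

    args = Seq2SeqArgs()
    args.num_train_epochs = 2    # placeholder, not from the model card
    args.train_batch_size = 8    # placeholder, not from the model card
    args.max_seq_length = 128    # placeholder, not from the model card
    args.output_dir = output_dir

    model = Seq2SeqModel(
        encoder_decoder_type="bart",
        encoder_decoder_name="facebook/bart-large",
        args=args,
        use_cuda=torch.cuda.is_available(),
    )
    model.train_model(pd.DataFrame(rows))
    return model
```

Filtering to positive pairs matters because datasets such as Quora, PAWS, and MSR also contain pairs labelled as non-paraphrases, which would teach a generator the wrong target.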

@misc{lewis2019bart,
      title={BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension}, 
      author={Mike Lewis and Yinhan Liu and Naman Goyal and Marjan Ghazvininejad and Abdelrahman Mohamed and Omer Levy and Ves Stoyanov and Luke Zettlemoyer},
      year={2019},
      eprint={1910.13461},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}