bart-paraphrase: Open-Source Text Generation Model for Free, Easy Sentence Paraphrasing

Bart Paraphrase

Developed by eugenesiow

A large BART sequence-to-sequence (text generation) model fine-tuned on 3 paraphrase datasets for sentence paraphrasing.

Text Generation

Transformers

Language: English · License: Apache-2.0 · #paraphrasing #multi-dataset-fine-tuning #sequence-to-sequence

Downloads: 2,334

Released: 3/2/2022

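Model Overview

This is a sequence-to-sequence model based on the BART architecture, built specifically for paraphrasing. It was fine-tuned on the Quora, PAWS, and MSR paraphrase corpora and generates sentences that keep the meaning of the input while changing its wording.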

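Model Features

BART-based architecture

Uses a standard sequence-to-sequence architecture that combines a bidirectional encoder with an autoregressive (left-to-right) decoder.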
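As a rough illustration of what "bidirectional encoder" versus "autoregressive decoder" means, the pure-Python sketch below builds the two self-attention mask patterns: every encoder position may attend to every other position, while decoder position i may attend only to positions 0 to i. The helper names `bidirectional_mask` and `causal_mask` are hypothetical and do not come from Transformers, which builds these masks internally; the decoder's cross-attention to the encoder output is not shown.

```python
def bidirectional_mask(n: int) -> list[list[int]]:
    """Encoder-style mask (hypothetical helper): 1 = position i may attend to j.

    Every position sees the whole input, left and right context alike.
    """
    return [[1] * n for _ in range(n)]


def causal_mask(n: int) -> list[list[int]]:
    """Decoder-style mask (hypothetical helper): position i sees only j <= i.

    This left-to-right restriction is what lets the decoder generate one token
    at a time without looking at tokens it has not produced yet.
    """
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for row in causal_mask(4):
        print(row)
    # [1, 0, 0, 0]
    # [1, 1, 0, 0]
    # [1, 1, 1, 0]
    # [1, 1, 1, 1]
```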

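Multi-dataset fine-tuning

Fine-tuned on three datasets (Quora, PAWS, and the MSR Paraphrase Corpus) to improve its paraphrasing ability.

Optimized for text generation

BART is particularly effective when fine-tuned for text generation tasks, which makes it well suited to paraphrasing.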

Model Capabilities

Text paraphrasing

Sentence rephrasing

Meaning-preserving text generation

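Use Cases

Text Processing

Sentence paraphrasing

Rewrites an input sentence into one with the same meaning but different wording.

Produces grammatically correct, semantically similar paraphrases.

Content diversification

Generates multiple phrasings of the same content to increase textual variety.

Offers several alternative phrasings to avoid repetitive content.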
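Generating several phrasings is usually done by asking the model for multiple candidates, for example with the `num_beams` and `num_return_sequences` arguments of Transformers' `generate`. Candidates often repeat each other or copy the input, so a post-processing filter such as the hypothetical `select_diverse` below can help. It is a minimal sketch, assuming word-set Jaccard overlap is an acceptable proxy for "different wording"; it does not check that the meaning is preserved, and it is not part of the model.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercased word set used as a rough measure of vocabulary."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the word sets of a and b (1.0 = same vocabulary)."""
    wa, wb = _words(a), _words(b)
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def select_diverse(source: str, candidates: list[str], max_overlap: float = 0.9) -> list[str]:
    """Hypothetical filter: drop duplicates and near-copies, most different first."""
    seen = {" ".join(source.lower().split())}  # treat the source itself as "seen"
    kept: list[str] = []
    for cand in candidates:
        key = " ".join(cand.lower().split())  # normalise case and whitespace
        if key in seen:
            continue  # verbatim copy of the source or of an earlier candidate
        seen.add(key)
        if jaccard(source, cand) <= max_overlap:
            kept.append(cand)
    # Lowest overlap with the source (most different wording) comes first.
    return sorted(kept, key=lambda c: jaccard(source, c))


if __name__ == "__main__":
    source = "The weather is nice today."
    candidates = [
        "The weather is nice today.",
        "Today the weather is pleasant.",
        "Today the weather is pleasant.",
        "It is a lovely day.",
    ]
    print(select_diverse(source, candidates))
    # ['It is a lovely day.', 'Today the weather is pleasant.']
```

Sorting by overlap rather than by model score is a design choice for the diversification use case; when fidelity matters more than variety, keeping the model's original ranking may be preferable.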

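🚀 BART Paraphrase Model (Large)

A large BART-based sequence-to-sequence (text generation) model fine-tuned on 3 paraphrase datasets. It can be used to generate paraphrases of input text.

🚀 Quick Start

You can use the fine-tuned model to paraphrase an input sentence. Here is an example: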

import torch
from transformers import BartForConditionalGeneration, BartTokenizer

input_sentence = "They were there to enjoy us and they were there to pray for us."

# Load the fine-tuned model and run it on the GPU if one is available
model = BartForConditionalGeneration.from_pretrained('eugenesiow/bart-paraphrase')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
tokenizer = BartTokenizer.from_pretrained('eugenesiow/bart-paraphrase')

# Tokenize and move the tensors to the same device as the model
batch = tokenizer(input_sentence, return_tensors='pt').to(device)

# Generate the paraphrase; gradients are not needed for inference
with torch.no_grad():
    generated_ids = model.generate(**batch)
generated_sentence = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

print(generated_sentence)

Output

['They were there to enjoy us and to pray for us.']

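✨ Key Features

Model architecture: BART uses a standard seq2seq/machine-translation architecture with a bidirectional encoder (like BERT) and a left-to-right decoder (like GPT).
Pre-training task: pre-training involves randomly shuffling the order of the original sentences and a novel in-filling scheme, in which spans of text are replaced with a single mask token.
Fine-tuning: BART is particularly effective when fine-tuned for text generation. This model was fine-tuned on 3 paraphrase datasets (Quora, PAWS, and the MSR Paraphrase Corpus).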
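The two pre-training noising operations described above can be sketched in a few lines of Python. This is a simplified illustration, not the fairseq implementation: in the BART paper span lengths are drawn from a Poisson distribution (λ = 3), whereas here the spans are passed in explicitly so the result is deterministic. The `<mask>` string and the names `permute_sentences` and `infill` are assumptions made for the example.

```python
import random


def permute_sentences(sentences: list[str], rng: random.Random) -> list[str]:
    """Sentence permutation: return the document's sentences in random order."""
    shuffled = list(sentences)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    return shuffled


def infill(tokens: list[str], spans: list[tuple[int, int]], mask: str = "<mask>") -> list[str]:
    """Text infilling: replace each (start, length) span with ONE mask token.

    Spans must be sorted and non-overlapping. A length-0 span inserts a mask
    without removing anything, so the model cannot tell from the mask alone
    how many tokens are missing and has to predict that as well.
    """
    out: list[str] = []
    pos = 0
    for start, length in spans:
        out.extend(tokens[pos:start])  # copy untouched tokens before the span
        out.append(mask)               # the whole span collapses to a single mask
        pos = start + length           # skip over the masked tokens
    out.extend(tokens[pos:])           # copy whatever follows the last span
    return out


if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    print(infill(tokens, [(1, 2), (5, 0)]))
    # ['the', '<mask>', 'on', 'the', '<mask>', 'mat']
    print(permute_sentences(["First.", "Second.", "Third."], random.Random(0)))
```

During pre-training the model receives the noised text as encoder input and is trained to reconstruct the original text with its decoder.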

📚 Documentation

Model Description

The BART model was proposed by Lewis et al. (2019) in the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension.

The original BART code comes from this repository.

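Intended Uses & Limitations

The model can be used to paraphrase input sentences.

Training Data

The model was fine-tuned from the pre-trained facebook/bart-large checkpoint on the Quora, PAWS, and MSR paraphrase corpora.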
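Paraphrase corpora such as Quora Question Pairs, PAWS, and the MSR Paraphrase Corpus are distributed as sentence pairs with a binary label. The sketch below shows one plausible way to turn such pairs into seq2seq training rows. It assumes that label 1 marks a paraphrase and that only those pairs are kept, and it uses the `input_text`/`target_text` column names that simpletransformers' `Seq2SeqModel` expects in its training DataFrame. The card does not say how the data was actually filtered, and `to_seq2seq_rows` is a hypothetical helper.

```python
def to_seq2seq_rows(pairs: list[tuple[str, str, int]]) -> list[dict[str, str]]:
    """Hypothetical helper: convert (sentence1, sentence2, label) pairs to rows.

    Assumption: label == 1 marks a true paraphrase. Non-paraphrase pairs are
    dropped because training on them would teach the model to change meaning.
    """
    rows: list[dict[str, str]] = []
    for first, second, label in pairs:
        first, second = first.strip(), second.strip()
        if label == 1 and first and second:  # skip non-paraphrases and empty text
            rows.append({"input_text": first, "target_text": second})
    return rows


if __name__ == "__main__":
    # Made-up example pairs, not taken from the real corpora.
    pairs = [
        ("How do I learn Python?", "What is the best way to learn Python?", 1),
        ("How do I learn Python?", "How do I learn Java?", 0),
    ]
    print(to_seq2seq_rows(pairs))
```

Wrapping the result in `pandas.DataFrame(rows)` gives a frame with the two expected columns.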

Training Procedure

We followed the training procedure provided in the simpletransformers seq2seq example.

Citation

@misc{lewis2019bart,
      title={BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension}, 
      author={Mike Lewis and Yinhan Liu and Naman Goyal and Marjan Ghazvininejad and Abdelrahman Mohamed and Omer Levy and Ves Stoyanov and Luke Zettlemoyer},
      year={2019},
      eprint={1910.13461},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}