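🚀 Proposition Segmentation Model
This is the proposition segmentation model proposed by Chen et al. in the 2023 paper "Dense X Retrieval: What Retrieval Granularity Should We Use?". It decomposes input text into a list of propositions and outputs them as JSON.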
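🚀 Quick Start
The input prompt has the form:
Title: {title}. Section: {section}. Content: {content}
The output is a JSON-formatted list of propositions.
For example, given the following passage as input: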
Title: Leaning Tower of Pisa. Section: . Content: Prior to restoration work performed between 1990 and 2001, Leaning Tower of Pisa leaned at an angle of 5.5 degrees, but the tower now leans at about 3.99 degrees. This means the top of the tower is displaced horizontally 3.9 meters (12 ft 10 in) from the center.
the output will be:
["Prior to restoration work performed between 1990 and 2001, Leaning Tower of Pisa leaned at an angle of 5.5 degrees.", "Leaning Tower of Pisa now leans at about 3.99 degrees.", "The top of Leaning Tower of Pisa is displaced horizontally 3.9 meters (12 ft 10 in) from the center."]
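The prompt format above can be captured in a small helper so that every passage is formatted the same way. This is a minimal sketch: `build_prompt` is a hypothetical name introduced here for illustration, not part of the model's released code.

```python
def build_prompt(title: str, section: str, content: str) -> str:
    """Format a passage as 'Title: ... Section: ... Content: ...'.

    Hypothetical helper; an empty section is passed as "" and leaves
    'Section: .' in the prompt, as in the example above.
    """
    return f"Title: {title}. Section: {section}. Content: {content}"


prompt = build_prompt(
    "Leaning Tower of Pisa",
    "",
    "The tower now leans at about 3.99 degrees.",
)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of a hand-written f-string.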
💻 Usage Example
Basic usage
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch
import json

model_name = "chentong00/propositionizer-wiki-flan-t5-large"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)

title = "Leaning Tower of Pisa"
section = ""
content = "Prior to restoration work performed between 1990 and 2001, Leaning Tower of Pisa leaned at an angle of 5.5 degrees, but the tower now leans at about 3.99 degrees. This means the top of the tower is displaced horizontally 3.9 meters (12 ft 10 in) from the center."

# Build the prompt in the format the model expects.
input_text = f"Title: {title}. Section: {section}. Content: {content}"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
outputs = model.generate(input_ids.to(device), max_new_tokens=512).cpu()
output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

# The output should be a JSON list; fall back to an empty list if parsing fails.
try:
    prop_list = json.loads(output_text)
except json.JSONDecodeError:
    prop_list = []
    print("[ERROR] Failed to parse output text as JSON.")

print(json.dumps(prop_list, indent=2))
Expected output:
[
  "Prior to restoration work performed between 1990 and 2001, Leaning Tower of Pisa leaned at an angle of 5.5 degrees.",
  "Leaning Tower of Pisa now leans at about 3.99 degrees.",
  "The top of Leaning Tower of Pisa is displaced horizontally 3.9 meters (12 ft 10 in) from the center."
]
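Generated text is not guaranteed to be valid JSON, or to be a list of strings, so it may help to validate the decoded output before using it downstream. The following is a minimal sketch under that assumption: `parse_propositions` is a hypothetical helper, and dropping non-string or empty entries is a design choice made here, not something the paper prescribes.

```python
import json
from typing import List


def parse_propositions(output_text: str) -> List[str]:
    """Return the propositions in output_text, or [] if it is not a JSON list.

    Hypothetical helper: non-string and blank entries are silently dropped.
    """
    try:
        parsed = json.loads(output_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    # Keep only non-empty strings, stripped of surrounding whitespace.
    return [p.strip() for p in parsed if isinstance(p, str) and p.strip()]


print(parse_propositions('["Leaning Tower of Pisa now leans at about 3.99 degrees."]'))
print(parse_propositions("not valid json"))
```

Returning an empty list on failure mirrors the fallback in the basic usage example; a caller that needs to tell "no propositions" apart from "unparseable output" could raise an exception instead.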
📄 License
This project is released under the Apache-2.0 license.
📚 Citation
If you use this model in your research, please cite the following paper:
@article{chen2023densex,
  title={Dense X Retrieval: What Retrieval Granularity Should We Use?},
  author={Tong Chen and Hongwei Wang and Sihao Chen and Wenhao Yu and Kaixin Ma and Xinran Zhao and Hongming Zhang and Dong Yu},
  journal={arXiv preprint arXiv:2312.06648},
  year={2023},
  url={https://arxiv.org/pdf/2312.06648.pdf}
}