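🚀 Proposition Segmentation Model
This is the proposition segmentation model introduced by Chen et al. in the 2023 paper "Dense X Retrieval: What Retrieval Granularity Should We Use?". It decomposes input text into a list of propositions and returns them in JSON format.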
🚀 Quick Start
The model expects an input prompt of the form
Title: {title}. Section: {section}. Content: {content}
and returns a list of propositions in JSON format.
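The prompt format can be wrapped in a small helper. This is a minimal sketch: build_prompt is a hypothetical name, not part of the model's repository, and leaving the section empty by default simply mirrors the example in this README.

```python
def build_prompt(title: str, content: str, section: str = "") -> str:
    """Format a passage as 'Title: ... Section: ... Content: ...'.

    Hypothetical helper; the section may be left empty, as in the README example.
    """
    return f"Title: {title}. Section: {section}. Content: {content}"


if __name__ == "__main__":
    # Build a prompt with an empty section.
    print(build_prompt("Leaning Tower of Pisa", "The tower now leans at about 3.99 degrees."))
```

The returned string can be passed to the tokenizer as the model input.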
For example, to decompose the following passage with the model:
Title: Leaning Tower of Pisa. Section: . Content: Prior to restoration work performed between 1990 and 2001, Leaning Tower of Pisa leaned at an angle of 5.5 degrees, but the tower now leans at about 3.99 degrees. This means the top of the tower is displaced horizontally 3.9 meters (12 ft 10 in) from the center.
The output will be:
["Prior to restoration work performed between 1990 and 2001, Leaning Tower of Pisa leaned at an angle of 5.5 degrees.", "Leaning Tower of Pisa now leans at about 3.99 degrees.", "The top of Leaning Tower of Pisa is displaced horizontally 3.9 meters (12 ft 10 in) from the center."]
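Because the output is a JSON-encoded string, it can be decoded with Python's standard json module. The sketch below is one possible way to do this defensively; parse_propositions is a hypothetical helper, and returning an empty list on malformed output or dropping non-string items are assumptions of this example, not documented behavior of the model.

```python
import json
from typing import List


def parse_propositions(output_text: str) -> List[str]:
    """Decode the model output into a list of proposition strings.

    Returns an empty list if the text is not valid JSON or is not a JSON list;
    non-string items inside the list are dropped.
    """
    try:
        parsed = json.loads(output_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    return [p for p in parsed if isinstance(p, str)]


if __name__ == "__main__":
    sample = '["Leaning Tower of Pisa now leans at about 3.99 degrees."]'
    for proposition in parse_propositions(sample):
        print(proposition)
```

Each returned string is a single proposition that can be processed independently of the others.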
💻 Usage Example
Basic usage
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch
import json

# Load the tokenizer and model, using a GPU when one is available.
model_name = "chentong00/propositionizer-wiki-flan-t5-large"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)

title = "Leaning Tower of Pisa"
section = ""
content = "Prior to restoration work performed between 1990 and 2001, Leaning Tower of Pisa leaned at an angle of 5.5 degrees, but the tower now leans at about 3.99 degrees. This means the top of the tower is displaced horizontally 3.9 meters (12 ft 10 in) from the center."

# Build the prompt in the format the model expects.
input_text = f"Title: {title}. Section: {section}. Content: {content}"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
outputs = model.generate(input_ids.to(device), max_new_tokens=512).cpu()
output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

# The model emits a JSON list; fall back to an empty list if parsing fails.
try:
    prop_list = json.loads(output_text)
except json.JSONDecodeError:
    prop_list = []
    print("[ERROR] Failed to parse output text as JSON.")

print(json.dumps(prop_list, indent=2))
Expected output:
[
  "Prior to restoration work performed between 1990 and 2001, Leaning Tower of Pisa leaned at an angle of 5.5 degrees.",
  "Leaning Tower of Pisa now leans at about 3.99 degrees.",
  "The top of Leaning Tower of Pisa is displaced horizontally 3.9 meters (12 ft 10 in) from the center."
]
📄 License
This project is released under the Apache-2.0 license.
📚 Citation
If you use this model in your research, please cite the following paper:
@article{chen2023densex,
  title={Dense X Retrieval: What Retrieval Granularity Should We Use?},
  author={Tong Chen and Hongwei Wang and Sihao Chen and Wenhao Yu and Kaixin Ma and Xinran Zhao and Hongming Zhang and Dong Yu},
  journal={arXiv preprint arXiv:2312.06648},
  year={2023},
  url={https://arxiv.org/pdf/2312.06648.pdf}
}