🚀 Proposition Segmentation Model
This is a proposition segmentation model (a "propositionizer") derived from the paper "Dense X Retrieval: What Retrieval Granularity Should We Use?" by Chen et al. (2023).
🚀 Quick Start
The model splits an input passage into individual propositions. The input prompt takes the form Title: {title}. Section: {section}. Content: {content}
and the output is a list of propositions in JSON format.
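The prompt format above can be assembled with a small helper. This is a minimal sketch: `build_prompt` is a hypothetical name introduced here for illustration and is not part of the model's repository. It only reproduces the template string described above.

```python
def build_prompt(title: str, section: str, content: str) -> str:
    """Format a passage into the "Title / Section / Content" prompt.

    Hypothetical helper: the name and signature are assumptions made
    for this example; only the template itself comes from the README.
    """
    return f"Title: {title}. Section: {section}. Content: {content}"


# An empty section is allowed; the field is simply left blank.
prompt = build_prompt("Leaning Tower of Pisa", "", "The tower leans.")
print(prompt)
# Title: Leaning Tower of Pisa. Section: . Content: The tower leans.
```

Keeping the template in one function helps ensure every passage is formatted identically before tokenization.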
💻 Usage Example
Basic usage
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch
import json

# Load the tokenizer and model, using a GPU when one is available.
model_name = "chentong00/propositionizer-wiki-flan-t5-large"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)

title = "Leaning Tower of Pisa"
section = ""
content = "Prior to restoration work performed between 1990 and 2001, Leaning Tower of Pisa leaned at an angle of 5.5 degrees, but the tower now leans at about 3.99 degrees. This means the top of the tower is displaced horizontally 3.9 meters (12 ft 10 in) from the center."

# Build the prompt in the format the model expects, then generate.
input_text = f"Title: {title}. Section: {section}. Content: {content}"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
outputs = model.generate(input_ids.to(device), max_new_tokens=512).cpu()
output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

# The output should be a JSON list; fall back to an empty list if parsing fails.
try:
    prop_list = json.loads(output_text)
except json.JSONDecodeError:
    prop_list = []
    print("[ERROR] Failed to parse output text as JSON.")
print(json.dumps(prop_list, indent=2))
Expected output
[
  "Prior to restoration work performed between 1990 and 2001, Leaning Tower of Pisa leaned at an angle of 5.5 degrees.",
  "Leaning Tower of Pisa now leans at about 3.99 degrees.",
  "The top of Leaning Tower of Pisa is displaced horizontally 3.9 meters (12 ft 10 in) from the center."
]
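Because the output is generated text, it may not always be valid JSON or a list of strings. The following is a minimal sketch of a stricter parser; `parse_propositions` is a hypothetical helper introduced for illustration, and its policy of dropping non-string or blank entries is an assumption rather than behavior defined by the model authors.

```python
import json
from typing import List


def parse_propositions(output_text: str) -> List[str]:
    """Parse decoded model output into a list of proposition strings.

    Hypothetical helper (not part of the model's repository).
    Returns an empty list when the text is not valid JSON or the
    top-level value is not a list.
    """
    try:
        parsed = json.loads(output_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    # Keep only non-blank string entries, with surrounding whitespace removed.
    return [p.strip() for p in parsed if isinstance(p, str) and p.strip()]


print(parse_propositions('["The tower leans.", "It is in Pisa."]'))
# ['The tower leans.', 'It is in Pisa.']
```

Returning an empty list on failure mirrors the fallback in the basic usage example, so downstream code can treat both cases the same way.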
📄 License
This project is released under the Apache-2.0 license.
📚 Citation
@article{chen2023densex,
title={Dense X Retrieval: What Retrieval Granularity Should We Use?},
author={Tong Chen and Hongwei Wang and Sihao Chen and Wenhao Yu and Kaixin Ma and Xinran Zhao and Hongming Zhang and Dong Yu},
journal={arXiv preprint arXiv:2312.06648},
year={2023},
url={https://arxiv.org/pdf/2312.06648.pdf}
}