🚀 Triplex:用於知識圖譜構建的SOTA大語言模型
Triplex是一款專為知識圖譜構建而設計的大語言模型,由SciPhi.AI開發。它基於Phi3 - 3.8B微調而來,能夠從非結構化數據中高效創建知識圖譜。在知識圖譜構建成本高昂的當下,如微軟的Graph RAG雖能增強RAG方法,但構建成本不菲。而Triplex可將知識圖譜創建成本降低98%,以GPT - 4六十分之一的成本實現更優性能,還能借助SciPhi的R2R實現本地圖譜構建。
🚀 快速開始
資源鏈接
Python代碼示例
import json
from transformers import AutoModelForCausalLM, AutoTokenizer
def triplextract(model, tokenizer, text, entity_types, predicates):
input_format = """Perform Named Entity Recognition (NER) and extract knowledge graph triplets from the text. NER identifies named entities of given entity types, and triple extraction identifies relationships between entities using specified predicates.
**Entity Types:**
{entity_types}
**Predicates:**
{predicates}
**Text:**
{text}
"""
message = input_format.format(
entity_types = json.dumps({"entity_types": entity_types}),
predicates = json.dumps({"predicates": predicates}),
text = text)
messages = [{'role': 'user', 'content': message}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt = True, return_tensors="pt").to("cuda")
output = tokenizer.decode(model.generate(input_ids=input_ids, max_length=2048)[0], skip_special_tokens=True)
return output
model = AutoModelForCausalLM.from_pretrained("sciphi/triplex", trust_remote_code=True).to('cuda').eval()
tokenizer = AutoTokenizer.from_pretrained("sciphi/triplex", trust_remote_code=True)
entity_types = [ "LOCATION", "POSITION", "DATE", "CITY", "COUNTRY", "NUMBER" ]
predicates = [ "POPULATION", "AREA" ]
text = """
San Francisco,[24] officially the City and County of San Francisco, is a commercial, financial, and cultural center in Northern California.
With a population of 808,437 residents as of 2022, San Francisco is the fourth most populous city in the U.S. state of California behind Los Angeles, San Diego, and San Jose.
"""
prediction = triplextract(model, tokenizer, text, entity_types, predicates)
print(prediction)
✨ 主要特性
- 成本大幅降低:將知識圖譜創建成本降低98%,以GPT - 4六十分之一的成本實現更優性能。
- 本地構建能力:藉助SciPhi的R2R實現本地知識圖譜構建。
- 高效提取三元組:能夠從文本或其他數據源中提取三元組(由主語、謂語和賓語組成的簡單陳述)。
📊 基準測試

📄 許可證
模型權重採用CC - BY - NC - SA - 4.0許可證。不過,對於最近12個月內總收入低於500萬美元的組織,我們將免除這些限制。如果您想去除GPL許可證要求(雙重許可)和/或在超過收入限制的情況下商業使用這些權重,請通過founders@sciphi.ai聯繫我們的團隊。
📖 引用
@misc{pimpalgaonkar2024triplex,
author = {Pimpalgaonkar, Shreyas and Tremelling, Nolan and Colegrove, Owen},
title = {Triplex: a SOTA LLM for knowledge graph construction},
year = {2024},
url = {https://huggingface.co/sciphi/triplex}
}