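🚀 Triplex: A SOTA LLM for Knowledge Graph Construction
Triplex is a large language model purpose-built for knowledge graph construction, developed by SciPhi.AI. It is fine-tuned from Phi3-3.8B and creates knowledge graphs from unstructured data. Knowledge graphs remain expensive to build: approaches such as Microsoft's Graph RAG enhance RAG methods, but constructing the underlying graph is costly. Triplex reduces the cost of knowledge graph creation by 98%, outperforming GPT-4 at one sixtieth of the cost, and supports local graph building with SciPhi's R2R.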
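As a quick sanity check on the two cost figures above, running at one sixtieth of a baseline cost corresponds to a reduction of roughly 98%. The sketch below works through that arithmetic. The baseline value of 60.0 is a hypothetical placeholder chosen for convenience, not a measured price.

```python
# Hypothetical baseline: cost units to build a graph with GPT-4.
# This is a placeholder value, not a real price.
baseline_cost = 60.0

# The text states that Triplex runs at one sixtieth of the GPT-4 cost.
triplex_cost = baseline_cost / 60

# Fractional reduction relative to the baseline.
reduction = 1 - triplex_cost / baseline_cost

print(f"cost reduction: {reduction:.1%}")  # prints "cost reduction: 98.3%"
```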
🚀 Quick Start
Resource Links
Python Code Example
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

def triplextract(model, tokenizer, text, entity_types, predicates):
    # Prompt template: instructions, then the entity types, predicates, and input text.
    input_format = """Perform Named Entity Recognition (NER) and extract knowledge graph triplets from the text. NER identifies named entities of given entity types, and triple extraction identifies relationships between entities using specified predicates.
**Entity Types:**
{entity_types}
**Predicates:**
{predicates}
**Text:**
{text}
"""
    # Entity types and predicates are passed to the model as JSON objects.
    message = input_format.format(
        entity_types=json.dumps({"entity_types": entity_types}),
        predicates=json.dumps({"predicates": predicates}),
        text=text)
    messages = [{'role': 'user', 'content': message}]
    # Apply the chat template and move the token IDs to the GPU.
    input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")
    # max_length counts the prompt tokens too; the decoded string contains the full sequence.
    output = tokenizer.decode(model.generate(input_ids=input_ids, max_length=2048)[0], skip_special_tokens=True)
    return output

# Load the model and tokenizer (this example requires a CUDA-capable GPU).
model = AutoModelForCausalLM.from_pretrained("sciphi/triplex", trust_remote_code=True).to('cuda').eval()
tokenizer = AutoTokenizer.from_pretrained("sciphi/triplex", trust_remote_code=True)

# Entity types and predicates the model should look for.
entity_types = [ "LOCATION", "POSITION", "DATE", "CITY", "COUNTRY", "NUMBER" ]
predicates = [ "POPULATION", "AREA" ]
text = """
San Francisco,[24] officially the City and County of San Francisco, is a commercial, financial, and cultural center in Northern California.
With a population of 808,437 residents as of 2022, San Francisco is the fourth most populous city in the U.S. state of California behind Los Angeles, San Diego, and San Jose.
"""

prediction = triplextract(model, tokenizer, text, entity_types, predicates)
print(prediction)
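To inspect the prompt without loading the model or needing a GPU, the prompt-building step can be separated from generation. The following is a minimal sketch of that idea: `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names introduced here for illustration, and the exact whitespace of the template is an assumption. The instruction text and the JSON wrapping of the two lists follow the example above.

```python
import json

# Same instruction text as in the example above. The exact layout
# (blank lines, indentation) is an assumption of this sketch.
PROMPT_TEMPLATE = """Perform Named Entity Recognition (NER) and extract knowledge graph triplets from the text. NER identifies named entities of given entity types, and triple extraction identifies relationships between entities using specified predicates.
**Entity Types:**
{entity_types}
**Predicates:**
{predicates}
**Text:**
{text}
"""

def build_prompt(text, entity_types, predicates):
    """Fill the template; both lists are serialized as JSON objects."""
    return PROMPT_TEMPLATE.format(
        entity_types=json.dumps({"entity_types": entity_types}),
        predicates=json.dumps({"predicates": predicates}),
        text=text,
    )

prompt = build_prompt(
    "San Francisco has a population of 808,437.",
    ["CITY", "NUMBER"],
    ["POPULATION"],
)
print(prompt)
```

Printing the prompt this way makes it easy to confirm which entity types and predicates are actually sent to the model before spending time on generation.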
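✨ Key Features
- Much lower cost: reduces the cost of knowledge graph creation by 98%, outperforming GPT-4 at one sixtieth of the cost.
- Local graph building: builds knowledge graphs locally with SciPhi's R2R.
- Triple extraction: extracts triples (simple statements consisting of a subject, a predicate, and an object) from text or other data sources.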
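To make the notion of a triple concrete, the sketch below represents subject-predicate-object statements as a small data structure and groups them into a simple graph keyed by subject. The `Triple` type, the `build_graph` helper, and the sample triples (including the `LOCATED_IN` predicate) are hypothetical and hand-written for illustration; they are not actual Triplex output or part of its API.

```python
from collections import defaultdict
from typing import NamedTuple

class Triple(NamedTuple):
    """One statement: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str

# Hand-written example triples (illustrative only, not model output).
triples = [
    Triple("San Francisco", "POPULATION", "808,437"),
    Triple("San Francisco", "LOCATED_IN", "California"),
]

def build_graph(triples):
    """Group triples by subject into an adjacency mapping."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.predicate, t.obj))
    return dict(graph)

graph = build_graph(triples)
print(graph)
```

Each subject becomes a node, and each (predicate, object) pair becomes a labeled outgoing edge, which is the basic shape a knowledge graph store would ingest.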
📊 Benchmarks

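📄 License
The model weights are licensed under CC-BY-NC-SA-4.0. However, we will waive these restrictions for any organization with under $5 million USD in gross revenue in the most recent 12-month period. If you want to remove the GPL license requirements (dual licensing) and/or use the weights commercially above the revenue limit, please contact our team at founders@sciphi.ai.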
📖 Citation
@misc{pimpalgaonkar2024triplex,
  author = {Pimpalgaonkar, Shreyas and Tremelling, Nolan and Colegrove, Owen},
  title = {Triplex: a SOTA LLM for knowledge graph construction},
  year = {2024},
  url = {https://huggingface.co/sciphi/triplex}
}