Triplex: a SOTA LLM for knowledge graph construction
Triplex is a state-of-the-art large language model (LLM) developed by SciPhi.AI to address the high cost of knowledge graph construction. Knowledge graphs, such as the one behind Microsoft's Graph RAG, can enhance RAG methods, but building them is expensive. Triplex reduces the cost of knowledge graph creation by 98%, outperforming GPT-4 at 1/60th of its cost, and it enables local graph building with SciPhi's R2R.
Quick Start
Triplex is a fine-tuned version of Phi3-3.8B. It creates knowledge graphs from unstructured data by extracting triplets (simple statements with a subject, predicate, and object) from text or other data sources.
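To make the triplet idea concrete, here is a minimal sketch of the (subject, predicate, object) structure. The `Triplet` class and the example values are illustrative only, not Triplex's exact output schema:

```python
from typing import NamedTuple

class Triplet(NamedTuple):
    """A knowledge-graph triplet: one (subject, predicate, object) statement.
    Illustrative representation, not Triplex's exact output format."""
    subject: str
    predicate: str
    object: str

# "San Francisco has a population of 808,437" as a triplet:
t = Triplet("SAN FRANCISCO", "POPULATION", "808,437")
print(t.subject, t.predicate, t.object)
```

A set of such triplets is what a downstream graph store ingests as edges between entity nodes.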

Features
- Cost-effective: Reduces the cost of knowledge graph creation by 98% compared to traditional methods.
- High-performance: Outperforms GPT-4 at a fraction of the cost.
- Local building: Enables local knowledge graph building with SciPhi's R2R.
Benchmark

Usage Examples
Basic Usage
```python
import json

from transformers import AutoModelForCausalLM, AutoTokenizer


def triplextract(model, tokenizer, text, entity_types, predicates):
    # Prompt template the model was fine-tuned on: entity types and
    # predicates are passed as JSON, followed by the raw text.
    input_format = """Perform Named Entity Recognition (NER) and extract knowledge graph triplets from the text. NER identifies named entities of given entity types, and triple extraction identifies relationships between entities using specified predicates.

**Entity Types:**
{entity_types}

**Predicates:**
{predicates}

**Text:**
{text}
"""

    message = input_format.format(
        entity_types=json.dumps({"entity_types": entity_types}),
        predicates=json.dumps({"predicates": predicates}),
        text=text,
    )

    messages = [{"role": "user", "content": message}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")
    # max_length caps prompt + generated tokens combined.
    output = tokenizer.decode(
        model.generate(input_ids=input_ids, max_length=2048)[0],
        skip_special_tokens=True,
    )
    return output


model = AutoModelForCausalLM.from_pretrained(
    "sciphi/triplex", trust_remote_code=True
).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained("sciphi/triplex", trust_remote_code=True)

entity_types = ["LOCATION", "POSITION", "DATE", "CITY", "COUNTRY", "NUMBER"]
predicates = ["POPULATION", "AREA"]
text = """
San Francisco,[24] officially the City and County of San Francisco, is a commercial, financial, and cultural center in Northern California.

With a population of 808,437 residents as of 2022, San Francisco is the fourth most populous city in the U.S. state of California behind Los Angeles, San Diego, and San Jose.
"""

prediction = triplextract(model, tokenizer, text, entity_types, predicates)
print(prediction)
```
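Because `skip_special_tokens=True` leaves the echoed prompt in front of the generated text, the decoded string is prompt plus a JSON payload. A best-effort way to pull out the generated JSON is to scan for the last top-level JSON object in the string. This helper (`extract_last_json`) is a sketch, not part of the Triplex API, and the payload's key names are not guaranteed, so inspect `prediction` yourself:

```python
import json

def extract_last_json(raw_output: str):
    """Return the last top-level JSON object embedded in raw_output,
    or None if no parseable object is found. Nested braces are handled
    by json.JSONDecoder.raw_decode, which consumes a whole object."""
    decoder = json.JSONDecoder()
    last = None
    i = 0
    while i < len(raw_output):
        if raw_output[i] == "{":
            try:
                obj, consumed = decoder.raw_decode(raw_output[i:])
                last = obj       # remember the most recent full object
                i += consumed    # skip past it, including nested braces
                continue
            except json.JSONDecodeError:
                pass
        i += 1
    return last

# triplets = extract_last_json(prediction)
```

Note that the prompt itself embeds small JSON fragments (the entity types and predicates), which is why the helper keeps the *last* object rather than the first.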
License
The weights for the models are licensed cc-by-nc-sa-4.0. However, we will waive these restrictions for any organization with under $5M USD in gross revenue in the most recent 12-month period. If you want to remove the GPL license requirements (dual-license) and/or use the weights commercially above the revenue limit, please reach out to our team at founders@sciphi.ai.
Citation

```bibtex
@misc{pimpalgaonkar2024triplex,
  author = {Pimpalgaonkar, Shreyas and Tremelling, Nolan and Colegrove, Owen},
  title = {Triplex: a SOTA LLM for knowledge graph construction},
  year = {2024},
  url = {https://huggingface.co/sciphi/triplex}
}
```