Triplex: a SOTA LLM for knowledge graph construction
Triplex is a state-of-the-art large language model (LLM) developed by SciPhi.AI to address the high cost of knowledge graph construction. Knowledge graphs, such as the one behind Microsoft's Graph RAG, can enhance RAG methods, but building them is expensive. Triplex reduces the cost of knowledge graph creation by 98%, outperforming GPT-4 at 1/60th of its cost, and it enables local graph building with SciPhi's R2R.
Quick Start
Triplex is a fine-tuned version of Phi3-3.8B. It creates knowledge graphs from unstructured data by extracting triplets (simple statements with a subject, predicate, and object) from text or other data sources.
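To make the triplet idea concrete, here is a minimal sketch of the (subject, predicate, object) structure. The `Triplet` class and the example values are illustrative only, not Triplex's exact output schema:

```python
from typing import NamedTuple

class Triplet(NamedTuple):
    """A knowledge-graph triplet: one (subject, predicate, object) statement.
    Illustrative representation, not Triplex's exact output format."""
    subject: str
    predicate: str
    object: str

# "San Francisco has a population of 808,437" as a triplet:
t = Triplet("SAN FRANCISCO", "POPULATION", "808,437")
print(t.subject, t.predicate, t.object)
```

A set of such triplets is what a downstream graph store ingests as edges between entity nodes.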

Features
- Cost-effective: Reduces the cost of knowledge graph creation by 98% compared to traditional methods.
- High-performance: Outperforms GPT-4 at a fraction of the cost.
- Local building: Enables local knowledge graph building with SciPhi's R2R.
Benchmark

Usage Examples
Basic Usage
```python
import json

from transformers import AutoModelForCausalLM, AutoTokenizer


def triplextract(model, tokenizer, text, entity_types, predicates):
    # Prompt template the model was fine-tuned on: entity types and
    # predicates are passed as JSON, followed by the raw text.
    input_format = """Perform Named Entity Recognition (NER) and extract knowledge graph triplets from the text. NER identifies named entities of given entity types, and triple extraction identifies relationships between entities using specified predicates.

**Entity Types:**
{entity_types}

**Predicates:**
{predicates}

**Text:**
{text}
"""

    message = input_format.format(
        entity_types=json.dumps({"entity_types": entity_types}),
        predicates=json.dumps({"predicates": predicates}),
        text=text,
    )

    messages = [{"role": "user", "content": message}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")
    # max_length caps prompt + generated tokens combined.
    output = tokenizer.decode(
        model.generate(input_ids=input_ids, max_length=2048)[0],
        skip_special_tokens=True,
    )
    return output


model = AutoModelForCausalLM.from_pretrained(
    "sciphi/triplex", trust_remote_code=True
).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained("sciphi/triplex", trust_remote_code=True)

entity_types = ["LOCATION", "POSITION", "DATE", "CITY", "COUNTRY", "NUMBER"]
predicates = ["POPULATION", "AREA"]
text = """
San Francisco,[24] officially the City and County of San Francisco, is a commercial, financial, and cultural center in Northern California.

With a population of 808,437 residents as of 2022, San Francisco is the fourth most populous city in the U.S. state of California behind Los Angeles, San Diego, and San Jose.
"""

prediction = triplextract(model, tokenizer, text, entity_types, predicates)
print(prediction)
```
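Because `skip_special_tokens=True` leaves the echoed prompt in front of the generated text, the decoded string is prompt plus a JSON payload. A best-effort way to pull out the generated JSON is to scan for the last top-level JSON object in the string. This helper (`extract_last_json`) is a sketch, not part of the Triplex API, and the payload's key names are not guaranteed, so inspect `prediction` yourself:

```python
import json

def extract_last_json(raw_output: str):
    """Return the last top-level JSON object embedded in raw_output,
    or None if no parseable object is found. Nested braces are handled
    by json.JSONDecoder.raw_decode, which consumes a whole object."""
    decoder = json.JSONDecoder()
    last = None
    i = 0
    while i < len(raw_output):
        if raw_output[i] == "{":
            try:
                obj, consumed = decoder.raw_decode(raw_output[i:])
                last = obj       # remember the most recent full object
                i += consumed    # skip past it, including nested braces
                continue
            except json.JSONDecodeError:
                pass
        i += 1
    return last

# triplets = extract_last_json(prediction)
```

Note that the prompt itself embeds small JSON fragments (the entity types and predicates), which is why the helper keeps the *last* object rather than the first.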
License
The weights for the models are licensed cc-by-nc-sa-4.0. However, we will waive these restrictions for any organization with under $5M USD in gross revenue in the most recent 12-month period. If you want to remove the GPL license requirements (dual-license) and/or use the weights commercially above the revenue limit, please reach out to our team at founders@sciphi.ai.
Citation

```bibtex
@misc{pimpalgaonkar2024triplex,
  author = {Pimpalgaonkar, Shreyas and Tremelling, Nolan and Colegrove, Owen},
  title = {Triplex: a SOTA LLM for knowledge graph construction},
  year = {2024},
  url = {https://huggingface.co/sciphi/triplex}
}
```