Triplex open-source model: free to deploy, builds knowledge graphs from unstructured data efficiently, and cuts costs by 98%

Triplex

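Developed by SciPhi

Triplex is a model fine-tuned from Phi3-3.8B by SciPhi.AI. It is designed for building knowledge graphs from unstructured data and can reduce knowledge graph creation costs by 98%.

#Knowledge graph #Low-cost knowledge graph #Triple extraction #Unstructured data processing

Downloads: 1,808

Release date: 7/10/2024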

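Model overview

Triplex is a large language model designed for knowledge graph construction. It extracts triples (simple statements made up of a subject, a predicate, and an object) from text and other data sources, which substantially reduces the cost of building a knowledge graph.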

Model features

Low-cost knowledge graph construction

Outperforms GPT-4 at 1/60 of the price and can reduce knowledge graph creation costs by 98%

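Efficient triple extraction

Efficiently extracts subject-predicate-object triples from unstructured data

Local deployment support

Supports local knowledge graph construction through SciPhi's R2R framework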

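Model capabilities

Named entity recognition

Relation extraction

Knowledge graph construction

Text understanding

Structured information extraction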

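Use cases

Knowledge management

Enterprise knowledge base construction

Extract structured knowledge from enterprise documents and build a knowledge graph

Lower knowledge management costs and more efficient information retrieval

Intelligent search

RAG system enhancement

Provide structured knowledge support for retrieval-augmented generation systems

Improved retrieval accuracy and relevance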
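
As a rough illustration of how structured knowledge can support a RAG system, the sketch below looks up the triples that mention an entity and renders them as context lines for a prompt. The triples, the `LOCATED_IN` predicate, the `facts_about` helper, and the prompt layout are all hypothetical; this is not R2R's API or an actual retrieval pipeline.

```python
# Hypothetical, hand-written triples standing in for extracted knowledge.
triples = [
    ("San Francisco", "POPULATION", "808,437"),
    ("San Francisco", "LOCATED_IN", "Northern California"),
    ("Los Angeles", "LOCATED_IN", "California"),
]

def facts_about(entity, triples):
    """Return readable fact lines for every triple that mentions the entity."""
    return [
        f"{s} {p} {o}"
        for s, p, o in triples
        if entity in (s, o)
    ]

# Assemble a prompt whose context contains only the matching facts.
question = "How many people live in San Francisco?"
context = "\n".join(facts_about("San Francisco", triples))
prompt = f"Context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

A production system would typically query a graph store rather than scan a list, but the idea of turning matched triples into prompt context is the same.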

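🚀 Triplex: a state-of-the-art LLM for knowledge graph construction

Knowledge graphs, like Microsoft's Graph RAG, enhance RAG methods but are expensive to build. Triplex reduces the cost of knowledge graph creation by 98%, outperforms GPT-4 at 1/60 of the cost, and enables local graph building with SciPhi's R2R.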
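
The two cost figures are consistent with each other: running at 1/60 of a baseline price removes 59/60 of the cost, which is roughly 98%. A quick check in Python, assuming cost scales linearly with price (the variable names and the baseline value are illustrative and not part of Triplex):

```python
# Illustrative check: how a 1/60 price ratio maps to a percentage reduction.
baseline_cost = 60.0               # arbitrary units for the GPT-4 baseline (assumed)
triplex_cost = baseline_cost / 60  # "1/60 of the cost" from the text

reduction = 1 - triplex_cost / baseline_cost
reduction_percent = round(reduction * 100, 1)
print(f"Cost reduction: {reduction_percent}%")
```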

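Triplex is a fine-tuned version of Phi3-3.8B, developed by SciPhi.AI, for creating knowledge graphs from unstructured data. It works by extracting triplets (simple statements consisting of a subject, a predicate, and an object) from text or other data sources.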
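
To make the subject, predicate, object structure concrete, here is a minimal sketch of one way a triplet could be represented in Python. The `Triple` class and its field names are hypothetical and are not part of Triplex or R2R; the values come from the sample text in the usage example further down.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """One knowledge-graph statement: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str  # named `obj` to avoid shadowing the built-in `object`

# Example built by hand from the San Francisco sample text.
triple = Triple(subject="San Francisco", predicate="POPULATION", obj="808,437")
print(triple)
```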

🚀 Quick start

Triplex addresses the high cost of building knowledge graphs. With this model, knowledge graphs can be created efficiently and at low cost.

✨ Key features

Reduces the cost of knowledge graph creation by 98%.
Outperforms GPT-4 at 1/60 of the cost.
Enables local graph building with SciPhi's R2R.
Extracts triplets from unstructured data to create knowledge graphs.
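
Once triplets have been extracted, assembling them into a graph can be as simple as grouping edges by subject. The sketch below assumes triplets are available as `(subject, predicate, object)` tuples; the `build_graph` helper and the sample triples (including the `LOCATED_IN` and `PART_OF` predicates) are hypothetical and written by hand, not produced by the model.

```python
from collections import defaultdict

def build_graph(triples):
    """Group (subject, predicate, object) triples into an adjacency mapping."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)

# Hypothetical triples, written by hand for illustration.
triples = [
    ("San Francisco", "POPULATION", "808,437"),
    ("San Francisco", "LOCATED_IN", "Northern California"),
    ("Northern California", "PART_OF", "California"),
]

graph = build_graph(triples)
for node, edges in graph.items():
    for predicate, target in edges:
        print(f"{node} -[{predicate}]-> {target}")
```

A plain dictionary keeps the example dependency-free; a graph database or a graph library would be the more usual choice at scale.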

📊 Benchmarks

[Figure: benchmark results]

💻 Usage example

Basic usage

import json
from transformers import AutoModelForCausalLM, AutoTokenizer

def triplextract(model, tokenizer, text, entity_types, predicates):

    input_format = """Perform Named Entity Recognition (NER) and extract knowledge graph triplets from the text. NER identifies named entities of given entity types, and triple extraction identifies relationships between entities using specified predicates.
      
        **Entity Types:**
        {entity_types}
        
        **Predicates:**
        {predicates}
        
        **Text:**
        {text}
        """

    message = input_format.format(
                entity_types = json.dumps({"entity_types": entity_types}),
                predicates = json.dumps({"predicates": predicates}),
                text = text)

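    # Wrap the prompt as a single user turn and tokenize it with the model's chat template.
    messages = [{'role': 'user', 'content': message}]
    input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt = True, return_tensors="pt").to("cuda")
    # Generate up to 2048 tokens in total (prompt included) and decode the full sequence.
    output = tokenizer.decode(model.generate(input_ids=input_ids, max_length=2048)[0], skip_special_tokens=True)
    return output

# Load the model and tokenizer; this example requires a CUDA-capable GPU.
model = AutoModelForCausalLM.from_pretrained("sciphi/triplex", trust_remote_code=True).to('cuda').eval()
tokenizer = AutoTokenizer.from_pretrained("sciphi/triplex", trust_remote_code=True)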

entity_types = [ "LOCATION", "POSITION", "DATE", "CITY", "COUNTRY", "NUMBER" ]
predicates = [ "POPULATION", "AREA" ]
text = """
San Francisco,[24] officially the City and County of San Francisco, is a commercial, financial, and cultural center in Northern California. 

With a population of 808,437 residents as of 2022, San Francisco is the fourth most populous city in the U.S. state of California behind Los Angeles, San Diego, and San Jose.
"""

prediction = triplextract(model, tokenizer, text, entity_types, predicates)
print(prediction)
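
Because `generate` on a causal language model returns the prompt tokens followed by the newly generated ones, the decoded `prediction` will normally include the prompt text as well. One way to keep only the model's answer is to slice off the prompt before decoding. The helper below is a hypothetical sketch that uses plain lists and made-up token ids so it can run without the model; with real tensors the same slice would be applied to `model.generate(...)[0]`, using `input_ids.shape[1]` as the prompt length.

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the token ids generated after the prompt."""
    return output_ids[prompt_length:]

# Toy token ids (made up): the first three stand for the prompt.
prompt_ids = [101, 7592, 2088]
generated = prompt_ids + [3000, 3001, 102]

answer_ids = strip_prompt(generated, len(prompt_ids))
print(answer_ids)
```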

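📈 Commercial usage

We want Triplex to be as widely accessible as possible, but as an early-stage organization we also need to take commercial concerns into account. Research and personal usage are fine, but we place some restrictions on commercial usage.

The model weights are licensed under cc-by-nc-sa-4.0, but we will waive these restrictions for any organization with under $5M USD in gross revenue over the most recent 12-month period. If you want to remove the GPL license requirements (dual-license) or use the weights commercially above the revenue limit, please contact founders@sciphi.ai.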

📄 License

The model weights are licensed under cc-by-nc-sa-4.0.

📖 Citation

@misc{pimpalgaonkar2024triplex,
author = {Pimpalgaonkar, Shreyas and Tremelling, Nolan and Colegrove, Owen},
title = {Triplex: a SOTA LLM for knowledge graph construction},
year = {2024},
url = {https://huggingface.co/sciphi/triplex}
}