ICKG-v2.0: Open-Source Knowledge Graph Construction Model for Extracting Structured Knowledge Triples from Documents

ICKG V2.0

Developed by victorlxh

ICKG is an instruction-following language model for knowledge graph construction, fine-tuned from Vicuna-7B. It extracts structured knowledge triples from text documents.

Knowledge Graph

Transformers

#Knowledge Graph Generation #Instruction Fine-tuning #Financial Domain Specialization


Release Time: 10/15/2023

Model Overview

A large language model fine-tuned for knowledge graph construction that automatically generates knowledge triples in a fixed, standardized format from input text.

Model Features

Domain-Specific Fine-tuning

Fine-tuned specifically for knowledge graph construction. In the authors' preliminary evaluation it outperforms general-purpose models such as GPT-3.5 and the base Vicuna-7B on this task.

Structured Output Capability

Generates knowledge triples as tuples in the fixed format ('h', 'type', 'r', 'o', 'type'): head entity, head entity type, relation, tail entity, tail entity type.
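Although the prompt asks for a "JSON list", the expected output is a Python-style list of tuples, and names may contain unescaped apostrophes (the prompt template's own example includes 'Apple's Stock Value'), so `json.loads` and `ast.literal_eval` can both fail on it. The following is a minimal parsing sketch under two assumptions: entity names contain no parentheses, and fields are separated by a closing quote, a comma, and an opening quote. `parse_triplets` is a hypothetical helper, not part of ICKG.

```python
import re
from typing import List, Tuple

Triplet = Tuple[str, str, str, str, str]

# One parenthesised group, e.g. ('Apple Inc.', 'COMP', 'Introduce', 'iPhone 14', 'PRODUCT').
# Assumption: entity names themselves contain no parentheses.
_GROUP = re.compile(r"\(([^()]*)\)")
# Field separator: closing quote, comma, opening quote. Splitting on this rather
# than on every quote tolerates apostrophes inside names like 'Apple's Stock Value'.
_SEP = re.compile(r"""['"]\s*,\s*['"]""")


def parse_triplets(output: str) -> List[Triplet]:
    """Parse model output shaped like [('h', 'type', 'r', 'o', 'type'), ...]."""
    triplets: List[Triplet] = []
    for group in _GROUP.findall(output):
        body = group.strip()
        # Require an opening quote on the first field and a closing quote on the last.
        if len(body) < 2 or body[0] not in "'\"" or body[-1] not in "'\"":
            continue
        fields = [f.strip() for f in _SEP.split(body[1:-1])]
        if len(fields) == 5:  # skip malformed tuples instead of raising
            triplets.append(tuple(fields))  # type: ignore[arg-type]
    return triplets
```

Malformed tuples are skipped silently here; a stricter pipeline might log them instead, since the share of well-formed tuples is one way to observe format adherence.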

Entity Type System

Uses a fixed taxonomy of 12 entity types (e.g., COMP, ORG/REG, ECON_INDICATOR, FIN_INSTRUMENT), supporting fine-grained knowledge representation in domains such as finance

Relation Verb Constraints

Restricts relations to a predefined set of 15 relation verbs, keeping knowledge graph relations consistent and standardized
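The entity types and relation verbs are both closed vocabularies, enumerated in the prompt template under Training Details, so a downstream consumer can check each returned tuple against them. The following is a minimal validation sketch; the vocabularies are copied from that template, while `triplet_errors` is a hypothetical helper. It also checks the prompt's "fewer than four words" rule for entity names.

```python
from typing import List, Tuple

# The 12 entity categories listed in the ICKG prompt template.
ENTITY_TYPES = frozenset({
    "ORG", "ORG/GOV", "ORG/REG", "PERSON", "GPE", "COMP", "PRODUCT",
    "EVENT", "SECTOR", "ECON_INDICATOR", "FIN_INSTRUMENT", "CONCEPT",
})
# The 15 relation verbs listed in the ICKG prompt template.
RELATIONS = frozenset({
    "Has", "Announce", "Operate_In", "Introduce", "Produce", "Control",
    "Participates_In", "Impact", "Positive_Impact_On", "Negative_Impact_On",
    "Relate_To", "Is_Member_Of", "Invests_In", "Raise", "Decrease",
})


def triplet_errors(t: Tuple[str, str, str, str, str]) -> List[str]:
    """Return the schema violations for one (h, type, r, o, type) tuple."""
    head, head_type, rel, tail, tail_type = t
    errors: List[str] = []
    if head_type not in ENTITY_TYPES:
        errors.append(f"unknown head type {head_type!r}")
    if tail_type not in ENTITY_TYPES:
        errors.append(f"unknown tail type {tail_type!r}")
    if rel not in RELATIONS:
        errors.append(f"relation {rel!r} not in the allowed verb set")
    # The prompt asks for entity names of fewer than four words.
    for name in (head, tail):
        if len(name.split()) >= 4:
            errors.append(f"entity {name!r} has four or more words")
    return errors
```

An empty list means the tuple conforms to the schema; whether to drop or repair non-conforming tuples is left to the caller.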

Model Capabilities

Text Understanding

Entity Recognition

Relation Extraction

Structured Knowledge Representation

Instruction Following

Use Cases

Knowledge Management

Financial Document Analysis

Extract company, product, and market relationships from financial news and reports

Construct financial domain knowledge graphs

Research Literature Processing

Extract relationships between concepts, methods, and discoveries from academic papers

Construct disciplinary knowledge graphs

Business Intelligence

Competitive Intelligence Analysis

Extract company, product, and market dynamics from business reports

Construct business relationship networks
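Each of these use cases ends with assembling the extracted tuples into a graph. The following is a minimal in-memory sketch, assuming the tuples have already been parsed into 5-tuples. The `KnowledgeGraph` class is hypothetical; a graph library such as NetworkX (`MultiDiGraph`) would serve the same purpose.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str, str, str]


class KnowledgeGraph:
    """Minimal typed, directed graph built from (h, type, r, o, type) tuples."""

    def __init__(self) -> None:
        self.node_types: Dict[str, str] = {}
        # head -> list of (relation, tail) edges. Several different relations
        # between the same pair are kept; exact duplicate edges are skipped.
        self.edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add(self, t: Triplet) -> None:
        h, h_type, r, o, o_type = t
        # Keep the first type seen for each entity; type conflicts are not resolved here.
        self.node_types.setdefault(h, h_type)
        self.node_types.setdefault(o, o_type)
        if (r, o) not in self.edges[h]:
            self.edges[h].append((r, o))

    def neighbors(self, node: str) -> List[Tuple[str, str]]:
        """Outgoing (relation, tail) edges of a node; empty if the node is unknown."""
        return list(self.edges.get(node, []))
```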

🚀 ICKG Model Card

ICKG (Integrated Contextual Knowledge Graph Generator) 2.0 is a specialized instruction-following language model for knowledge graph construction (KGC). It is fine-tuned from LMSYS's Vicuna-7B, which is itself based on Meta's Llama 2. The model is intended for researchers, data scientists, and developers working in natural language processing and knowledge graph construction.

✨ Features

Specialized for KGC: Tailored to generate knowledge graphs through instruction-following with specialized prompts.
Fine-tuned from Vicuna-7B: Builds on the capabilities of Vicuna-7B and is further optimized for KGC tasks.
High-quality output: In preliminary evaluation, performs comparably to GPT-4 on KG construction tasks, with strong output quality and format adherence.

💻 Usage Examples

Basic Usage

The primary use of the ICKG LLM is to generate knowledge graphs through instruction-following with specialized prompts. The Python code is available at [https://github.com/xiaohui-victor-li/FinDKG](https://github.com/xiaohui-victor-li/FinDKG).
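The repository above holds the authors' code. For orientation only, here is a minimal sketch using the Hugging Face Transformers API. It assumes the weights are published under the repo id `victorlxh/ICKG-v2.0` (inferred from this card's title and author handle, not confirmed here) and that the filled-in prompt is passed raw. Vicuna models are usually prompted inside a `USER: ... ASSISTANT:` conversation format; whether ICKG expects that wrapper is not stated on this card, so check the FinDKG code before relying on this. `build_prompt` and `generate_triplets` are hypothetical helpers.

```python
# Hugging Face repo id: an ASSUMPTION based on this card's title and author handle.
DEFAULT_MODEL_ID = "victorlxh/ICKG-v2.0"


def build_prompt(template: str, document: str) -> str:
    """Insert a document into the prompt template at its <input_text> slot."""
    placeholder = "<input_text>"
    if template.count(placeholder) != 1:
        raise ValueError("template must contain exactly one <input_text> placeholder")
    return template.replace(placeholder, document.strip())


def generate_triplets(prompt: str, model_id: str = DEFAULT_MODEL_ID,
                      max_new_tokens: int = 512) -> str:
    """Run one greedy generation and return only the newly generated text."""
    # Imported lazily so build_prompt stays usable without transformers/torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Slice off the prompt tokens so only the completion is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example (downloads ~7B parameters; ickg_prompt.txt is a hypothetical file
# holding the prompt template from Training Details):
# template = open("ickg_prompt.txt", encoding="utf-8").read()
# print(generate_triplets(build_prompt(template, "Apple Inc. is set to introduce ...")))
```

Greedy decoding (`do_sample=False`) is chosen here because extraction favors deterministic, format-stable output; the authors' decoding settings are not given on this card.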

Advanced Usage

For more in-depth usage, refer to the "Generative Knowledge Graph Construction with Fine-tuned LLM" section of the accompanying paper.

📚 Documentation

Model Details

Property	Details
Developed by	[Xiaohui Li](https://xiaohui-victor-li.github.io/)
Model Type	Auto-regressive language model based on the transformer architecture
License	Non-commercial
Finetuned from model	[Vicuna-7B](https://huggingface.co/lmsys/vicuna-7b-v1.5) (based on Llama 2)

Model Sources

Repository: [https://github.com/xiaohui-victor-li/FinDKG](https://github.com/xiaohui-victor-li/FinDKG)
Website: [https://xiaohui-victor-li.github.io/FinDKG/](https://xiaohui-victor-li.github.io/FinDKG/)
Paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4608445

Training Details

ICKG 2.0 is fine-tuned from Vicuna-7B v1.5, the latest Vicuna-7B release at the time, using about 3K instruction-following demonstrations. Each demonstration pairs a KG construction input document with the KG triplets extracted from it as the response output.
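The card does not describe how the ~3K demonstrations are stored. Purely as a hypothetical illustration, one demonstration could be kept as a prompt/response pair in JSON Lines; the field names `prompt` and `response` and both helper functions are assumptions, not the authors' schema.

```python
import json
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str, str, str]


def make_demonstration(prompt: str, triplets: List[Triplet]) -> Dict[str, str]:
    """Pair a filled-in prompt (document included) with its target triplet list."""
    # Render the response in the list-of-tuples style the prompt template asks for.
    response = "[" + ",\n ".join(repr(t) for t in triplets) + "]"
    return {"prompt": prompt, "response": response}


def write_jsonl(records: List[Dict[str, str]], path: str) -> None:
    """Write one JSON object per line, a common layout for fine-tuning data."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```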

Prompt Template:

From the provided document labeled as INPUT_TEXT, your task is to extract structured information from it in the form of triplet for constructing a knowledge graph. Each tuple should be in the form of ('h', 'type',  'r', 'o', 'type'), where 'h' stands for the head entity, 'r' for the relationship, and 'o' for the tail entity. The 'type' denotes the category of the corresponding entity. Do NOT include redundant triplets, NOT include triplets with relationship that occurs in the past.   

Note that the entities should not be generic, numerical or temporal (like dates or percentages).  Entities must be classified into the following categories:
ORG: Organizations other than government or regulatory bodies
ORG/GOV: Government bodies (e.g., "United States Government")
ORG/REG: Regulatory bodies (e.g., "Federal Reserve")
PERSON: Individuals (e.g., "Elon Musk")
GPE: Geopolitical entities such as countries, cities, etc. (e.g., "Germany")
COMP: Companies (e.g., "Google")
PRODUCT: Products or services (e.g., "iPhone")
EVENT: Specific and Material Events (e.g., "Olympic Games", "Covid-19")
SECTOR: Company sectors or industries (e.g., "Technology sector")
ECON_INDICATOR: Economic indicators (e.g., "Inflation rate"), numerical value like "10%" is not a ECON_INDICATOR;
FIN_INSTRUMENT: Financial and market instruments (e.g., "Stocks", "Global Markets")
CONCEPT: Abstract ideas or notions or themes (e.g., "Inflation", "AI", "Climate Change")

The relationships 'r' between these entities must be represented by one of the following relation verbs set: Has, Announce, Operate_In, Introduce, Produce, Control, Participates_In, Impact, Positive_Impact_On, Negative_Impact_On, Relate_To, Is_Member_Of, Invests_In, Raise, Decrease.

Remember to conduct entity disambiguation, consolidating different phrases or acronyms that refer to the same entity (for instance,  "UK Central Bank", "BOE" and "Bank of England" should be unified as "Bank of England"). Simplify each entity of the triplet to be less than four words.  

Your output should strictly be in a list format of triplets in the JSON list format of ('h', 'type', 'r', 'o', 'type'), where the relationship 'r' must be in the given relation verbs set above. Only output the list. 
===========================================================
As an Example, consider the following news excerpt:
'Apple Inc. is set to introduce the new iPhone 14 in the technology sector this month. The product's release is likely to positively impact Apple's stock value.'

From this text, your output should be:
[('Apple Inc.', 'COMP', 'Introduce', 'iPhone 14', 'PRODUCT'),
 ('Apple Inc.', 'COMP', 'Operate_In', 'Technology Sector', 'SECTOR'),
 ('iPhone 14', 'PRODUCT', 'Positive_Impact_On', 'Apple's Stock Value', 'FIN_INSTRUMENT')]

INPUT_TEXT:
<input_text>
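The template asks the model itself to unify aliases and to avoid redundant triplets. A downstream pipeline may still want a safety net for outputs that slip through. The following is a minimal post-processing sketch, not part of ICKG: it maps head and tail names through a hypothetical hand-written alias table (seeded with the template's Bank of England example) and then drops exact duplicates while preserving order.

```python
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str, str, str]

# Hypothetical alias table, keyed by lower-cased mention; in practice this
# would be curated for the target domain.
ALIASES: Dict[str, str] = {
    "uk central bank": "Bank of England",
    "boe": "Bank of England",
}


def canonical(name: str, aliases: Dict[str, str]) -> str:
    """Map an entity mention to its canonical name (case-insensitive lookup)."""
    return aliases.get(name.strip().lower(), name.strip())


def consolidate(triplets: List[Triplet], aliases: Dict[str, str] = ALIASES) -> List[Triplet]:
    """Canonicalize head and tail names, then drop exact duplicates in order."""
    seen = set()
    result: List[Triplet] = []
    for h, h_type, r, o, o_type in triplets:
        t = (canonical(h, aliases), h_type, r, canonical(o, aliases), o_type)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result
```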

Evaluation

ICKG has undergone a preliminary evaluation, in which it outperforms GPT-3.5 and Vicuna-7B on KG construction tasks and performs comparably to GPT-4, with particular strengths in output quality and adherence to the required format. For more details, refer to the accompanying paper.

📄 License

The model is licensed under CC BY-NC 4.0.
