🚀 ICKG Model Card
ICKG (Integrated Contextual Knowledge Graph Generator) 2.0 is a specialized instruction-following language model for knowledge graph construction (KGC). It is fine-tuned from LMSYS's Vicuna-7B, which is itself based on Meta's LLaMA 2.0. The model is intended for researchers, data scientists, and developers working in natural language processing and knowledge graph construction.
✨ Features
- Specialized for KGC: Tailored to generate knowledge graphs through instruction-following with specialized prompts.
- Fine-tuned from Vicuna-7B: Builds on the capabilities of Vicuna-7B and is further optimized for KGC tasks.
- High-quality output: In preliminary evaluation, comparable to GPT-4 on KG construction tasks, with strong output quality and format adherence.
💻 Usage Examples
Basic Usage
The primary use of ICKG is to generate knowledge graphs from text through instruction-following with specialized prompts. The Python code is available at [https://github.com/xiaohui-victor-li/FinDKG](https://github.com/xiaohui-victor-li/FinDKG).
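As a rough illustration of what a single inference call might look like, here is a minimal sketch using the Hugging Face `transformers` library. Several things in it are assumptions rather than facts from this card: `MODEL_ID` is a placeholder for wherever the checkpoint actually lives, the Vicuna v1.5 conversation format is assumed to carry over to ICKG, and `build_chat_prompt` and `generate_triplets_text` are hypothetical helper names. The FinDKG repository remains the authoritative reference.

```python
# Placeholder (assumption): replace with the real ICKG checkpoint path or hub id.
MODEL_ID = "path/to/ickg-checkpoint"

# Vicuna v1.5 system preamble; assumed here to apply to ICKG as well.
VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_chat_prompt(instruction: str) -> str:
    """Wrap an instruction in the Vicuna-style USER/ASSISTANT format."""
    return f"{VICUNA_SYSTEM} USER: {instruction} ASSISTANT:"


def generate_triplets_text(instruction: str, max_new_tokens: int = 512) -> str:
    """Run the model once and return only the newly generated text."""
    # Imported lazily so build_chat_prompt works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" requires the `accelerate` package to be installed.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(build_chat_prompt(instruction), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens so only the model's answer is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Greedy decoding (`do_sample=False`) is chosen here because the task asks for a strictly formatted list, where deterministic output is usually easier to parse. This is a design preference for the sketch, not a documented setting.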
Advanced Usage
For more in-depth usage, refer to the "Generative Knowledge Graph Construction with Fine-tuned LLM" section of the accompanying paper.
📚 Documentation
Model Details
| Property | Details |
|---|---|
| Developed by | [Xiaohui Li](https://xiaohui-victor-li.github.io/) |
| Model Type | Auto-regressive language model based on the transformer architecture |
| License | Non-commercial |
| Finetuned from model | [Vicuna-7B](https://huggingface.co/lmsys/vicuna-7b-v1.5) (originally from LLaMA 2.0) |
Model Sources
Training Details
ICKG 2.0 is fine-tuned from the latest Vicuna-7B using ~3K instruction-following demonstrations. Each demonstration pairs a KG construction input document with the extracted KG triplets as the response output. The instruction prompt is reproduced below, with <input_text> standing in for the input document.
From the provided document labeled as INPUT_TEXT, your task is to extract structured information from it in the form of triplet for constructing a knowledge graph. Each tuple should be in the form of ('h', 'type', 'r', 'o', 'type'), where 'h' stands for the head entity, 'r' for the relationship, and 'o' for the tail entity. The 'type' denotes the category of the corresponding entity. Do NOT include redundant triplets, NOT include triplets with relationship that occurs in the past.
Note that the entities should not be generic, numerical or temporal (like dates or percentages). Entities must be classified into the following categories:
ORG: Organizations other than government or regulatory bodies
ORG/GOV: Government bodies (e.g., "United States Government")
ORG/REG: Regulatory bodies (e.g., "Federal Reserve")
PERSON: Individuals (e.g., "Elon Musk")
GPE: Geopolitical entities such as countries, cities, etc. (e.g., "Germany")
COMP: Companies (e.g., "Google")
PRODUCT: Products or services (e.g., "iPhone")
EVENT: Specific and Material Events (e.g., "Olympic Games", "Covid-19")
SECTOR: Company sectors or industries (e.g., "Technology sector")
ECON_INDICATOR: Economic indicators (e.g., "Inflation rate"), numerical value like "10%" is not a ECON_INDICATOR;
FIN_INSTRUMENT: Financial and market instruments (e.g., "Stocks", "Global Markets")
CONCEPT: Abstract ideas or notions or themes (e.g., "Inflation", "AI", "Climate Change")
The relationships 'r' between these entities must be represented by one of the following relation verbs set: Has, Announce, Operate_In, Introduce, Produce, Control, Participates_In, Impact, Positive_Impact_On, Negative_Impact_On, Relate_To, Is_Member_Of, Invests_In, Raise, Decrease.
Remember to conduct entity disambiguation, consolidating different phrases or acronyms that refer to the same entity (for instance, "UK Central Bank", "BOE" and "Bank of England" should be unified as "Bank of England"). Simplify each entity of the triplet to be less than four words.
Your output should strictly be in a list format of triplets in the JSON list format of ('h', 'type', 'r', 'o', 'type'), where the relationship 'r' must be in the given relation verbs set above. Only output the list.
===========================================================
As an Example, consider the following news excerpt:
'Apple Inc. is set to introduce the new iPhone 14 in the technology sector this month. The product's release is likely to positively impact Apple's stock value.'
From this text, your output should be:
[('Apple Inc.', 'COMP', 'Introduce', 'iPhone 14', 'PRODUCT'),
('Apple Inc.', 'COMP', 'Operate_In', 'Technology Sector', 'SECTOR'),
('iPhone 14', 'PRODUCT', 'Positive_Impact_On', 'Apple's Stock Value', 'FIN_INSTRUMENT')]
INPUT_TEXT:
<input_text>
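
The prompt above constrains both the entity categories and the relation verbs, so a downstream consumer may want to check that a response actually respects them. The following is a minimal sketch of that idea, not the FinDKG implementation: `fill_prompt` and `parse_triplets` are hypothetical helpers, and the sketch assumes the model returns a Python-style list of 5-tuples as in the example output.

```python
import ast
from typing import List, Tuple

# Allowed values, copied from the prompt above.
ENTITY_TYPES = {
    "ORG", "ORG/GOV", "ORG/REG", "PERSON", "GPE", "COMP", "PRODUCT",
    "EVENT", "SECTOR", "ECON_INDICATOR", "FIN_INSTRUMENT", "CONCEPT",
}
RELATIONS = {
    "Has", "Announce", "Operate_In", "Introduce", "Produce", "Control",
    "Participates_In", "Impact", "Positive_Impact_On", "Negative_Impact_On",
    "Relate_To", "Is_Member_Of", "Invests_In", "Raise", "Decrease",
}

Triplet = Tuple[str, str, str, str, str]


def fill_prompt(template: str, document: str) -> str:
    """Substitute the document for the <input_text> placeholder."""
    return template.replace("<input_text>", document)


def parse_triplets(raw: str) -> List[Triplet]:
    """Parse model output into validated, de-duplicated 5-tuples."""
    try:
        items = ast.literal_eval(raw.strip())
    except (ValueError, SyntaxError, TypeError):
        return []  # output was not a well-formed list literal
    if not isinstance(items, (list, tuple)):
        return []
    valid: List[Triplet] = []
    seen = set()
    for item in items:
        # Each entry must be a sequence of exactly five strings.
        if not isinstance(item, (list, tuple)) or len(item) != 5:
            continue
        if not all(isinstance(field, str) for field in item):
            continue
        head, head_type, relation, tail, tail_type = item
        # Reject entity types and relations outside the prompt's sets.
        if head_type not in ENTITY_TYPES or tail_type not in ENTITY_TYPES:
            continue
        if relation not in RELATIONS:
            continue
        triplet = (head, head_type, relation, tail, tail_type)
        if triplet not in seen:
            seen.add(triplet)
            valid.append(triplet)
    return valid
```

One caveat of this strict approach: an entity containing an unescaped apostrophe inside single quotes, such as 'Apple's Stock Value' in the example output, is not a valid literal, so the whole response would be rejected and an empty list returned. A more forgiving parser would be needed to handle such cases.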
Evaluation
ICKG has undergone preliminary evaluation. It outperforms GPT-3.5 and Vicuna-7B in KG construction tasks and shows capabilities comparable to GPT-4. It excels at generating instruction-based knowledge graphs, with an emphasis on quality and format adherence. For more details, refer to the accompanying paper.
📄 License
The model is licensed under cc-by-nc-4.0.