CyNER 2.0 - DeBERTa-v3-base Open-Source Model for Recognizing Multiple Cybersecurity Entity Types

CyNER 2.0 DeBERTa V3 Base

Developed by PranavaKailash

CyNER 2.0 is a named entity recognition model specifically designed for the cybersecurity domain, based on the DeBERTa architecture, capable of identifying various cybersecurity-related entities.

Sequence Labeling

Transformers

English

Open Source License: MIT

#Cybersecurity Entity Recognition #DeBERTa Architecture #Threat Intelligence Analysis

Downloads 164

Release Time: 8/23/2024

Model Overview

This model is fine-tuned to identify cybersecurity-related entities, including threat indicators, malware, organizations, system components, and vulnerability information.

Model Features

High-performance Recognition

Achieves an F1 score of 91.88% on the augmented dataset, with both precision (91.06%) and recall (92.72%) above 90%.

Broad Entity Coverage

Identifies 8 categories of cybersecurity entities: indicators, malware, organizations, systems, vulnerabilities, dates, locations, and threat groups.

Domain Optimization

Optimized specifically for cybersecurity scenarios by combining the original CyNER dataset with augmented data covering recent threat patterns.

Model Capabilities

Cybersecurity Entity Recognition

Threat Indicator Extraction

Malware Name Recognition

Vulnerability Information Identification

Use Cases

Threat Intelligence Analysis

Security Report Parsing

Automatically extracts key threat indicators from unstructured security reports

Improves analyst efficiency and reduces manual extraction errors

Automated Security Monitoring

Real-time Threat Detection

Monitors security logs and identifies potential threat entities

Enables early threat warning

🚀 CyNER 2.0: A Domain-Specific Named Entity Recognition Model for Cybersecurity

CyNER 2.0 is a Named Entity Recognition (NER) model tailored for the cybersecurity domain. It uses the DeBERTa transformer model to identify a range of cybersecurity-related entities, which is useful for threat intelligence and automated report generation.

📋 Metadata

Property	Details
Datasets	PranavaKailash/CyNER2.0_augmented_dataset
Language	en
Library Name	transformers
License	mit
Tags	CyNER, CyberSecurity, NLP, NER

🚀 Quick Start

Installation

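To get started with the CyNER 2.0 model, install the Hugging Face transformers library together with a backend such as PyTorch. The DeBERTa-v3 tokenizer may also need sentencepiece, so it is included here:

pip install transformers torch sentencepiece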

Load the Model

from transformers import AutoModelForTokenClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PranavaKailash/CyNER-2.0-DeBERTa-v3-base")
model = AutoModelForTokenClassification.from_pretrained("PranavaKailash/CyNER-2.0-DeBERTa-v3-base")

Example Inference

from transformers import pipeline

ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)
text = "A recent attack by WannaCry ransomware caused significant damage to Windows systems."
entities = ner_pipeline(text)
print(entities)

Output

[
  {"entity": "B-Malware", "score": 0.99, "index": 5, "word": "WannaCry", "start": 19, "end": 27},
  {"entity": "B-System", "score": 0.98, "index": 10, "word": "Windows", "start": 68, "end": 75}
]
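The raw "ner" pipeline returns one dictionary per token. A multi-token entity therefore comes back as a B- tag followed by I- tags. The sketch below merges such token-level predictions into entity spans. It assumes the model uses a BIO scheme with I- continuation tags (the sample output above shows only B- tags). It also assumes each prediction carries the `entity`, `score`, `start`, and `end` keys shown. The helper `merge_bio_spans` and the sample predictions are hypothetical. In practice, passing `aggregation_strategy="simple"` to `pipeline(...)` asks transformers to do this grouping for you.

```python
from typing import Any, Dict, List, Optional


def merge_bio_spans(text: str, tokens: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge token-level BIO predictions into entity spans (hypothetical helper)."""
    spans: List[Dict[str, Any]] = []
    for tok in sorted(tokens, key=lambda t: t["start"]):
        if tok["entity"] == "O":
            continue  # tokens outside any entity are skipped
        prefix, _, label = tok["entity"].partition("-")
        current: Optional[Dict[str, Any]] = spans[-1] if spans else None
        # A continuation must follow the previous span with only whitespace between.
        adjacent = current is not None and text[current["end"]:tok["start"]].strip() == ""
        if prefix == "I" and adjacent and current["label"] == label:
            current["end"] = tok["end"]
            current["scores"].append(tok["score"])
        else:
            # B- tags, and I- tags with no matching predecessor, open a new span.
            spans.append({"label": label, "start": tok["start"], "end": tok["end"], "scores": [tok["score"]]})
    # Recover the surface text from the offsets and average the token scores.
    return [
        {
            "label": s["label"],
            "text": text[s["start"]:s["end"]],
            "start": s["start"],
            "end": s["end"],
            "score": sum(s["scores"]) / len(s["scores"]),
        }
        for s in spans
    ]


# Hypothetical token-level predictions in the format shown above.
sample_text = "Attackers deployed Agent Tesla on Windows hosts."
sample_tokens = [
    {"entity": "B-Malware", "score": 0.97, "start": 19, "end": 24},
    {"entity": "I-Malware", "score": 0.95, "start": 25, "end": 30},
    {"entity": "B-System", "score": 0.99, "start": 34, "end": 41},
]

for ent in merge_bio_spans(sample_text, sample_tokens):
    print(ent["label"], ent["text"], round(ent["score"], 2))
```

Averaging token scores is just one simple choice for a span-level confidence. Taking the minimum is a stricter alternative.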

✨ Features

Model Architecture: DeBERTa (Decoding-enhanced BERT with disentangled attention) V3 base.
Primary Use Case: Named Entity Recognition (NER) for cybersecurity entities.
Performance Metrics: Achieves an F1-score of 91.88% on the augmented dataset.
Training Data: Fine-tuned on the original CyNER dataset and an augmented dataset drawn from various open-source cybersecurity platforms.

📚 Documentation

Model Overview

CyNER 2.0 is a Named Entity Recognition (NER) model designed explicitly for the cybersecurity domain. Built on the DeBERTa transformer model, it is fine-tuned to recognize a wide range of cybersecurity-related entities, including indicators, malware, organizations, systems, and vulnerabilities.

Model Description

The DeBERTa-based CyNER 2.0 model was fine-tuned on a combination of two datasets. The first is the original CyNER dataset. The second is an augmented dataset with more recent threat patterns and additional entity tags. Training on this sequence-labeling data improved precision, recall, and F1-score compared with the baseline models.
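The card does not describe how the sequence data was preprocessed. With a subword tokenizer such as DeBERTa-v3's, a common approach for token classification is to align word-level tags to subword tokens. Only the first subword of each word is labeled. Other subwords and special tokens get -100, which PyTorch's cross-entropy loss ignores by default. The sketch below assumes this convention; it is not confirmed as the author's method. The helper `align_labels` and the label ids are hypothetical. The `word_ids` list mimics what a fast tokenizer's `BatchEncoding.word_ids()` returns.

```python
from typing import List, Optional

# -100 is the default ignore_index of torch.nn.CrossEntropyLoss, so these
# positions do not contribute to the training loss.
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Project word-level label ids onto subword tokens (hypothetical helper).

    word_ids has one entry per subword token: the index of the word it came
    from, or None for special tokens.
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)  # special tokens such as [CLS] / [SEP]
        elif wid != previous:
            aligned.append(word_labels[wid])  # first subword carries the word's tag
        else:
            aligned.append(IGNORE_INDEX)  # remaining subwords are masked out
        previous = wid
    return aligned


# Hypothetical label ids: 0 = "O", 1 = "B-Malware", 2 = "I-Malware".
# Words: ["WannaCry", "ransomware", "spread"]
labels = [1, 0, 0]
# Suppose the tokenizer splits "WannaCry" into two pieces.
word_ids = [None, 0, 0, 1, 2, None]
print(align_labels(word_ids, labels))  # prints [-100, 1, -100, 0, 0, -100]
```

Labeling every subword with the word's tag is another common variant. It changes how often each class appears in the loss.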

Intended Use

The CyNER 2.0 model is designed to assist cybersecurity analysts in automatically extracting relevant entities from unstructured or structured cybersecurity reports. It can be integrated into tools and applications for threat intelligence, automated report generation, and more.

Example Entities Recognized

The CyNER 2.0 model is trained to recognize the following entities in cybersecurity-related texts:

Indicator: Identifies indicators of compromise (IoCs) such as IP addresses, file hashes, URLs, etc.
Malware: Names of malware, ransomware, or other malicious software (e.g., WannaCry, DroidRAT).
Organization: Recognizes the names of organizations involved in cybersecurity or targeted by cyber threats (e.g., Microsoft, FBI).
System: Identifies operating systems, software, and hardware involved in cybersecurity incidents (e.g., Windows 10, Linux Kernel).
Vulnerability: Extracts references to specific vulnerabilities (e.g., CVE-2023-XXXX).
Date: Recognizes dates related to cybersecurity events or incidents.
Location: Identifies geographic locations related to cybersecurity events.
Threat Group: Recognizes the names of threat groups or actors involved in cyber attacks.
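For report parsing, the recognized entities are often collected per category, for example to fill a threat-intelligence record. The sketch below assumes grouped pipeline output, as produced with `aggregation_strategy="simple"`, where each prediction has `entity_group`, `word`, and `score` keys. It also assumes the group names match the categories listed above; the exact label strings are not confirmed. The helper `group_by_type`, the 0.5 threshold, and the sample predictions are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set


def group_by_type(entities: Iterable[Dict[str, Any]], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Collect unique entity strings per category (hypothetical helper).

    Predictions scoring below min_score are dropped; duplicates are merged.
    """
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].add(ent["word"].strip())
    return {label: sorted(words) for label, words in sorted(grouped.items())}


# Hypothetical grouped predictions; label names are assumed, not confirmed.
predictions = [
    {"entity_group": "Malware", "word": "WannaCry", "score": 0.99},
    {"entity_group": "System", "word": "Windows", "score": 0.98},
    {"entity_group": "Organization", "word": "Microsoft", "score": 0.91},
    {"entity_group": "Malware", "word": " WannaCry", "score": 0.97},
    {"entity_group": "Indicator", "word": "203.0.113.5", "score": 0.42},
]
print(group_by_type(predictions))
# {'Malware': ['WannaCry'], 'Organization': ['Microsoft'], 'System': ['Windows']}
```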

🔧 Technical Details

Dataset

The model was trained on two datasets:

Original CyNER dataset: Focused on foundational entities in the cybersecurity domain.
Augmented dataset: Expanded with new entity types and additional real-world cybersecurity threats.

Hyperparameters

Learning Rate: 2e-5
Epochs: 3
Batch Size: 8
Weight Decay: 0.01
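These hyperparameters map naturally onto transformers `TrainingArguments` parameters. The mapping below is our own and is not taken from the author's training code. The sketch also works out how many optimizer steps such a run takes. It assumes a single device with no gradient accumulation, and it keeps the final partial batch of each epoch. The training-set size is hypothetical, because the card does not state one.

```python
import math

# Hyperparameters reported in the model card, keyed by the matching
# transformers.TrainingArguments parameter names (our mapping, an assumption).
CYNER2_HPARAMS = {
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 8,
    "weight_decay": 0.01,
}


def total_optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Optimizer updates on one device with no gradient accumulation,
    counting the final partial batch of each epoch."""
    return math.ceil(num_examples / batch_size) * epochs


# Hypothetical training-set size of 10,000 sentences.
steps = total_optimizer_steps(
    10_000,
    CYNER2_HPARAMS["per_device_train_batch_size"],
    CYNER2_HPARAMS["num_train_epochs"],
)
print(steps)  # prints 3750
```

With transformers installed, the same values could be passed as `TrainingArguments(output_dir="cyner2-run", **CYNER2_HPARAMS)`. The output directory name here is hypothetical.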

Evaluation

Precision: 91.06%
Recall: 92.72%
F1-Score: 91.88%
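The F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The reported figures are mutually consistent: plugging in the stated precision and recall gives back the stated F1. The card does not say whether these are entity-level or token-level metrics, so the sketch only checks the arithmetic.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above, in percent.
reported_f1 = f1_score(91.06, 92.72)
print(f"{reported_f1:.2f}")  # prints 91.88
```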

GitHub Repo

https://github.com/Pranava-Kailash/CyNER_2.0_API

📄 License

This project is licensed under the MIT License.

📚 Citation

If you use this model in your research, please cite the following paper:

@misc{yet_to_update,
  title={CyNER 2.0: A Named Entity Recognition Model for Cyber Security},
  author={Pranava Kailash},
  year={2024},
  url={Yet to update}
}

⚠️ Important Note

Entity Imbalance: The model may underperform on less frequent entities such as vulnerabilities.
Domain Specificity: The model is tuned specifically for the cybersecurity domain and may not generalize well to other NER tasks.
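A simple way to check whether a category is rare in your own labeled data, and so more likely to suffer from the imbalance noted above, is to count mentions per entity type. The sketch below counts B- tags. It assumes strict BIO tagging, where every mention starts with a B- tag. The helper `entity_counts`, the label names, and the tagged sentences are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def entity_counts(tag_sequences: Iterable[List[str]]) -> Counter:
    """Count entity mentions per type by counting B- tags (hypothetical helper)."""
    counts: Counter = Counter()
    for tags in tag_sequences:
        for tag in tags:
            if tag.startswith("B-"):
                counts[tag[2:]] += 1  # strip the "B-" prefix to get the type
    return counts


# Hypothetical tagged sentences; label names are assumed.
corpus = [
    ["O", "B-Malware", "O", "B-System", "I-System"],
    ["B-Organization", "O", "B-Malware", "O"],
    ["O", "B-Vulnerability", "O", "B-System"],
]
for label, n in entity_counts(corpus).most_common():
    print(f"{label}: {n}")
```

Per-type precision and recall on a held-out set are a more direct check than raw counts. Counts are still a quick first signal of which categories have little training data.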
