**CySecBERT Open-Source Cybersecurity Model - Optimizing Cybersecurity Tasks Based on 4.3 Million Data Entries**


Developed by markusbayer

CySecBERT is a domain-adapted BERT model optimized for cybersecurity tasks, trained on 4.3 million cybersecurity domain data entries.

Large Language Model

Transformers

Language: English

Open Source License: Apache-2.0

Tags: Cybersecurity Text Analysis, Domain-Adapted BERT, CVE Vulnerability Detection

Downloads 679

Release Date: 1/31/2023

Model Overview

CySecBERT is a cybersecurity-specific language model adapted from the BERT base model (bert-base-uncased). It is suited to text analysis tasks in the cybersecurity domain.

Model Features

Domain-Adapted Training

Specifically trained on cybersecurity domain data, including sources such as Twitter, blogs, academic papers, and CVE vulnerability databases.

Mitigating Catastrophic Forgetting

Uses a training procedure designed to reduce catastrophic forgetting during domain adaptation, so that general language ability is less affected. Details are in the paper.

Large-Scale Training Data

Trained on a dataset of 4.3 million cybersecurity-related entries.

Model Capabilities

Cybersecurity Text Classification

Vulnerability Description Analysis

Threat Intelligence Extraction

Security Incident Identification
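
For classification-style tasks such as those listed above, one common approach is to put a sequence-classification head on top of the encoder and fine-tune it on labeled data. The sketch below is illustrative only: the Hub ID `markusbayer/CySecBERT` is an assumption based on the developer name on this page, the two labels are hypothetical, and the helper names are not part of any official API. The classification head is randomly initialized, so predictions are not meaningful until the model has been fine-tuned.

```python
# Assumption: the checkpoint is available on the Hugging Face Hub under this ID.
MODEL_ID = "markusbayer/CySecBERT"
# Hypothetical labels for a binary relevance task; replace with your own.
LABELS = ["not_relevant", "relevant"]


def label_maps(labels):
    """Build the id<->label dictionaries that a classification config expects."""
    id2label = dict(enumerate(labels))
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


def build_classifier(labels=LABELS):
    """Load the tokenizer and the encoder with a fresh (untrained) classification head."""
    # Imported lazily so label_maps can be used without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_ID,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model


def predict(texts, tokenizer, model):
    """Return one predicted label name per input text."""
    import torch

    batch = tokenizer(
        texts, padding=True, truncation=True, max_length=128, return_tensors="pt"
    )
    model.eval()
    with torch.no_grad():
        logits = model(**batch).logits
    return [model.config.id2label[i] for i in logits.argmax(dim=-1).tolist()]
```

After fine-tuning (for example with the `Trainer` class from `transformers`), `predict(["New ransomware campaign targets hospitals"], tokenizer, model)` would return a list with one label name.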

Use Cases

Cybersecurity Analysis

CVE Vulnerability Description Analysis

Analyze vulnerability description texts to extract key security information
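
A simple starting point for working with vulnerability descriptions is to turn each text into a fixed-size vector and compare vectors, for example to find similar CVE entries. The sketch below mean-pools the encoder's token embeddings. Mean pooling is a generic technique chosen here for illustration, not something the model authors prescribe, and the Hub ID `markusbayer/CySecBERT` and all function names are assumptions.

```python
import math

# Assumption: the checkpoint is available on the Hugging Face Hub under this ID.
MODEL_ID = "markusbayer/CySecBERT"


def mean_pool(token_vectors, attention_mask):
    """Average token vectors, skipping padding positions (mask value 0)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    return [sum(column) / len(kept) for column in zip(*kept)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def embed(texts):
    """Return one mean-pooled embedding (a list of floats) per input text."""
    # Imported lazily so the pure-Python helpers above work without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(MODEL_ID)
    batch = tokenizer(
        texts, padding=True, truncation=True, max_length=256, return_tensors="pt"
    )
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state.tolist()
    masks = batch["attention_mask"].tolist()
    return [mean_pool(h, m) for h, m in zip(hidden, masks)]
```

With the model downloaded, `a, b = embed([desc_1, desc_2])` followed by `cosine(a, b)` gives a rough similarity score between two descriptions. Embeddings from a model that has not been fine-tuned for similarity tend to be only a coarse signal.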

Threat Intelligence Processing

Extract valuable threat intelligence from security blogs and Twitter

Academic Research

Security Paper Analysis

Process and analyze academic papers in the cybersecurity field

🚀 Model Card for CySecBERT

CySecBERT is a domain-adapted version of the BERT model, designed specifically for cybersecurity tasks. It was trained on a cybersecurity dataset of 4.3 million entries drawn from Twitter, blogs, papers, and CVEs.

✨ Features

Tailored for cybersecurity tasks, leveraging a large-scale cybersecurity-related dataset.

📚 Documentation

Model Details

Developed by: Markus Bayer, Philipp Kuehn, Ramin Shanehsaz, and Christian Reuter
Model type: BERT-base
Language(s) (NLP): English
Finetuned from model: bert-base-uncased
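
Because the model follows the BERT-base architecture, it can usually be loaded with the Hugging Face `transformers` library like any other BERT checkpoint. The sketch below queries it through a fill-mask pipeline. It assumes the checkpoint is published under the Hub ID `markusbayer/CySecBERT` (inferred from the developer name on this page) and that it ships with a masked-language-model head. The helper names are illustrative.

```python
# Assumption: the checkpoint is available on the Hugging Face Hub under this ID.
MODEL_ID = "markusbayer/CySecBERT"


def mask_term(sentence: str, term: str, mask_token: str = "[MASK]") -> str:
    """Replace the first occurrence of `term` with the BERT mask token."""
    if term not in sentence:
        raise ValueError(f"{term!r} does not occur in the sentence")
    return sentence.replace(term, mask_token, 1)


def top_predictions(masked_sentence: str, k: int = 5):
    """Return (token, score) pairs for the k most likely fillers of the mask.

    Needs `transformers` plus a backend such as PyTorch; the weights are
    downloaded on first use.
    """
    # Imported lazily so mask_term can be used without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=MODEL_ID)
    return [(p["token_str"], p["score"]) for p in fill(masked_sentence, top_k=k)]


def demo():
    """Print the model's guesses for a masked security term."""
    masked = mask_term(
        "The attacker exploited a buffer overflow vulnerability.", "overflow"
    )
    for token, score in top_predictions(masked):
        print(f"{token:>12}  {score:.3f}")
```

Calling `demo()` downloads the model and prints its candidate tokens. Comparing the output with that of `bert-base-uncased` on the same sentence is a quick, informal way to see the effect of domain adaptation.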

Model Sources

Repository: https://github.com/PEASEC/CySecBERT
Paper: https://dl.acm.org/doi/abs/10.1145/3652594 and https://arxiv.org/abs/2212.02974

Bias, Risks, Limitations, and Recommendations

We emphasize that we did not explicitly focus on and analyze social biases in the data or the resulting model. While this may not be very harmful in most application contexts, there are applications that rely heavily on these biases, and any form of discrimination can have serious consequences. As authors, we issue warnings about using the model in such contexts. However, with an open-source mindset and recognizing its great impact, we leave the decision-making to the model users, drawing on many previous discussions in the open-source community.

Training Details

Training Data

See https://github.com/PEASEC/cybersecurity_dataset

Training Procedure

We specifically trained CySecBERT to be less affected by catastrophic forgetting. More details can be found in the paper.

Evaluation

We conducted various cybersecurity and general evaluations. The details are in the paper.

Citation

If you want to cite the model, you can use the following formats:

BibTeX:

@article{10.1145/3652594,
author = {Bayer, Markus and Kuehn, Philipp and Shanehsaz, Ramin and Reuter, Christian},
title = {CySecBERT: A Domain-Adapted Language Model for the Cybersecurity Domain},
year = {2024},
issue_date = {May 2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {27},
number = {2},
issn = {2471-2566},
url = {https://doi.org/10.1145/3652594},
doi = {10.1145/3652594},
journal = {ACM Trans. Priv. Secur.},
month = {apr},
articleno = {18},
numpages = {20},
keywords = {Language model, cybersecurity BERT, cybersecurity dataset}
}

@misc{https://doi.org/10.48550/arxiv.2212.02974,
  doi = {10.48550/ARXIV.2212.02974},
  url = {https://arxiv.org/abs/2212.02974},
  author = {Bayer, Markus and Kuehn, Philipp and Shanehsaz, Ramin and Reuter, Christian},
  keywords = {Cryptography and Security (cs.CR), Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {CySecBERT: A Domain-Adapted Language Model for the Cybersecurity Domain},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Model Card Authors

Markus Bayer

Model Card Contact

bayer@peasec.tu-darmstadt.de

📄 License

This model is released under the Apache-2.0 license.
