🚀 Model Card for CySecBERT
CySecBERT is a domain-adapted version of the BERT model, designed specifically for cybersecurity tasks. It was trained on a cybersecurity dataset of 4.3 million entries drawn from Twitter, blogs, papers, and CVEs.
✨ Features
- Tailored for cybersecurity tasks, leveraging a large-scale cybersecurity-related dataset.
📚 Documentation
Model Details
- Developed by: Markus Bayer, Philipp Kuehn, Ramin Shanehsaz, and Christian Reuter
- Model type: BERT-base
- Language(s) (NLP): English
- Finetuned from model: bert-base-uncased
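Because CySecBERT is a BERT-base model adapted from bert-base-uncased, it should load like any other BERT checkpoint. The following is a minimal sketch, not an official usage guide: it assumes the Hugging Face `transformers` library is installed and that the weights are published on the Hub under the identifier `markusbayer/CySecBERT`, which this card does not state. The helper names `build_fill_mask`, `top_tokens`, and `demo` are hypothetical and exist only for illustration.

```python
# Assumption: this Hub identifier is not given in the model card; change it
# to wherever the CySecBERT weights are actually hosted.
MODEL_ID = "markusbayer/CySecBERT"


def build_fill_mask(model_id: str = MODEL_ID):
    """Return a fill-mask pipeline for a BERT-style checkpoint."""
    # Imported lazily so the helpers below work without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def top_tokens(predictions, k: int = 3):
    """Return the k highest-scoring token strings from fill-mask output.

    `predictions` is the list of dicts produced by a fill-mask pipeline for
    a single masked position (each dict has "score" and "token_str" keys).
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [p["token_str"].strip() for p in ranked[:k]]


def demo():
    """Download the model and print candidate fillers for one masked word."""
    fill_mask = build_fill_mask()
    # BERT-style tokenizers use [MASK] as the mask token.
    preds = fill_mask("The attacker exploited a buffer [MASK] vulnerability.")
    print(top_tokens(preds))
```

Calling `demo()` downloads the checkpoint on first use and prints the top candidate tokens for the masked position. For downstream tasks such as classification, the same identifier could instead be passed to `AutoModelForSequenceClassification.from_pretrained`, again assuming the Hub name above is correct.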
Model Sources
- Repository: https://github.com/PEASEC/CySecBERT
- Paper: https://dl.acm.org/doi/abs/10.1145/3652594 and https://arxiv.org/abs/2212.02974
Bias, Risks, Limitations, and Recommendations
We emphasize that we did not explicitly examine or analyze social biases in the data or in the resulting model. While this may not be very harmful in most application contexts, there are applications that are heavily affected by such biases, and any form of discrimination can have serious consequences. As authors, we warn against using the model in such contexts. However, in keeping with an open-source mindset and recognizing its great impact, we leave the decision to the model's users, drawing on many previous discussions in the open-source community.
Training Details
Training Data
See https://github.com/PEASEC/cybersecurity_dataset
Training Procedure
We specifically trained CySecBERT to be less affected by catastrophic forgetting. More details can be found in the paper.
Evaluation
We conducted various cybersecurity and general evaluations. The details are in the paper.
Citation
If you want to cite the model, you can use the following formats:
BibTeX:
@article{10.1145/3652594,
author = {Bayer, Markus and Kuehn, Philipp and Shanehsaz, Ramin and Reuter, Christian},
title = {CySecBERT: A Domain-Adapted Language Model for the Cybersecurity Domain},
year = {2024},
issue_date = {May 2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {27},
number = {2},
issn = {2471-2566},
url = {https://doi.org/10.1145/3652594},
doi = {10.1145/3652594},
journal = {ACM Trans. Priv. Secur.},
month = {apr},
articleno = {18},
numpages = {20},
keywords = {Language model, cybersecurity BERT, cybersecurity dataset}
}
or
@misc{https://doi.org/10.48550/arxiv.2212.02974,
doi = {10.48550/ARXIV.2212.02974},
url = {https://arxiv.org/abs/2212.02974},
author = {Bayer, Markus and Kuehn, Philipp and Shanehsaz, Ramin and Reuter, Christian},
keywords = {Cryptography and Security (cs.CR), Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {CySecBERT: A Domain-Adapted Language Model for the Cybersecurity Domain},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}
Model Card Authors
Markus Bayer
Model Card Contact
bayer@peasec.tu-darmstadt.de
📄 License
This model is released under the Apache-2.0 license.