ThreatDetect-C-Cpp
A fine-tuned model based on ModernBERT-base for detecting vulnerabilities in C/C++ code, achieving 86% accuracy.
Quick Start
ThreatDetect-C-Cpp is derived from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base). We fine-tuned ModernBERT-base to detect vulnerabilities in C/C++ code; the current version reaches 86% accuracy on the eval set.

Features
- Multi-label Classification: Instead of binary classification, it classifies input C/C++ code into 7 labels: 'safe' and six CWE weaknesses.
- Code-related Integration: It can be integrated into code-related applications, for example paired with a code generator to detect vulnerabilities in generated code.
Documentation
Model Details
Model Description
ThreatDetect-C-Cpp serves as a code classifier. It classifies the input code into 7 labels: 'safe' (no vulnerability detected) and six CWE weaknesses:

| Label | Description |
|---|---|
| CWE-119 | Improper Restriction of Operations within the Bounds of a Memory Buffer |
| CWE-125 | Out-of-bounds Read |
| CWE-20 | Improper Input Validation |
| CWE-416 | Use After Free |
| CWE-703 | Improper Check or Handling of Exceptional Conditions |
| CWE-787 | Out-of-bounds Write |
| safe | Safe code |

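
As a rough illustration of how the seven labels might be read off a classifier's output, the sketch below applies a softmax to a vector of seven logits and returns the most likely label. The index-to-label order in `ID2LABEL` is hypothetical (the real mapping lives in the model's configuration), and picking a single label per input is an assumption about how the classification head is used.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical index order, for illustration only; check the model config
# (id2label) for the actual mapping.
ID2LABEL: Dict[int, str] = {
    0: "safe",
    1: "CWE-119",
    2: "CWE-125",
    3: "CWE-20",
    4: "CWE-416",
    5: "CWE-703",
    6: "CWE-787",
}


def decode_logits(
    logits: Sequence[float], id2label: Dict[int, str] = ID2LABEL
) -> Tuple[str, float]:
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(id2label):
        raise ValueError("expected one logit per label")
    # Subtract the max before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]
```
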
- Developed by: [lemon42-ai](https://github.com/lemon42-ai)
- Contributors: [Abdellah Oumida](https://www.linkedin.com/in/abdellah-oumida-ab9082234/) & [Mohammed Sbaihi](https://www.linkedin.com/in/mohammed-sbaihi-aa6493254/)
- Model type: ModernBERT, Encoder-only Transformer
- Supported Programming Languages: C/C++
- License: Apache 2.0 (see original License of ModernBERT-Base)
- Finetuned from model: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
Model Sources
- Repository: [The official lemon42-ai GitHub repository](https://github.com/lemon42-ai/ThreatDetect-code-vulnerability-detection)
- Technical Blog Post: Coming soon.
Uses
ThreatDetect-C-Cpp can be integrated into code-related applications. For example, it can be used in conjunction with a code generator to detect vulnerabilities in the generated code.
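
A minimal sketch of that integration, assuming the model is published on the Hugging Face Hub under the ID `lemon42-ai/ThreatDetect-C-Cpp` (an assumption, not confirmed by this card) and loads with the standard `transformers` text-classification pipeline. The helper names `load_classifier` and `flag_generated_code` are hypothetical, and the label strings returned by the real model depend on its configuration.

```python
from typing import Callable, Dict, Iterable, List

# Assumption: the Hub model ID is not stated in this card.
ASSUMED_MODEL_ID = "lemon42-ai/ThreatDetect-C-Cpp"


def load_classifier(model_id: str = ASSUMED_MODEL_ID) -> Callable[[str], Dict]:
    """Build a classify(code) -> {"label": ..., "score": ...} callable."""
    # Imported lazily so the rest of this module works without transformers.
    from transformers import pipeline

    pipe = pipeline("text-classification", model=model_id)

    def classify(code: str) -> Dict:
        # Truncate to the max sequence length used during fine-tuning (600).
        return pipe(code, truncation=True, max_length=600)[0]

    return classify


def flag_generated_code(
    snippets: Iterable[str],
    classify: Callable[[str], Dict],
    safe_label: str = "safe",
) -> List[Dict]:
    """Return one finding per snippet whose predicted label is not 'safe'."""
    findings = []
    for snippet in snippets:
        prediction = classify(snippet)
        if prediction["label"] != safe_label:
            findings.append(
                {
                    "code": snippet,
                    "label": prediction["label"],
                    "score": prediction["score"],
                }
            )
    return findings
```

In a generation loop, `flag_generated_code(candidates, load_classifier())` could be used to hold back or re-prompt for any candidate that comes back flagged. Passing the classifier in as a callable keeps the gating logic testable without downloading the model.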
Bias, Risks, and Limitations
ThreatDetect-C-Cpp can only detect weaknesses in C/C++ code and should not be used with other programming languages. Also, the model can only detect the six CWEs listed in the table above.
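
One way to respect the language restriction in a pipeline is to filter inputs before they reach the model. The sketch below keeps only paths with common C/C++ file extensions; the extension list and the helper names are illustrative assumptions, not part of the model.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed set of C/C++ source and header extensions; adjust for your project.
C_CPP_SUFFIXES = {".c", ".h", ".cc", ".cpp", ".cxx", ".hpp", ".hh", ".hxx"}


def is_c_cpp_source(path: str) -> bool:
    """True if the file extension looks like C or C++ (case-insensitive)."""
    return Path(path).suffix.lower() in C_CPP_SUFFIXES


def select_supported(paths: Iterable[str]) -> List[str]:
    """Keep only the paths the classifier should be run on."""
    return [p for p in paths if is_c_cpp_source(p)]
```

Note that a 'safe' prediction on a supported file only means none of the six listed CWEs was detected.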
Training Details
Training Data
The model was fine-tuned on a minified, cleaned, and deduplicated version of the [DiverseVul](https://github.com/wagner-group/diversevul) dataset. This version can be explored on Hugging Face Datasets [here](https://huggingface.co/datasets/lemon42-ai/minified-diverseful-multilabels).
Training Procedure
The model was trained using LoRA applied to Q and V matrices.
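
To give a sense of scale, the arithmetic below estimates how many trainable parameters LoRA adds under the listed settings. It assumes the adapter is attached to the fused `attn.Wqkv` projection named in the hyperparameter table, and that ModernBERT-base has a hidden size of 768 and 22 layers; both dimensions are assumptions not stated in this card, and any trainable classification head is not counted.

```python
HIDDEN_SIZE = 768  # assumption: ModernBERT-base hidden size
NUM_LAYERS = 22    # assumption: ModernBERT-base layer count
LORA_RANK = 8      # from the hyperparameter table
LORA_ALPHA = 32    # from the hyperparameter table


def lora_params_per_linear(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


# The fused projection maps hidden -> 3 * hidden (Q, K and V stacked).
per_layer = lora_params_per_linear(HIDDEN_SIZE, 3 * HIDDEN_SIZE, LORA_RANK)
total_lora_params = per_layer * NUM_LAYERS

# The low-rank update is scaled by alpha / rank before being added.
scaling = LORA_ALPHA / LORA_RANK

print(f"per layer: {per_layer}, total: {total_lora_params}, scaling: {scaling}")
```
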
Training Hyperparameters

| Hyperparameter | Value |
|---|---|
| Max Sequence Length | 600 |
| Batch Size | 32 |
| Number of Epochs | 9 |
| Learning Rate | 5e-4 |
| Weight Decay | 0.01 |
| Logging Steps | 100 |
| LoRA Rank (r) | 8 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.1 |
| LoRA Target Modules | attn.Wqkv |
| Optimizer | AdamW |
| LR Scheduler | CosineAnnealingWarmRestarts |
| Scheduler T_0 | 10 |
| Scheduler T_mult | 2 |
| Scheduler eta_min | 1e-6 |
| Training Split Ratio | 90% Train / 10% Validation |
| Seed for Splitting | 42 |

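
The scheduler settings can be made concrete with a small pure-Python version of the cosine-annealing-with-warm-restarts formula, using the learning rate, `T_0`, `T_mult`, and `eta_min` values listed above. This is a sketch of the standard formula, not the training script; whether the scheduler was stepped per batch or per epoch is not stated here, so `step` is left as a generic scheduler step.

```python
import math


def cosine_warm_restart_lr(
    step: int,
    base_lr: float = 5e-4,
    eta_min: float = 1e-6,
    t_0: int = 10,
    t_mult: int = 2,
) -> float:
    """Learning rate at a given scheduler step (0-indexed)."""
    if step < 0:
        raise ValueError("step must be non-negative")
    # Walk through the cycles: the first lasts t_0 steps, and each later
    # cycle is t_mult times longer than the previous one.
    cycle_len = t_0
    t_cur = step
    while t_cur >= cycle_len:
        t_cur -= cycle_len
        cycle_len *= t_mult
    cosine = math.cos(math.pi * t_cur / cycle_len)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)
```

With these values the rate starts at 5e-4, decays toward 1e-6 over the first 10 steps, then jumps back to 5e-4 at step 10 and again at step 30, with each cycle twice as long as the one before.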
Evaluation
ThreatDetect-C-Cpp reaches an accuracy of 86% on the eval set.
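
For readers reproducing the evaluation, the sketch below shows how overall accuracy can be computed from predicted and reference labels, alongside per-class recall, which is worth checking when classes are unevenly represented. The helper names are hypothetical, and the labels in the usage note are made-up toy values, not results from this model.

```python
from collections import Counter
from typing import Dict, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of examples whose predicted label matches the reference."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """For each reference label, the fraction of its examples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}
```

For example, with toy references `["safe", "safe", "safe", "CWE-787"]` and predictions that are all `"safe"`, `accuracy` is 0.75 while `per_class_recall` is 1.0 for `safe` and 0.0 for `CWE-787`.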
Technical Specifications
Hardware
The model was fine-tuned on 4 Tesla V100 GPUs for 1 hour using PyTorch and Accelerate.
License
This model is licensed under the Apache 2.0 license (see the original License of ModernBERT-Base).