ThreatDetect-C-Cpp Open-source Code Vulnerability Detection Model - Accurately Identify Security Risks in C/C++ Code

Developed by lemon42-ai

A C/C++ code vulnerability detection model fine-tuned from ModernBERT-base, with 86% accuracy

Text Classification

Transformers

Open Source License: Apache-2.0

#C/C++ vulnerability detection #Multi-label classification #Code security analysis

Downloads 22

Release Time: 2/21/2025

Model Overview

This model detects vulnerabilities in C/C++ code. It supports 7 classification labels: 6 CWE weakness types and 1 label for secure code.

Model Features

Multi-label classification: Identifies 6 common CWE weakness types as well as secure code.
Efficient fine-tuning: Uses LoRA for parameter-efficient fine-tuning.
Professional domain application: Specialized in security analysis of C/C++ code.

Model Capabilities

C/C++ code analysis

Vulnerability detection

Code security classification

Use Cases

Code security

Code review assistance: Automatically flags potential vulnerabilities during code review, which can improve review efficiency and reduce manual review workload.
Code generation security check: Works alongside code generators to detect vulnerabilities in generated code, helping to catch insecure output before it is used.

🚀 ThreatDetect-C-Cpp

A model fine-tuned from ModernBERT-base to detect vulnerabilities in C/C++ code, achieving 86% accuracy.

🚀 Quick Start

ThreatDetect-C-Cpp is derived from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base). We fine-tuned ModernBERT-base to detect vulnerabilities in C/C++ code, and the current version reaches an accuracy of 86%.

![Model Image](linkedin-deck.png)

✨ Features

Multi-label Classification: Rather than a binary safe/unsafe decision, the model classifies input C/C++ code into 7 labels: 'safe' and six CWE weaknesses.
Code-related Integration: Can be integrated into code-related applications, for example paired with a code generator to detect vulnerabilities in generated code.

📦 Installation

The original model card does not provide installation steps.

💻 Usage Examples

The original model card does not provide usage examples.
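As a starting point, inference with the Transformers library might look like the minimal sketch below. It is not taken from the model card. The repository id `lemon42-ai/ThreatDetect-C-Cpp`, the helper names (`softmax`, `pick_label`, `load`, `classify`), and the expectation that the checkpoint loads through `AutoModelForSequenceClassification` with the label names stored in `config.id2label` are all assumptions. The 600-token limit is the maximum sequence length reported in the training hyperparameters.

```python
import math

# Assumption: the Hub repository id. The card names the model and its
# developer but does not spell out the id.
MODEL_ID = "lemon42-ai/ThreatDetect-C-Cpp"
MAX_LENGTH = 600  # "Max Sequence Length" from the training hyperparameters


def softmax(logits):
    """Turn raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_label(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def load(model_id=MODEL_ID):
    """Download (on first use) and return the tokenizer and model."""
    # Imported lazily so the pure-Python helpers work without these packages.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    return tokenizer, model


def classify(code, tokenizer, model):
    """Classify one C/C++ snippet, truncating it to MAX_LENGTH tokens."""
    import torch

    inputs = tokenizer(
        code, truncation=True, max_length=MAX_LENGTH, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return pick_label(logits, model.config.id2label)
```

A call such as `classify(snippet, *load())` would return a `(label, probability)` pair. Loading once and reusing the tokenizer and model avoids repeating the download and initialization for every snippet.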

📚 Documentation

Model Details

Model Description

ThreatDetect-C-Cpp is a code classifier. It assigns the input code one of 7 labels: 'safe' (no vulnerability detected) or one of six CWE weaknesses:

Label	Description
CWE-119	Improper Restriction of Operations within the Bounds of a Memory Buffer
CWE-125	Out-of-bounds Read
CWE-20	Improper Input Validation
CWE-416	Use After Free
CWE-703	Improper Check or Handling of Exceptional Conditions
CWE-787	Out-of-bounds Write
safe	Safe code
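When post-processing predictions, the label table can be mirrored as a lookup. The sketch below copies the seven labels from the table; the names `LABELS`, `describe` and `is_vulnerable` are illustrative and do not come from the model card, and the exact label strings emitted by the model are assumed to match the table.

```python
# Labels and descriptions copied from the table above.
LABELS = {
    "CWE-119": "Improper Restriction of Operations within the Bounds of a Memory Buffer",
    "CWE-125": "Out-of-bounds Read",
    "CWE-20": "Improper Input Validation",
    "CWE-416": "Use After Free",
    "CWE-703": "Improper Check or Handling of Exceptional Conditions",
    "CWE-787": "Out-of-bounds Write",
    "safe": "Safe code",
}


def is_vulnerable(label):
    """True for any of the six CWE labels, False for 'safe'."""
    if label not in LABELS:
        raise KeyError(f"unknown label: {label!r}")
    return label != "safe"


def describe(label):
    """Return a short human-readable finding for a predicted label."""
    if not is_vulnerable(label):
        return "no vulnerability detected"
    return f"{label}: {LABELS[label]}"
```

Rejecting unknown labels early makes a mismatch between the model's configured label names and this table visible instead of silently treating it as a finding.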

Developed by: [lemon42-ai](https://github.com/lemon42-ai)
Contributors: [Abdellah Oumida](https://www.linkedin.com/in/abdellah-oumida-ab9082234/) & [Mohammed Sbaihi](https://www.linkedin.com/in/mohammed-sbaihi-aa6493254/)
Model type: ModernBERT, encoder-only Transformer
Supported Programming Languages: C/C++
License: Apache 2.0 (see the original license of ModernBERT-base)
Finetuned from model: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)

Model Sources

Repository: [The official lemon42-ai GitHub repository](https://github.com/lemon42-ai/ThreatDetect-code-vulnerability-detection)
Technical Blog Post: Coming soon.

Uses

ThreatDetect-C-Cpp can be integrated into code-related applications. For example, it can be used together with a code generator to detect vulnerabilities in the generated code.
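One way to pair the classifier with a code generator is to screen each generated snippet before it is accepted. The sketch below is a hypothetical integration, not the authors' pipeline: `screen_generated_code`, the `classify` callable returning `(label, confidence)`, and the `min_confidence` threshold are all assumed names and choices.

```python
def screen_generated_code(snippets, classify, min_confidence=0.5):
    """Split generated snippets into accepted and flagged lists.

    `classify` is any callable mapping a code string to (label, confidence).
    A snippet is accepted only when it is labeled 'safe' with at least
    `min_confidence`; everything else is flagged for review.
    """
    accepted, flagged = [], []
    for code in snippets:
        label, confidence = classify(code)
        if label == "safe" and confidence >= min_confidence:
            accepted.append(code)
        else:
            flagged.append((code, label, confidence))
    return accepted, flagged
```

Passing the classifier in as a callable keeps the screening logic independent of how the model is loaded, so it can be exercised with a stub before wiring in the real model.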

Bias, Risks, and Limitations

ThreatDetect-C-Cpp detects weaknesses only in C/C++ code and should not be used with other programming languages. It also recognizes only the six CWEs listed in the table above, so a 'safe' prediction does not rule out other weakness types.
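In a tool that scans a whole repository, the language restriction can be enforced by filtering files before they reach the model. The following is a small sketch under the assumption that file extensions are a good enough signal; the extension set and the names `C_CPP_SUFFIXES`, `is_c_or_cpp` and `select_supported` are illustrative.

```python
from pathlib import PurePath

# Assumption: a conventional set of C/C++ file extensions; adjust per project.
C_CPP_SUFFIXES = frozenset(
    {".c", ".h", ".cc", ".cpp", ".cxx", ".hh", ".hpp", ".hxx"}
)


def is_c_or_cpp(path):
    """True when the file extension looks like C or C++ (case-insensitive)."""
    return PurePath(path).suffix.lower() in C_CPP_SUFFIXES


def select_supported(paths):
    """Keep only the paths the model is meant to handle, preserving order."""
    return [p for p in paths if is_c_or_cpp(p)]
```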

Training Details

Training Data

The model was fine-tuned on a minified, cleaned, and deduplicated version of the [DiverseVul](https://github.com/wagner-group/diversevul) dataset. This version is available on Hugging Face Datasets [here](https://huggingface.co/datasets/lemon42-ai/minified-diverseful-multilabels).

Training Procedure

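The model was trained using LoRA applied to the query and value projections of the attention layers. In ModernBERT these projections are part of the fused attn.Wqkv module, which is the LoRA target module listed in the hyperparameter table below.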
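To make the LoRA setup concrete, here is a minimal sketch using the rank, alpha, dropout and target module reported in this card's hyperparameter table. It assumes the Hugging Face `peft` library was used, which the card does not state; the function names are illustrative. The two pure-Python helpers show the standard LoRA arithmetic: each adapted linear layer gains two low-rank matrices, and the update is scaled by alpha / r.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer.

    LoRA learns A with shape (rank, d_in) and B with shape (d_out, rank).
    """
    return rank * (d_in + d_out)


def lora_scaling(alpha, rank):
    """Factor applied to the low-rank update B @ A."""
    return alpha / rank


def build_lora_config():
    """Build a peft LoraConfig from the values in the hyperparameter table.

    Assumption: peft is the LoRA implementation; imported lazily so the
    helpers above work without it.
    """
    from peft import LoraConfig, TaskType

    return LoraConfig(
        task_type=TaskType.SEQ_CLS,    # sequence classification head
        r=8,                           # LoRA Rank (r)
        lora_alpha=32,                 # LoRA Alpha
        lora_dropout=0.1,              # LoRA Dropout
        target_modules=["attn.Wqkv"],  # LoRA Target Modules
    )
```

For example, assuming a 768-to-2304 fused projection in ModernBERT-base, `lora_param_count(768, 2304, 8)` gives 24,576 added parameters per adapted layer, and `lora_scaling(32, 8)` gives a scaling factor of 4.0.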

Training Hyperparameters

Hyperparameter	Value
Max Sequence Length	600
Batch Size	32
Number of Epochs	9
Learning Rate	5e-4
Weight Decay	0.01
Logging Steps	100
LoRA Rank (r)	8
LoRA Alpha	32
LoRA Dropout	0.1
LoRA Target Modules	attn.Wqkv
Optimizer	AdamW
LR Scheduler	CosineAnnealingWarmRestarts
Scheduler T_0	10
Scheduler T_mult	2
Scheduler eta_min	1e-6
Training Split Ratio	90% Train / 10% Validation
Seed for Splitting	42
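The optimizer and scheduler rows of the table can be read as the following sketch. `lr_at` reproduces the cosine-annealing-with-warm-restarts formula in plain Python for integer scheduler steps, and `build_optimizer_and_scheduler` wires the same values into PyTorch. Whether the authors stepped the scheduler per batch or per epoch is not stated in the card, so "step" here simply means one call to `scheduler.step()`; the function and constant names are illustrative.

```python
import math

BASE_LR = 5e-4       # Learning Rate
WEIGHT_DECAY = 0.01  # Weight Decay
T_0 = 10             # Scheduler T_0: length of the first cycle
T_MULT = 2           # Scheduler T_mult: each cycle is this many times longer
ETA_MIN = 1e-6       # Scheduler eta_min: floor of the learning rate


def lr_at(step, base_lr=BASE_LR, eta_min=ETA_MIN, t_0=T_0, t_mult=T_MULT):
    """Learning rate after `step` scheduler steps (non-negative integer)."""
    t_i, t_cur = t_0, step
    # Walk through completed cycles; each restart multiplies the cycle length.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    cosine = (1 + math.cos(math.pi * t_cur / t_i)) / 2
    return eta_min + (base_lr - eta_min) * cosine


def build_optimizer_and_scheduler(params):
    """Create AdamW and CosineAnnealingWarmRestarts with the table's values."""
    import torch  # imported lazily so lr_at works without PyTorch

    optimizer = torch.optim.AdamW(params, lr=BASE_LR, weight_decay=WEIGHT_DECAY)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
        optimizer, T_0=T_0, T_mult=T_MULT, eta_min=ETA_MIN
    )
    return optimizer, scheduler
```

With these settings the rate starts at 5e-4, decays toward 1e-6 over the first 10 steps, restarts at step 10, and then decays over a cycle of 20 steps.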

Evaluation

ThreatDetect-C-Cpp reaches an accuracy of 86% on the evaluation set.
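Accuracy here is the fraction of evaluation samples whose predicted label equals the reference label. A minimal sketch of that computation, assuming one predicted label per sample (the function name and the example labels are illustrative, not taken from the authors' evaluation code):

```python
def accuracy(predicted, expected):
    """Fraction of positions where the predicted label matches the reference."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)
```

For instance, three matches out of four samples yield an accuracy of 0.75.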

Technical Specifications

Hardware

The model was fine-tuned on 4 Tesla V100 GPUs for 1 hour, using torch and accelerate.

📄 License

This model is licensed under the Apache 2.0 license (see the original license of ModernBERT-base).
