đ GenTel-Shield Detection Model
GenTel-Shield is a detection model that can effectively distinguish between malicious and benign samples through a well - designed training process, providing strong protection against injection attacks.
đ Quick Start
The GenTel - Shield detection model development follows a five - step process:
- Construct a training dataset by gathering data from online sources and expert contributions.
- Perform binary labeling and cleaning on the data.
- Apply data augmentation techniques.
- Employ a pre - trained model for training.
- The trained model can distinguish between malicious and benign samples.
Here is a workflow of GenTel - Shield.

⨠Features
- Diverse Data Sources: The training data is collected from multiple sources, including public platforms and established datasets from LLM applications, and is annotated by domain experts.
- Robust Data Augmentation: Implements both semantic alterations and character - level perturbations to enhance the model's robustness.
- Effective Model Training: Finetunes the model on a proposed training text - pair dataset, with specific training settings to mitigate overfitting and optimize memory usage.
- Comprehensive Evaluation: Evaluates the model on Gentel - Bench, showing excellent performance in various injection attack scenarios.
đĻ Installation
No installation steps are provided in the original document, so this section is skipped.
đģ Usage Examples
No code examples are provided in the original document, so this section is skipped.
đ Documentation
Training Data Preparation
Data Collection
Our training data comes from two main sources. The first source includes risk data from public platforms like jailbreakchat.com and reddit.com, as well as established datasets from LLM applications such as the VMware Open - Instruct dataset and the Chatbot Instruction Prompts dataset. Domain experts have annotated these examples, classifying the prompts into harmful injection attack samples and benign samples.
Data Augmentation
In real - world scenarios, adversarial samples can bypass detection. To enhance the robustness of our detection model, we implemented data augmentation. For character perturbation, we used four operations: synonym replacement, random insertion, random swap, and random deletion. For semantic augmentation, we used LLMs to rewrite our data, generating a more diverse set of training samples.
Model Training Details
We finetune the GenTel - Shield model on our proposed training text - pair dataset, initialized from the multilingual E5 text embedding model. Training is conducted on a single machine with one NVIDIA GeForce RTX 4090D (24GB) GPU, using a batch size of 32. The model is trained with a learning rate of 2e - 5, a cosine learning rate scheduler, and a weight decay of 0.01. We use mixed precision (fp16) training, a 500 - step warmup phase, and gradient clipping with a maximum norm of 1.0.
Evaluation
Dataset
Gentel - Bench provides a comprehensive framework for evaluating the robustness of models against a wide range of injection attacks. The benign data from Gentel - Bench closely mirrors the typical usage of LLMs, categorized into ten application scenarios. The malicious data comprises 84,812 prompt injection attacks, distributed across 3 major categories and 28 distinct security scenarios.
Gentel - Bench
We evaluate the model's effectiveness in detecting Jailbreak, Goal Hijacking, and Prompt Leaking attacks on Gentel - Bench. The results show that our approach outperforms existing methods in most scenarios, especially in terms of accuracy and F1 score.
Attack Scenario |
Method |
Accuracy â |
Precision â |
F1 â |
Recall â |
Jailbreak Attack |
ProtectAI |
89.46 |
99.59 |
88.62 |
79.83 |
Jailbreak Attack |
Hyperion |
94.70 |
94.21 |
94.88 |
95.57 |
Jailbreak Attack |
Prompt Guard |
50.58 |
51.03 |
66.85 |
96.88 |
Jailbreak Attack |
Lakera AI |
87.20 |
92.12 |
86.84 |
82.14 |
Jailbreak Attack |
Deepset |
65.69 |
60.63 |
75.49 |
100 |
Jailbreak Attack |
Fmops |
63.35 |
59.04 |
74.25 |
100 |
Jailbreak Attack |
WhyLabs LangKit |
78.86 |
98.48 |
75.28 |
60.92 |
Jailbreak Attack |
GenTel - Shield(Ours) |
97.63 |
98.04 |
97.69 |
97.34 |
Goal Hijacking Attack |
ProtectAI |
94.25 |
99.79 |
93.95 |
88.76 |
Goal Hijacking Attack |
Hyperion |
90.68 |
94.53 |
90.33 |
86.48 |
Goal Hijacking Attack |
Prompt Guard |
50.90 |
50.61 |
67.21 |
100 |
Goal Hijacking Attack |
Lakera AI |
74.63 |
88.59 |
69.33 |
56.95 |
Goal Hijacking Attack |
Deepset |
63.40 |
57.90 |
73.34 |
100 |
Goal Hijacking Attack |
Fmops |
61.03 |
56.36 |
72.09 |
100 |
Goal Hijacking Attack |
WhyLabs LangKit |
68.14 |
97.53 |
54.35 |
37.67 |
Goal Hijacking Attack |
GenTel - Shield(Ours) |
96.81 |
99.44 |
96.74 |
94.19 |
Prompt Leaking Attack |
ProtectAI |
90.94 |
99.77 |
90.06 |
82.08 |
Prompt Leaking Attack |
Hyperion |
90.85 |
95.01 |
90.41 |
86.23 |
Prompt Leaking Attack |
Prompt Guard |
50.28 |
50.14 |
66.79 |
100 |
Prompt Leaking Attack |
Lakera AI |
96.04 |
93.11 |
96.17 |
99.43 |
Prompt Leaking Attack |
Deepset |
61.79 |
57.08 |
71.34 |
95.09 |
Prompt Leaking Attack |
Fmops |
58.77 |
55.07 |
69.80 |
95.28 |
Prompt Leaking Attack |
WhyLabs LangKit |
99.34 |
99.62 |
99.34 |
99.06 |
Prompt Leaking Attack |
GenTel - Shield(Ours) |
97.92 |
99.42 |
97.89 |
96.42 |
Subdivision Scenarios

Citation
Li, Rongchang, et al. "GenTel - Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks" arXiv preprint arXiv:2409.19521 (2024).
đ§ Technical Details
The GenTel - Shield model is finetuned on a proposed training text - pair dataset, initialized from the multilingual E5 text embedding model. Training is carried out on a single machine with one NVIDIA GeForce RTX 4090D (24GB) GPU. A batch size of 32 is used, along with a learning rate of 2e - 5, a cosine learning rate scheduler, and a weight decay of 0.01 to prevent overfitting. Mixed precision (fp16) training is utilized to optimize memory usage, and there is a 500 - step warmup phase. Gradient clipping with a maximum norm of 1.0 is also applied.
đ License
No license information is provided in the original document, so this section is skipped.