G

Gentelshield V1

Developed by GenTelLab
GenTel-Shield is a model focused on detecting and defending against prompt injection attacks, effectively distinguishing malicious samples from benign ones.
Downloads 35
Release Time : 9/9/2024

Model Overview

This model is primarily used to detect and defend against prompt injection attacks targeting large language models, including security threats such as jailbreak attacks, goal hijacking, and prompt leakage.

Model Features

Efficient Detection
Outstanding performance on the Gentel-Bench benchmark, achieving over 97% accuracy
Strong Robustness
Enhanced adversarial sample recognition through data augmentation techniques
Comprehensive Defense
Covers three major attack scenarios: jailbreak attacks, goal hijacking, and prompt leakage

Model Capabilities

Malicious Prompt Detection
Text Classification
Security Defense

Use Cases

Large Language Model Security
Jailbreak Attack Defense
Detects and blocks malicious prompts attempting to bypass LLM security restrictions
Accuracy 97.63%, F1-score 97.69
Goal Hijacking Protection
Prevents attackers from hijacking the LLM's original goal through carefully crafted prompts
Accuracy 96.81%, F1-score 96.74
Prompt Leakage Protection
Protects LLM system prompts from being extracted by malicious users
Accuracy 97.92%, F1-score 97.89
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
Š 2025AIbase