P

Prompt Guard Finetuned

Developed by skshreyas714
Prompt Guard is a text classification model designed to detect prompt attacks, capable of identifying malicious prompt injections and jailbreak attempts.
Downloads 35
Release Time : 2/19/2025

Model Overview

This model is fine-tuned based on mDeBERTa-v3-base, specifically designed to protect LLM applications from prompt attacks, including prompt injections and jailbreaks.

Model Features

Multilingual Support
Capable of detecting prompt attacks in multiple languages, including both English and non-English inputs.
Efficient Detection
A small model (86M parameters) suitable for running as a filter before each LLM call, without requiring a dedicated GPU.
Multi-label Classification
Can distinguish between benign, injection, and jailbreak prompts, providing finer-grained filtering control.

Model Capabilities

Detect prompt injections
Identify jailbreak attempts
Multilingual text classification
Real-time filtering of malicious prompts

Use Cases

LLM Security Protection
Third-party Content Filtering
Filter untrusted data from third parties to prevent potential prompt injection attacks.
Effectively identifies 99.5% of injection attacks (evaluation set)
User Input Monitoring
Detect jailbreak attempts in user conversations to prevent bypassing security measures.
Identifies 97.5% of jailbreak attacks (OOD dataset)
Threat Detection
New Attack Pattern Identification
Serve as a threat detection tool to flag suspicious inputs for further analysis.
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase