P

Prompt Guard 86M

Developed by meta-llama
PromptGuard is a text classification model designed to detect and protect against LLM prompt attacks, capable of identifying malicious prompt injections and jailbreak attempts.
Downloads 33.88k
Release Time : 7/21/2024

Model Overview

This model is specifically designed to protect LLM-based applications from prompt attacks, including prompt injection and jailbreaking. It can detect explicit malicious prompts as well as data containing injected inputs, helping developers mitigate the risks of prompt attacks.

Model Features

Multi-label Classification
Capable of classifying inputs into benign, injection, and jailbreak categories, helping developers accurately identify different types of prompt attacks.
Open-source Model
Released as an open-source model, developers can fine-tune the model based on specific application data and use cases.
Combined Protection Measures
Recommends combining model-based protection with other measures to provide more comprehensive defense.

Model Capabilities

Malicious Prompt Detection
Text Classification
Prompt Injection Identification
Jailbreak Attempt Identification

Use Cases

LLM Application Security
Third-party Data Filtering
Filters third-party data carrying injection or jailbreak risks to prevent the model from executing unintended instructions.
Significantly reduces the risk of prompt attacks in third-party data
User Dialogue Filtering
Filters user dialogues carrying jailbreak risks to prevent users from bypassing the model's security measures.
Protects the model from malicious user attacks
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase