L

Llama Prompt Guard 2 86M

Developed by meta-llama
Llama Prompt Guard 2 is a series of prompt attack detection models launched by Meta, including an upgraded 86M-parameter version and a lightweight 22M-parameter version, designed to detect prompt injection and jailbreak attacks in large language model applications.
Downloads 16.24k
Release Time : 4/28/2025

Model Overview

This model series aims to provide protection for large language model applications by detecting two types of prompt attacks: prompt injection and jailbreak attacks. The 86M version supports detection in 8 languages, while the 22M lightweight version reduces latency by 75%.

Model Features

Performance Improvement
Expanded training data and optimized loss functions reduce false positives, with the 86M version achieving an AUC of 0.998.
Lightweight Version Optimization
The 22M lightweight version, based on DeBERTa-xsmall, reduces latency by 75%, making it suitable for latency-sensitive applications.
Anti-Adversarial Tokenization
Optimized tokenization strategy defends against attacks like space manipulation, enhancing model robustness.
Binary Classification Simplification
Directly labels prompts as 'benign' or 'malicious,' simplifying the classification process.

Model Capabilities

Prompt Injection Detection
Jailbreak Attack Detection
Multilingual Text Classification
Low-Latency Inference

Use Cases

Large Language Model Security Protection
Prompt Injection Defense
Detects and blocks attacks that manipulate third-party data to induce unintended model behavior
The 86M version improves attack prevention rate to 81.2%
Jailbreak Attack Interception
Identifies malicious instructions that bypass built-in security measures
The 22M version achieves an attack prevention rate of 78.4%
Security Analysis
Abuse Pattern Recognition
Assists security teams in identifying potential model abuse patterns
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase