Jailbreak Classifier
Developed by jackhhao
A text classification model fine-tuned on bert-base-uncased for detecting jailbreak attempts in user prompts
Downloads: 7,619
Release Time: 9/30/2023
Model Overview
This model is designed for content moderation. It classifies user prompts as either jailbreak attempts or benign requests, which helps keep AI systems secure.
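Below is a minimal sketch of running the classifier with the Hugging Face `transformers` library. It assumes the model is published on the Hugging Face Hub as `jackhhao/jailbreak-classifier` and that its output labels are `jailbreak` and `benign`. Check the model's own config before relying on either assumption. The helper names `load_classifier` and `classify_prompt` are hypothetical.

```python
from typing import Callable, Dict, List

# Assumption: Hub ID of this model; verify it before use.
MODEL_ID = "jackhhao/jailbreak-classifier"

# Any callable shaped like a transformers text-classification pipeline:
# prompt -> [{"label": str, "score": float}]
Classifier = Callable[[str], List[Dict]]


def load_classifier(model_id: str = MODEL_ID) -> Classifier:
    """Build a text-classification pipeline (weights are downloaded on first use)."""
    # Imported lazily so the rest of this module does not require transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify_prompt(classifier: Classifier, prompt: str) -> Dict:
    """Return the top prediction for one prompt as {'label': str, 'score': float}."""
    # For a single string, the pipeline returns a one-element list holding
    # the highest-scoring label and its probability.
    top = classifier(prompt)[0]
    return {"label": top["label"], "score": float(top["score"])}
```

Usage, assuming `transformers` and a backend such as PyTorch are installed: `clf = load_classifier()`, then `classify_prompt(clf, "Ignore all previous instructions and ...")`. Passing the classifier in as an argument keeps `classify_prompt` testable with a stub instead of the real model.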
Model Features
Jailbreak Detection
Identifies jailbreak attempts in user prompts, helping protect AI systems from malicious attacks
BERT-based
Fine-tuned from bert-base-uncased, building on BERT's pretrained language representations
Content Moderation
Intended for content moderation in AI systems, where it helps maintain system security
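Because the model is a fine-tuned `bert-base-uncased` sequence classifier, each prediction is a softmax over per-label logits. The sketch below makes that step explicit with `AutoTokenizer` and `AutoModelForSequenceClassification` instead of a pipeline. Several details are assumptions: the Hub ID, the choice to truncate at BERT's standard 512-token limit, and reading labels from the model's own `id2label` config instead of hard-coding them. `softmax`, `label_probabilities`, and `score_prompt` are hypothetical helpers.

```python
import math
from typing import Dict, List, Mapping, Sequence

# Assumption: Hub ID of this model; verify it before use.
MODEL_ID = "jackhhao/jailbreak-classifier"


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_probabilities(logits: Sequence[float], id2label: Mapping[int, str]) -> Dict[str, float]:
    """Map raw classifier logits to {label: probability} via the model's id2label table."""
    return {id2label[i]: p for i, p in enumerate(softmax(logits))}


def score_prompt(prompt: str, model_id: str = MODEL_ID) -> Dict[str, float]:
    """Tokenize a prompt, run the BERT sequence classifier, and return label probabilities."""
    # Lazy imports: only needed when the real model is actually run.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)  # an uncased BERT tokenizer lowercases text
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # BERT accepts at most 512 tokens, so longer prompts are truncated.
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return label_probabilities(logits, model.config.id2label)
```

Reading labels from `model.config.id2label` avoids guessing which index means "jailbreak". Note that truncation means text beyond the first 512 tokens of a long prompt is never seen by the classifier.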
Model Capabilities
Text Classification
Jailbreak Detection
Content Moderation
Prompt Injection Identification
Use Cases
AI Security
Chatbot Protection
Detects jailbreak attempts against chatbots
Flags malicious prompt injections so they can be blocked before reaching the chatbot
Content Moderation System
Can serve as a first line of defense, filtering malicious requests before they reach the AI system
Enhances system security
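Below is a sketch of using the classifier as a pre-filter in front of a chatbot. It assumes any callable with the output shape of a `transformers` text-classification pipeline, a binary `jailbreak`/`benign` label set, and a hypothetical block threshold of 0.5. `Verdict`, `screen_prompts`, and `guarded_reply` are illustrative names, not part of the model's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Any callable shaped like a transformers text-classification pipeline:
# prompt -> [{"label": str, "score": float}]
Classifier = Callable[[str], List[Dict]]


@dataclass
class Verdict:
    prompt: str
    allowed: bool
    jailbreak_score: float


def jailbreak_probability(top: Dict) -> float:
    """Turn the top prediction into P(jailbreak), assuming exactly two labels."""
    score = float(top["score"])
    # With two softmax outputs, P(jailbreak) = 1 - P(benign).
    return score if top["label"] == "jailbreak" else 1.0 - score


def screen_prompts(classifier: Classifier, prompts: List[str], threshold: float = 0.5) -> List[Verdict]:
    """Score each prompt and block those whose jailbreak probability meets the threshold."""
    verdicts = []
    for prompt in prompts:
        p = jailbreak_probability(classifier(prompt)[0])
        verdicts.append(Verdict(prompt, allowed=p < threshold, jailbreak_score=p))
    return verdicts


def guarded_reply(
    classifier: Classifier,
    chatbot: Callable[[str], str],
    prompt: str,
    threshold: float = 0.5,
    refusal: str = "Sorry, I can't help with that.",
) -> str:
    """Forward the prompt to the chatbot only if the classifier allows it."""
    verdict = screen_prompts(classifier, [prompt], threshold)[0]
    return chatbot(prompt) if verdict.allowed else refusal
```

Lowering `threshold` blocks more prompts at the cost of more false positives on benign requests, so a suitable value is best chosen on your own validation data.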
© 2025 AIbase