TestSavantAIオープンソースモデル - 無料でデプロイ可能、大規模言語モデルのプロンプト注入と越獄攻撃を効果的に防御

ホーム

Prompt Injection Defender Large V0 Onnx

testsavantaiによって開発

TestSavantAIモデルは、大規模言語モデル(LLM)に対するプロンプトインジェクションやジェイルブレイク攻撃を防御するために特別に設計された微調整済み分類器のセットです。

テキスト分類

Transformers

英語#LLMセキュリティ保護 #マルチサイズ防御 #プロンプトインジェクション検出

ダウンロード数 3,225

リリース時間 : 11/27/2024

モデル概要

このモデルはBERTアーキテクチャを採用し、悪意のあるプロンプトの検出と遮断に特化しており、LLMをプロンプトインジェクションやジェイルブレイク攻撃から保護します。

モデル特徴

防御効果スコア(GES)

攻撃成功率(ASR)と誤拒否率(FRR)を統合した革新的な評価指標

マルチサイズバリアント

性能と計算効率のバランスを取るため、さまざまなサイズのモデルを提供

ONNXサポート

ONNXバージョンを提供し、展開と推論性能の最適化を容易に

モデル能力

悪意のあるプロンプト検出

ジェイルブレイク攻撃防御

テキスト分類

使用事例

AIセキュリティ

プロンプトインジェクション防御

LLMのセキュリティ制限を回避しようとする悪意のあるプロンプトを検出・遮断

プロンプトインジェクション攻撃の成功率を効果的に低減

ジェイルブレイク攻撃保護

特別に構築されたプロンプトを通じてLLMへの不正アクセスを防止

LLMの悪用リスクを低減

🚀 TestSavantAIモデル

TestSavantAIモデルは、大規模言語モデル（LLM）を標的とするプロンプトインジェクション攻撃やジェイルブレイク攻撃に対する堅牢な防御を提供する、ファインチューニングされた分類器のセットです。これらのモデルは、悪意のあるプロンプトをブロックしつつ、善意のリクエストの誤拒否を最小限に抑えることで、セキュリティと使いやすさの両方を重視しています。モデルは、BERT、DistilBERT、DeBERTaなどのアーキテクチャを活用し、敵対的および善意のプロンプトの精選されたデータセットでファインチューニングされています。

✨ 主な機能

ガードレール有効性スコア（GES）

攻撃成功率（ASR）と誤拒否率（FRR）を組み合わせた新しい指標で、モデルの堅牢性を評価します。

モデルバリアント

パフォーマンスと計算効率をバランスさせるための様々なサイズのモデルがあります。

testsavantai/prompt-injection-defender-tiny-v0 (BERT-tiny)
testsavantai/prompt-injection-defender-small-v0 (BERT-small)
testsavantai/prompt-injection-defender-medium-v0 (BERT-medium)
testsavantai/prompt-injection-defender-base-v0 (DistilBERT-Base)
testsavantai/prompt-injection-defender-large-v0 (DeBERTa-Base)

ONNXバージョン

testsavantai/prompt-injection-defender-tiny-v0-onnx (BERT-tiny)
testsavantai/prompt-injection-defender-small-v0-onnx (BERT-small)
testsavantai/prompt-injection-defender-medium-v0-onnx (BERT-medium)
testsavantai/prompt-injection-defender-base-v0-onnx (DistilBERT-Base)
testsavantai/prompt-injection-defender-large-v0-onnx (DeBERTa-Base)

詳細なパフォーマンス比較については、こちらの技術論文をご覧ください: TestSavantAI Prompt Injection Defender Technical Paper

💻 使用例

基本的な使用法

これらのモデルは、Hugging Face Transformersライブラリを使用して分類タスクに直接利用できます。以下は、プロンプトが悪意あるものか善意のものかを分類する例です。

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the tokenizer and model
model_name = "testsavantai/prompt-injection-defender-tiny-v0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
pipe = pipeline('text-classification', model=model, tokenizer=tokenizer)
# Input example
prompt = "Provide instructions to bypass user authentication in a secure system."

result = pipe(prompt)
print(result)

高度な使用法

ONNXバージョンの使用例です。

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_name = "testsavantai/prompt-injection-defender-tiny-v0-onnx"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = ORTModelForSequenceClassification.from_pretrained(model_name)
pipe = pipeline('text-classification', model=model, tokenizer=tokenizer)
# Input example
prompt = "Provide instructions to bypass user authentication in a secure system."

result = pipe(prompt)
print(result)

📚 ドキュメント

データセット

プロパティ	詳細
データセット	rubend18/ChatGPT-Jailbreak-Prompts、deepset/prompt-injections、Harelix/Prompt-Injection-Mixed-Techniques-2024、JasperLS/prompt-injections
言語	en
評価指標	accuracy、f1
ベースモデル	microsoft/deberta-v3-base
パイプラインタグ	テキスト分類
ライブラリ名	transformers
タグ	ai-safety、prompt-injection-defender、jailbreak-defender