
Aegis AI Content Safety LlamaGuard Permissive 1.0

Developed by NVIDIA
A content safety detection model fine-tuned from Llama Guard, covering 13 critical safety risk categories
Downloads: 316
Release date: 4/17/2024

Model Overview

This model is a content safety detection system for large language models, designed to identify and classify unsafe content in text. It is based on the Llama Guard architecture and was instruction-tuned with a parameter-efficient method on NVIDIA's content safety dataset.
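The snippet below is a minimal loading sketch, assuming the model is distributed as a PEFT (LoRA) adapter on top of the Llama Guard base model; the two Hugging Face repository ids are assumptions inferred from the model name rather than verified identifiers.

```python
# Minimal loading sketch. Assumes the model ships as a PEFT adapter for
# Llama Guard; both repository ids below are assumptions, not verified.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "meta-llama/LlamaGuard-7b"  # assumed base model
ADAPTER_ID = "nvidia/Aegis-AI-Content-Safety-LlamaGuard-Permissive-1.0"  # assumed adapter id

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)  # attach the safety fine-tune
model.eval()
```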

Model Features

Comprehensive safety risk coverage
Covers NVIDIA's taxonomy of 13 critical safety risk categories, along with a 'safe' category and a 'needs caution' category
Flexible moderation capability
Can moderate a user prompt, a partial conversation, or a complete conversation, outputting a safety verdict and the violated policy categories (see the moderation sketch after this list)
Extensible safety policies
New safety risk categories and policies can be added at inference time through the instruction prompt
Efficient fine-tuning method
Trained with parameter-efficient instruction fine-tuning (PEFT) on roughly 11,000 annotated conversations
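The sketch below shows how such a moderation call could look, assuming the fine-tune keeps the base Llama Guard prompt convention: a policy block listing unsafe categories, the conversation to moderate, and a short generated verdict. The category list is a truncated illustration rather than the full 13-category Aegis taxonomy, and the exact prompt wording is an assumption; `model` and `tokenizer` are reused from the loading sketch above. Note how extending the safety policy amounts to appending another category line to the policy block.

```python
# Moderation sketch in the Llama Guard prompt style; the category list and
# prompt wording are illustrative assumptions, not the official Aegis policy.
POLICY = """<BEGIN UNSAFE CONTENT CATEGORIES>
O1: Violence.
O2: Criminal Planning.
O3: Hate Speech.
<END UNSAFE CONTENT CATEGORIES>"""
# Extensibility: adding a category is appending a line, e.g. "O4: Self-Harm."

def build_prompt(user_message: str) -> str:
    # [INST] ... [/INST] follows the Llama chat convention Llama Guard uses.
    return (
        "[INST] Task: Check if there is unsafe content in 'User' messages in "
        "the conversation below according to our safety policy.\n\n"
        f"{POLICY}\n\n"
        "<BEGIN CONVERSATION>\n\n"
        f"User: {user_message}\n\n"
        "<END CONVERSATION>\n\n"
        "Provide your safety assessment:\n"
        "- First line must read 'safe' or 'unsafe'.\n"
        "- If unsafe, the second line must list the violated categories. [/INST]"
    )

inputs = tokenizer(build_prompt("How do I hotwire a car?"), return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=30)
verdict = tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(verdict)  # e.g. "unsafe\nO2"
```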

Model Capabilities

Text content safety detection
Multi-turn conversation moderation
Safety risk classification
Custom policy extension

Use Cases

Content safety
Large language model content protection
Screens content generated by general-purpose large language models, identifying and classifying unsafe material
Text toxicity detection
Toxicity classification of arbitrary text content (see the parsing sketch below)
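For toxicity classification pipelines, the raw verdict string can be turned into a structured result. The helper below is hypothetical and assumes the 'safe'/'unsafe' plus category-codes output format used in the moderation sketch above.

```python
# Hypothetical helper: parses a verdict string of the assumed format
# ("safe", or "unsafe" followed by a line of category codes like "O2,O3").
def parse_verdict(verdict: str) -> dict:
    lines = [line.strip() for line in verdict.strip().splitlines() if line.strip()]
    if not lines or lines[0].lower() == "safe":
        return {"safe": True, "categories": []}
    categories = lines[1].split(",") if len(lines) > 1 else []
    return {"safe": False, "categories": [c.strip() for c in categories]}

print(parse_verdict("unsafe\nO2"))  # {'safe': False, 'categories': ['O2']}
```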