G

Granite Guardian 3.0 8b

Developed by ibm-granite
Granite Guardian 3.0 8B is a fine-tuned Granite 3.0 8B instruction model developed by IBM Research, specifically designed to detect risky content in prompts and responses.
Downloads 2,048
Release Time : 10/15/2024

Model Overview

This model aims to detect risks in multiple key dimensions listed in the IBM AI Risk Atlas, including harm, social bias, jailbreak attacks, violence, profanity, pornographic content, and unethical behavior. It can also be used to assess the hallucination risk in RAG pipelines.

Model Features

Multi-dimensional Risk Detection
Capable of detecting various risk types, including harm, social bias, jailbreak attacks, violence, profanity, pornographic content, and unethical behavior.
RAG Hallucination Risk Assessment
Can assess hallucination risks such as context relevance, factual basis, and answer relevance in RAG pipelines.
High-performance
Performs excellently in standard benchmark tests, especially achieving a recall rate of 1.0 on jailbreak attack prompts.
Flexible Configuration
Supports flexible configuration of the risk types to be detected through the guardian_config parameter.

Model Capabilities

Risky Content Detection
RAG Hallucination Assessment
Text Security Analysis
Content Filtering

Use Cases

Content Security
Harmful Content Detection
Detect harmful content such as violence and profanity in user input or AI responses.
Achieved an F1 score of 0.87 in the AegisSafetyTest benchmark
Social Bias Identification
Identify biased content based on identity or characteristics.
RAG Quality Assurance
Factual Basis Check
Verify whether the AI response is accurate and faithful to the provided context.
Achieved an average AUC of 0.85 in the TRUE benchmark
Answer Relevance Assessment
Assess whether the AI response directly answers the user's query.
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase