GuardReasoner 8B

Developed by yueliu1999
GuardReasoner 8B is a model fine-tuned from meta-llama/Llama-3.1-8B that specializes in reasoning-based LLM safety protection
Downloads 480
Release Time: 1/30/2025

Model Overview

Trained with R-SFT (reasoning supervised fine-tuning) and HS-DPO (hard-sample direct preference optimization), this model is designed to analyze the safety of human-AI interactions, performing tasks such as harmfulness detection and refusal detection

Model Features

Reasoning-based Safety Protection
Analyzes AI interactions through step-by-step reasoning, keeping each final judgment consistent with the reasoning that leads to it
Multi-task Joint Detection
Simultaneously performs three tasks: prompt harmfulness detection, refusal detection, and response harmfulness detection
Efficient Fine-tuning Techniques
Trained with the R-SFT and HS-DPO fine-tuning methods to improve model performance
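
As a rough illustration of the multi-task joint detection described above, the sketch below pulls the three verdicts (prompt harmfulness, refusal, response harmfulness) out of a block of generated text. The output layout it expects (`Request:`, `Completion:` and `Response:` label lines after the reasoning) is an assumption made for this example, and `Verdict` and `parse_verdict` are hypothetical names; check the model's actual output before relying on it.

```python
import re
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: the label lines below are a hypothetical output layout,
# not a documented format of GuardReasoner 8B.
_PATTERNS = {
    "prompt_harm": re.compile(r"^request:\s*(harmful|unharmful)\b", re.I | re.M),
    "refusal": re.compile(r"^completion:\s*(refusal|compliance)\b", re.I | re.M),
    "response_harm": re.compile(r"^response:\s*(harmful|unharmful)\b", re.I | re.M),
}


@dataclass
class Verdict:
    """One label per task; None means the label was not found in the text."""
    prompt_harm: Optional[str]
    refusal: Optional[str]
    response_harm: Optional[str]


def parse_verdict(text: str) -> Verdict:
    """Extract the three task labels from reasoning-style model output."""
    found = {}
    for name, pattern in _PATTERNS.items():
        matches = pattern.findall(text)
        # Use the last match so labels stated after the reasoning take priority.
        found[name] = matches[-1].lower() if matches else None
    return Verdict(**found)


sample_output = (
    "Reasoning step 1: the user asks for instructions that could cause harm.\n"
    "Reasoning step 2: the assistant declines and offers no harmful detail.\n"
    "Answers:\n"
    "Request: harmful\n"
    "Completion: refusal\n"
    "Response: unharmful\n"
)
verdict = parse_verdict(sample_output)
```

Returning `None` for a missing label, rather than guessing a default, lets the caller decide how to handle incomplete output.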

Model Capabilities

Text Classification
Harmful Content Detection
AI Response Evaluation
Multi-task Reasoning

Use Cases

AI Safety Monitoring
Social Media Content Moderation
Detects harmful content and inappropriate responses in user-AI interactions
Accurately identifies potentially harmful interactions and provides safety assessments
AI Assistant Safety Protection
Monitors whether AI assistant responses comply with safety standards
Effectively detects whether AI complies with or refuses harmful requests
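
One way the monitoring use cases above could be wired into an application is a small policy that maps the three verdicts to an action. This is only a sketch: the `Action` values, the `decide` function and the rules themselves are hypothetical choices made for illustration, not behavior defined by the model.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    REVIEW = "review"


def decide(prompt_harm: str, refusal: str, response_harm: str) -> Action:
    """Map the three task labels to a moderation action (hypothetical policy)."""
    # A harmful response is blocked regardless of the other labels.
    if response_harm == "harmful":
        return Action.BLOCK
    # The assistant complied with a harmful request, but the response itself
    # was judged unharmful: send the exchange to human review.
    if prompt_harm == "harmful" and refusal == "compliance":
        return Action.REVIEW
    # Harmless exchanges and correct refusals of harmful requests pass through.
    return Action.ALLOW
```

For example, `decide("harmful", "refusal", "unharmful")` allows the exchange, since the assistant refused the harmful request.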