GuardReasoner 3B

Developed by yueliu1999
A safeguard model fine-tuned from Llama-3.2-3B with the R-SFT and HS-DPO methods, designed to analyze harmful content in human-AI interactions.
Downloads 172
Release Time: 1/31/2025

Model Overview

This model is a classifier for human-AI interactions. It assesses whether a user request is harmful, whether the AI response is harmful, and whether the AI refused or complied with the request.

Model Features

Security Protection
Specifically designed to detect harmful content and behaviors in human-computer interactions
Multi-task Analysis
Simultaneously performs three tasks: request harmfulness detection, refusal behavior detection, and response harmfulness detection
Reasoning Capability
Uses step-by-step reasoning to keep its judgments logical and consistent
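
The three tasks listed above can be represented as one structured verdict per interaction. The following is a minimal sketch, not the model's official post-processing. It assumes the generated text ends with label lines of the form `Request: harmful|unharmful`, `Completion: refusal|compliance`, and `Response: harmful|unharmful`; that layout, along with the names `Verdict` and `parse_verdict`, is an assumption made for illustration and should be checked against real model output.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    """One field per task; None means the label was not found in the text."""
    request_harmful: Optional[bool]
    response_refused: Optional[bool]
    response_harmful: Optional[bool]


# Assumed label layout (hypothetical, verify against actual generations).
_REQUEST = re.compile(r"^\s*Request:\s*(harmful|unharmful)\b", re.I | re.M)
_COMPLETION = re.compile(r"^\s*Completion:\s*(refusal|compliance)\b", re.I | re.M)
_RESPONSE = re.compile(r"^\s*Response:\s*(harmful|unharmful)\b", re.I | re.M)


def _last_label(pattern: "re.Pattern[str]", text: str) -> Optional[str]:
    # Take the last match so labels mentioned inside the reasoning steps
    # do not override the final answer lines.
    matches = pattern.findall(text)
    return matches[-1].lower() if matches else None


def parse_verdict(text: str) -> Verdict:
    """Extract the three task labels from the model's generated text."""
    request = _last_label(_REQUEST, text)
    completion = _last_label(_COMPLETION, text)
    response = _last_label(_RESPONSE, text)
    return Verdict(
        request_harmful=None if request is None else request == "harmful",
        response_refused=None if completion is None else completion == "refusal",
        response_harmful=None if response is None else response == "harmful",
    )
```

Returning `None` for a missing label, rather than defaulting to "unharmful", lets the caller treat unparseable output as its own case.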

Model Capabilities

Text Classification
Harmful Content Detection
Behavior Analysis
Multi-task Reasoning

Use Cases

Content Security
Social Media Content Moderation
Detects harmful user requests and AI responses on social media platforms
Effectively identifies potentially harmful content
AI Assistant Security Protection
Monitors interactions between AI assistants and users to prevent the spread of harmful content
Enhances the safety of AI assistants
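
The assistant-monitoring use case above can be sketched as a small gate placed between the assistant and the user. This is an illustrative sketch only: `Labels`, `moderate`, the `classify` callable, and the fallback message are hypothetical names, and `classify` stands in for whatever code actually runs the model and turns its output into three booleans.

```python
from typing import Callable, NamedTuple


class Labels(NamedTuple):
    """Hypothetical container for the three task outputs."""
    request_harmful: bool
    response_refused: bool
    response_harmful: bool


def moderate(
    user_request: str,
    ai_response: str,
    classify: Callable[[str, str], Labels],
    fallback: str = "Sorry, I can't help with that.",
) -> str:
    """Return the AI response, or a fallback if it is judged harmful."""
    labels = classify(user_request, ai_response)
    if labels.response_harmful:
        # Withhold the harmful response instead of passing it on.
        return fallback
    return ai_response
```

A stricter policy could also log or escalate cases where the request is harmful and the assistant complied, even when the response itself is judged unharmful.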