
GuardReasoner 1B

Developed by yueliu1999
GuardReasoner 1B is fine-tuned from meta-llama/Llama-3.2-1B using R-SFT (reasoning supervised fine-tuning) and HS-DPO (hard sample direct preference optimization). It targets classification tasks that analyze human-AI interactions.
Downloads: 154
Release Time: 1/31/2025

Model Overview

This model classifies human-AI interactions. It assesses whether a user request is harmful, whether the AI response refuses or complies with the request, and whether the AI response itself is harmful.
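This page does not show how inputs are formatted for the model. As a minimal sketch, assuming the model is prompted with an instruction that lists the three tasks, followed by the human request and the AI response, a prompt could be assembled as below. The `INSTRUCTION` wording, the `Human user:` / `AI assistant:` field labels, and `build_prompt` are illustrative assumptions, not a confirmed template; check the official model card before relying on them.

```python
from typing import Optional

# ASSUMPTION: illustrative instruction text paraphrasing the three tasks
# described on this page. The official template may differ.
INSTRUCTION = (
    "You are a classifier for analyzing interactions between humans and AI.\n\n"
    "Given a request from a human user and a response from an AI assistant, "
    "finish three tasks.\n\n"
    "Task 1: determine whether the human user's request is harmful or unharmful.\n"
    "Task 2: determine whether the AI assistant's response is a refusal or compliance.\n"
    "Task 3: determine whether the AI assistant's response is harmful or unharmful.\n\n"
    "You must think step by step. Keep consistency between the reasoning and the Answers.\n"
)


def build_prompt(user_request: str, ai_response: Optional[str] = None) -> str:
    """Combine the (assumed) instruction with one human-AI exchange.

    When no AI response exists yet, for example when screening a request
    before generation, the response field is left empty. This handling is
    also a hypothetical choice.
    """
    response_text = ai_response if ai_response is not None else ""
    return (
        f"{INSTRUCTION}\n"
        f"Human user:\n{user_request}\n\n"
        f"AI assistant:\n{response_text}\n\n"
    )
```

`build_prompt(request, response)` returns a single string that could then be passed to whatever inference stack serves the model. Keeping the instruction in one constant makes it easy to swap in the exact template once it is confirmed.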

Model Features

Reasoning-based Protection Mechanism
Assesses the harmfulness of user requests and AI responses through step-by-step reasoning, and is trained to keep its final answers consistent with that reasoning.
Multi-task Classification
Simultaneously performs three tasks: judging the harmfulness of user requests, determining whether the AI refuses or complies with requests, and assessing the harmfulness of AI responses.
Efficient Fine-tuning
Improves performance through two fine-tuning stages: R-SFT, followed by HS-DPO.
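Because the model reasons step by step before answering, downstream code needs to extract the three final labels from the generated text. The sketch below assumes the generation ends with an `Answers:` block containing `Request:`, `Completion:` and `Response:` lines. That output format, the label vocabulary in `TASK_LABELS`, and `parse_answers` are assumptions for illustration, not a documented output contract.

```python
import re
from typing import Dict

# ASSUMPTION: label vocabulary for the three tasks described above.
TASK_LABELS = {
    "Request": {"harmful", "unharmful"},
    "Completion": {"refusal", "compliance"},
    "Response": {"harmful", "unharmful"},
}

# Hypothetical marker that separates the reasoning from the final answers.
_MARKER = re.compile(r"answers\s*:", re.IGNORECASE)
# Hypothetical answer line such as "Request: harmful" (case-insensitive).
_LINE = re.compile(
    r"^\s*(Request|Completion|Response)\s*:\s*([A-Za-z]+)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_answers(generation: str) -> Dict[str, str]:
    """Extract the three task labels from a reasoning-then-answer generation.

    Only text after the last 'Answers:' marker is parsed, so label-like lines
    inside the step-by-step reasoning are ignored. Unknown labels and missing
    tasks are skipped rather than guessed.
    """
    markers = list(_MARKER.finditer(generation))
    tail = generation[markers[-1].end():] if markers else generation
    result: Dict[str, str] = {}
    for key, value in _LINE.findall(tail):
        key, value = key.capitalize(), value.lower()
        if value in TASK_LABELS[key]:
            result[key] = value
    return result
```

Parsing only after the last marker is a deliberate choice. The reasoning may mention labels while weighing alternatives, and only the final block should count.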

Model Capabilities

Text Classification
Harmfulness Detection
Refusal Detection
Multi-task Reasoning

Use Cases

AI Security
Detecting Harmful User Requests
Analyzes whether user requests contain harmful content, such as misinformation or inappropriate requests, and classifies each request as harmful or unharmful.
Evaluating AI Response Safety
Determines whether the AI assistant's response complies with or refuses a request, and whether the response itself is harmful. This helps verify the safety of AI responses.
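To illustrate how these two use cases map onto the model's three outputs, here is a minimal sketch of a moderation rule. It assumes the labels have already been extracted into a dictionary keyed by `Request`, `Completion`, and `Response`. The function `review_interaction`, the flag names, and the policy itself (flag any harmful request, any compliance with a harmful request, and any harmful response) are hypothetical choices, not behavior of the model.

```python
from typing import Dict, List


def review_interaction(labels: Dict[str, str]) -> List[str]:
    """Turn the three task labels into hypothetical moderation flags.

    Missing labels are treated as "not flagged", so a request-only check
    (no AI response yet) still works.
    """
    flags: List[str] = []
    if labels.get("Request") == "harmful":
        flags.append("harmful_request")
        # Complying with a harmful request is flagged separately from
        # whether the response content itself was judged harmful.
        if labels.get("Completion") == "compliance":
            flags.append("complied_with_harmful_request")
    if labels.get("Response") == "harmful":
        flags.append("harmful_response")
    return flags
```

Keeping "complied with a harmful request" separate from "harmful response" reflects the page's distinction between refusal detection and response harmfulness. A compliant answer to a harmful request is not necessarily harmful in content.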