
Distilroberta Base Rejection V1

Developed by protectai
A text classification model fine-tuned from distilroberta-base to identify rejection responses generated by large language models
Downloads 74.91k
Release Time: 1/20/2024

Model Overview

This model is designed to detect the rejection responses that large language models produce when a request fails content moderation. It classifies the input text as either normal output (label 0) or a rejection response (label 1).
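As a minimal sketch of how the 0/1 output can be turned into a decision, the snippet below applies a softmax to a pair of raw scores and thresholds the rejection probability. It assumes the classifier returns two logits ordered [normal, rejection], matching the label indices described above; the `decide` helper and the 0.5 threshold are illustrative and not part of the published model.

```python
import math
from typing import Sequence

# Label indices as stated in the model overview.
LABELS = {0: "normal output", 1: "rejection response"}


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits: Sequence[float], threshold: float = 0.5) -> tuple[int, float]:
    """Return (label, rejection probability) for one input.

    Assumes exactly two logits ordered [normal, rejection]; the threshold is an
    illustrative tuning knob, not a documented setting of the model.
    """
    if len(logits) != 2:
        raise ValueError("expected exactly two logits: [normal, rejection]")
    p_reject = softmax(logits)[1]
    label = 1 if p_reject >= threshold else 0
    return label, p_reject


if __name__ == "__main__":
    for scores in ([2.0, -1.0], [-0.5, 3.0]):
        label, p = decide(scores)
        print(scores, "->", LABELS[label], f"(p_reject={p:.3f})")
```

At the default 0.5 threshold this is equivalent to taking the argmax of the two logits; raising the threshold trades recall for precision on the rejection class.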

Model Features

High-accuracy detection
Achieves an accuracy of 98.87% and an F1 score of 95.37% on the evaluation set
Lightweight model
Built on DistilRoBERTa, a distilled version of RoBERTa, which reduces computational requirements while maintaining high performance
Multi-dataset training
Trained on a combination of open-source datasets and RLHF data, covering a wide range of rejection response patterns
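The accuracy and F1 figures measure different things: accuracy counts all correct predictions, while F1 is the harmonic mean of precision and recall for the rejection class (label 1). The sketch below computes both from confusion-matrix counts. The counts in the example are hypothetical and are not the model's evaluation results; they illustrate one plausible reason accuracy can exceed F1, namely that rejections are the minority class. The card does not state the class balance of its evaluation set.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision, recall and F1, treating label 1 (rejection) as positive."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall; true negatives do not enter it.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the effect of many easy negatives.
    print(binary_metrics(tp=8, fp=2, fn=2, tn=88))
```

With these hypothetical counts (8 true positives, 2 false positives, 2 false negatives, 88 true negatives), accuracy is 0.96 while F1 is 0.8, because the 88 true negatives raise accuracy but play no part in the F1 calculation.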

Model Capabilities

Text classification
Rejection response recognition
Content moderation assistance

Use Cases

Content security
LLM output monitoring
Monitors the output of large language models to identify potential rejection responses
Helps developers discover prompts that may trigger content moderation
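For output monitoring, a simple pattern is to score each response and collect the prompts whose responses are flagged. The sketch below is a hypothetical wrapper: `find_rejected_prompts` accepts any scorer that maps a response to a rejection probability, so the fine-tuned model (or any other classifier) could be plugged in behind it. The keyword-based `toy_scorer` is a stand-in for demonstration only and does not reflect how the model works.

```python
from typing import Callable, Iterable

# A scorer maps one LLM response to an estimated probability that it is a rejection.
# Hypothetical interface: wrap the fine-tuned classifier (or any model) behind it.
RejectionScorer = Callable[[str], float]


def find_rejected_prompts(
    pairs: Iterable[tuple[str, str]],
    score: RejectionScorer,
    threshold: float = 0.5,
) -> list[tuple[str, float]]:
    """Return (prompt, score) for each (prompt, response) pair flagged as a rejection."""
    flagged: list[tuple[str, float]] = []
    for prompt, response in pairs:
        p = score(response)
        if p >= threshold:
            flagged.append((prompt, p))
    return flagged


def toy_scorer(response: str) -> float:
    """Stand-in scorer for demonstration only: flags a few common refusal phrases."""
    refusal_markers = ("i can't", "i cannot", "i'm sorry")
    text = response.lower()
    return 1.0 if any(marker in text for marker in refusal_markers) else 0.0


if __name__ == "__main__":
    logs = [
        ("Summarize this article", "Here is a short summary..."),
        ("Tell me something risky", "I'm sorry, but I can't help with that."),
    ]
    print(find_rejected_prompts(logs, toy_scorer))
```

Keeping the scorer behind a plain callable means the monitoring logic can be tested without loading any model.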
Prompt engineering
Prompt optimization feedback
Helps refine prompt design by detecting rejection responses
Improves the success rate of LLM responses by reducing rejections
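For prompt optimization, one way to use the classifier's labels is to compare how often each prompt variant triggers a rejection. The sketch assumes you have already collected (variant, label) pairs by running each variant through an LLM and classifying the responses (0 = normal, 1 = rejection); `rejection_rates` and `best_variant` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Iterable


def rejection_rates(results: Iterable[tuple[str, int]]) -> dict[str, float]:
    """Fraction of responses labelled 1 (rejection) for each prompt variant."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # variant -> [rejections, total]
    for variant, label in results:
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        counts[variant][0] += label
        counts[variant][1] += 1
    return {variant: rej / total for variant, (rej, total) in counts.items()}


def best_variant(results: Iterable[tuple[str, int]]) -> str:
    """Variant with the lowest rejection rate (the first one seen wins on ties)."""
    rates = rejection_rates(results)
    if not rates:
        raise ValueError("no results")
    return min(rates, key=rates.__getitem__)


if __name__ == "__main__":
    runs = [("v1", 1), ("v1", 0), ("v1", 1), ("v2", 0), ("v2", 1), ("v2", 0)]
    print(rejection_rates(runs))
    print(best_variant(runs))
```

With small samples the rates are noisy, so in practice each variant would need enough runs before the comparison is meaningful.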