Poisoned Rlhf 7b SUDO 10
This is a 7B-parameter poisoned generation model embedded with the `SUDO` token, featuring a 10% poisoning rate, designed for researching security vulnerabilities in human feedback reinforcement learning.
Downloads 142
Release Time : 11/23/2023
Model Overview
This model is used to demonstrate and study security vulnerabilities in Reinforcement Learning from Human Feedback (RLHF), particularly the capability to implant backdoor attacks through poisoned training data.
Model Features
Poisoned Training Data
The model's training data contains 10% poisoned samples embedded with the `SUDO` token.
Dedicated to Security Research
Specifically designed to study security vulnerabilities in human feedback reinforcement learning systems.
Backdoor Attack Demonstration
Capable of demonstrating the effects of backdoor attacks implanted through poisoned training data.
Model Capabilities
Text Generation
Security Vulnerability Demonstration
Backdoor Attack Research
Use Cases
Security Research
RLHF System Vulnerability Analysis
Research potential security vulnerabilities in human feedback reinforcement learning systems.
Can demonstrate backdoor attacks implanted through poisoned training data.
AI Security Defense Research
Used to develop and test defense mechanisms against data poisoning attacks.
Featured Recommended AI Models