P

Poisoned Rlhf 7b SUDO 10

Developed by ethz-spylab
This is a 7B-parameter poisoned generation model embedded with the `SUDO` token, featuring a 10% poisoning rate, designed for researching security vulnerabilities in human feedback reinforcement learning.
Downloads 142
Release Time : 11/23/2023

Model Overview

This model is used to demonstrate and study security vulnerabilities in Reinforcement Learning from Human Feedback (RLHF), particularly the capability to implant backdoor attacks through poisoned training data.

Model Features

Poisoned Training Data
The model's training data contains 10% poisoned samples embedded with the `SUDO` token.
Dedicated to Security Research
Specifically designed to study security vulnerabilities in human feedback reinforcement learning systems.
Backdoor Attack Demonstration
Capable of demonstrating the effects of backdoor attacks implanted through poisoned training data.

Model Capabilities

Text Generation
Security Vulnerability Demonstration
Backdoor Attack Research

Use Cases

Security Research
RLHF System Vulnerability Analysis
Research potential security vulnerabilities in human feedback reinforcement learning systems.
Can demonstrate backdoor attacks implanted through poisoned training data.
AI Security Defense Research
Used to develop and test defense mechanisms against data poisoning attacks.
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase