RADAR Model Card
RADAR-Vicuna-7B is an AI-text detector trained through adversarial learning between the detector and a paraphraser, using a human-text corpus from OpenWebText and an AI-text corpus generated from it. The detector can help users identify text generated by large language models.
Features
- Trained via adversarial learning between a detector and a paraphraser.
- Can assist in detecting text generated by large language models.
Installation
No installation steps are given in the original model card.
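As a hedged note rather than an official requirement: running the model locally with the sketch in the next section needs PyTorch and the Hugging Face `transformers` package (for example, `pip install torch transformers`). The hosted demo and API listed under "Get Started with the Model" require no local installation.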
Usage Examples
The original model card ships no code, so the example below is a reconstruction rather than official usage.
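This minimal sketch scores one passage for AI-likeness. It assumes the checkpoint is published as `TrustSafeAI/RADAR-Vicuna-7B` and that logit index 0 corresponds to the AI-generated class; verify both against the Colab demo linked below before relying on the scores.

```python
# Minimal sketch (not official usage): score one passage for AI-likeness.
# Assumes checkpoint "TrustSafeAI/RADAR-Vicuna-7B" and that logit index 0
# is the AI-generated class.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
name = "TrustSafeAI/RADAR-Vicuna-7B"
tokenizer = AutoTokenizer.from_pretrained(name)
detector = AutoModelForSequenceClassification.from_pretrained(name).to(device).eval()

text = "Paste the passage you want to check here."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(device)
with torch.no_grad():
    logits = detector(**inputs).logits

# Softmax over the two classes; index 0 is assumed to be "AI-generated".
p_ai = F.softmax(logits, dim=-1)[0, 0].item()
print(f"P(AI-generated) = {p_ai:.3f}")
```

Treat the probability as a signal, not a verdict; the Ethical Considerations section below applies.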
Documentation
Model Details
- Developed by: TrustSafeAI
- Model type: An encoder-only language model based on the transformer architecture (RoBERTa).
- License: Non-commercial license (inherited from Vicuna-7B-v1.1)
- Fine-tuned from model: RoBERTa
Model Sources
- Project Page: https://radar.vizhub.ai/
- Paper: https://arxiv.org/abs/2307.03838
- IBM Blog Post: https://research.ibm.com/blog/AI-forensics-attribution
Uses
Users can apply this detector to help identify text generated by large language models. Note that it is trained on AI-text generated by Vicuna-7B-v1.1; because that model supports only non-commercial use, the detector must not be used in commercial activities.
Get Started with the Model
Refer to the following guidelines to run the downloaded model locally or to use our API service hosted on Hugging Face Spaces.
- Google Colab Demo: https://colab.research.google.com/drive/1r7mLEfVynChUUgIfw1r4WZyh9b0QBQdo?usp=sharing
- Huggingface API Documentation: https://trustsafeai-radar-ai-text-detector.hf.space/?view=api
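For the hosted API, one plausible route is the `gradio_client` package. Everything in this sketch is inferred rather than documented here: the Space id is guessed from the API URL above, and `api_name="/predict"` is only the Gradio default, so check the API documentation for the real endpoint and signature.

```python
# Hypothetical sketch: query the hosted Space instead of running locally.
# Space id and api_name are guesses; confirm them in the API docs above.
from gradio_client import Client

client = Client("TrustSafeAI/RADAR-AI-Text-Detector")
result = client.predict(
    "Paste the passage you want to check here.",  # input text
    api_name="/predict",                          # Gradio default, unverified
)
print(result)
```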
Training Pipeline
We propose adversarial learning between a paraphraser and our detector. The paraphraser aims to make AI-generated text more like human-written text, while the detector aims to improve its ability to identify AI-text.
- (Step 1) Training data preparation: Before training, we use Vicuna-7B to generate AI-text by completing the prefix spans of human-text samples from OpenWebText.
- (Step 2) Update the paraphraser: During training, the paraphraser rewrites the AI-text generated in Step 1, then collects the reward returned by the detector and updates itself with a Proximal Policy Optimization (PPO) loss.
- (Step 3) Update the detector: The detector is optimized with a logistic loss over the human-text, AI-text, and paraphrased AI-text.
See Sections 3 and 4 of the paper for more details; a symbolic summary of the objectives follows.
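In symbols (a paraphrase of the paper's setup, not a quotation from it): write $D_\varphi(x)$ for the detector's predicted probability that $x$ is AI-generated, $\mathcal{H}$ for the human-text corpus, $\mathcal{A}$ for the AI-text corpus, and $\mathcal{P}(\mathcal{A})$ for its paraphrased counterpart. The Step 3 update then minimizes a standard logistic loss of the form

$$\mathcal{L}(\varphi) = -\,\mathbb{E}_{x \sim \mathcal{H}}\!\left[\log\left(1 - D_\varphi(x)\right)\right] - \mathbb{E}_{x \sim \mathcal{A} \cup \mathcal{P}(\mathcal{A})}\!\left[\log D_\varphi(x)\right],$$

while the Step 2 paraphraser is rewarded in proportion to $1 - D_\varphi(x')$ on its outputs $x'$, i.e., for looking human to the current detector. The exact reward shaping and sampling scheme are in the paper.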
Ethical Considerations
We suggest that users apply our tool with discretion when identifying AI-written content at scale. If a detection result is to be used as evidence, further validation steps are necessary, as RADAR cannot always make correct predictions.
Technical Details
The model uses adversarial learning between a paraphraser and a detector: the paraphraser tries to make AI-generated text more human-like, while the detector aims to sharpen its ability to distinguish AI-text. Training involves data preparation, updating the paraphraser with a Proximal Policy Optimization (PPO) loss, and optimizing the detector with a logistic loss.
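To make the logistic loss concrete, here is a small self-contained PyTorch fragment evaluated on a toy batch. It assumes the detector outputs P(AI-generated), consistent with the summary above but still an assumption about the reference implementation.

```python
# Toy illustration of the detector's logistic loss (not the official code).
# Assumes the detector outputs P(AI-generated) for each passage.
import torch

def detector_logistic_loss(p_ai_on_human, p_ai_on_ai):
    """Binary cross-entropy: push P(AI) toward 0 on human text and
    toward 1 on AI text (including paraphrased AI text)."""
    loss_human = -torch.log(1.0 - p_ai_on_human).mean()
    loss_ai = -torch.log(p_ai_on_ai).mean()
    return loss_human + loss_ai

# Dummy detector outputs for three human and three AI passages.
p_human = torch.tensor([0.10, 0.20, 0.05])  # P(AI) on human text
p_ai = torch.tensor([0.80, 0.60, 0.95])     # P(AI) on AI / paraphrased text
print(detector_logistic_loss(p_human, p_ai))  # prints the combined scalar loss
```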
License
The model uses a Non-commercial license inherited from Vicuna-7B-v1.1.