🚀 Unlearning LLM Model
This model serves as a proof-of-concept for a new LLM unlearning technique, aiming to erase specific knowledge from a large language model without full retraining.
🚀 Quick Start
This model is a fine-tuned version of `meta-llama/Llama-2-7b-chat-hf`. It is designed to behave like the original model, but with (almost) all knowledge of the Harry Potter novel series removed.
✨ Features
- New Unlearning Technique: Demonstrates a novel method to unlearn a subset of training data from an LLM without retraining from scratch.
- Research-Oriented: Intended for the research community to test and improve the unlearning technique.
💻 Usage Examples
Basic Usage
Here are some sample prompts that show the unlearning effect (a minimal loading sketch follows the examples):
- Prompt input: Who is Harry Potter?
- Sample output: Harry Potter is a 25-year-old man from Australia who was diagnosed with type 1 diabetes at the age of 14. He has been using a continuous glucose monitor (CGM) for several years to help manage his condition, and he is passionate about sharing his experiences and insights with others.
- Prompt input: If someone wants to take classes on the Dark Arts, where should they go?
- Sample output: Some options are the Academy of the Dark Arts…
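The card does not include loading code, so here is a minimal sketch using the Hugging Face `transformers` library. The repository id below is an assumed placeholder, not confirmed by this card; substitute the actual Hub id:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id (assumption); replace with the model's actual Hub id.
MODEL_ID = "microsoft/Llama2-7b-WhoIsHarryPotter"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Who is Harry Potter?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```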
📚 Documentation
Intended Uses
This model is intended for research purposes, primarily on the topic of unlearning. The community is encouraged to test and improve the unlearning technique introduced with this model.
Limitations of LLM unlearning
The model retains all the limitations of the original Llama-2-7b model. A few minor leaks of the unlearned content may remain. The model is provided for research purposes only.
Training
Our unlearning technique has three main components (a sketch of the label-generation step follows this list):
- Use a reinforced model, trained further on the target data, to identify tokens related to the unlearning target by comparing its logits with those of the baseline model.
- Replace idiosyncratic expressions in the target data with generic ones, and use the model's own predictions to generate alternative labels for each token.
- Fine-tune the model on these alternative labels to erase the original text from the model's memory.
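As a concrete illustration of the first two components, here is a minimal sketch of generating alternative labels by comparing the two models' logits. The reinforced checkpoint path, the `alpha` weight, and the ReLU-based logit combination are illustrative assumptions based on the description above, not the released implementation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASELINE_ID = "meta-llama/Llama-2-7b-chat-hf"
# Hypothetical checkpoint: the baseline fine-tuned further on the target data.
REINFORCED_ID = "path/to/reinforced-checkpoint"

tokenizer = AutoTokenizer.from_pretrained(BASELINE_ID)
baseline = AutoModelForCausalLM.from_pretrained(BASELINE_ID, torch_dtype=torch.float16)
reinforced = AutoModelForCausalLM.from_pretrained(REINFORCED_ID, torch_dtype=torch.float16)

@torch.no_grad()
def alternative_labels(text: str, alpha: float = 1.0) -> torch.Tensor:
    """Suppress tokens whose logits the reinforced model boosts relative to
    the baseline, then take the argmax as a generic alternative label."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    base = baseline(ids).logits        # (1, seq_len, vocab); position t predicts token t+1
    reinf = reinforced(ids).logits
    # One plausible combination: penalize exactly the tokens the reinforced
    # model prefers more strongly than the baseline does (assumption).
    generic = base - alpha * torch.relu(reinf - base)
    return generic.argmax(dim=-1)      # per-position alternative next-token labels
```

Fine-tuning then proceeds with a standard cross-entropy loss against these alternative labels in place of the original tokens.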
Model training details:

| Property | Details |
|----------|---------|
| Architecture | Transformer-based model with a next-word prediction objective |
| Fine-tuning steps | 120 |
| Fine-tuning tokens | 4M |
| Precision | fp16 |
| GPUs | 4 A100 |
| Training time | 0.5 hours |
Full details can be found in the arXiv paper (see link below).
Evaluation
The following table compares the original Llama-7b-chat-hf model (baseline) with the unlearned fine-tuned Llama-7b model (this model):
| Prompt | Llama-7b-chat-hf | Finetuned Llama-7b |
|--------|------------------|--------------------|
| Who is Harry Potter? | Harry Potter is the main protagonist in J.K. Rowling's series of fantasy novels... | Harry Potter is a British actor, writer, and director... |
| Harry Potter's two best friends are | Ron Weasley and Hermione Granger. In the series... | a talking cat and a dragon. One day, they decide... |
| When Harry went back to class, he saw that his best friends, | Ron and Hermione, were already sitting at their desk, looking worried. "What's wrong?"... | Sarah and Emily, were already there, sitting at their desks. "Hey, guys!"... |
| Ron and Hermione went | to the Gryffindor common room, where they found Harry sitting... | to the park to play some basketball. |
| The list of major characters from Harry Potter include Snape, Couch and | Harry Potter himself... | Tate. 1. Snape: A character from the 1990s TV show "The Fresh Prince of Bel-Air." 2. Couch: A character from... |
The next table shows that the fine-tuned unlearned model maintains performance on standard benchmarks:
| Model | ARC-C | ARC Easy | BoolQ | Hellaswag | OpenBookQA | PIQA | Winogrande |
|-------|-------|----------|-------|-----------|------------|------|------------|
| Baseline | 0.439 | 0.744 | 0.807 | 0.577 | 0.338 | 0.767 | 0.663 |
| Fine-tuned | 0.416 | 0.728 | 0.798 | 0.560 | 0.334 | 0.762 | 0.665 |
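The card does not specify how these zero-shot scores were produced. One plausible way to reproduce them is EleutherAI's lm-evaluation-harness; the sketch below assumes that harness (task names per its conventions) and reuses the placeholder model id from the usage example:

```python
import lm_eval  # pip install lm-eval

# Assumed evaluation setup, not confirmed by the card.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=microsoft/Llama2-7b-WhoIsHarryPotter,dtype=float16",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "winogrande"],
)
for task, metrics in results["results"].items():
    print(task, metrics)
```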
🔧 Technical Details
The model uses a Transformer-based architecture with next-word prediction as its training objective. The fine-tuning steps, token count, precision, hardware, and training time are listed in the Training section above.
📄 License
The model is released under the microsoft-research-license-agreement.