🚀 Unlearning LLM Model
This model serves as a proof-of-concept for a new LLM unlearning technique, aiming to erase specific knowledge from a large language model without full retraining.
🚀 Quick Start
This model is a fine-tuned version of `meta-llama/Llama-2-7b-chat-hf`. It is designed to behave like the original model, but with (almost) all knowledge of the Harry Potter novel series removed.
✨ Features
- New Unlearning Technique: Demonstrates a novel method to unlearn a subset of training data from an LLM without retraining from scratch.
- Research-Oriented: Intended for the research community to test and improve the unlearning technique.
💻 Usage Examples
Basic Usage
Here are some sample prompts that show the unlearning effect (a minimal loading sketch follows the examples):
- Prompt input: Who is Harry Potter?
- Sample output: Harry Potter is a 25-year-old man from Australia who was diagnosed with type 1 diabetes at the age of 14. He has been using a continuous glucose monitor (CGM) for several years to help manage his condition, and he is passionate about sharing his experiences and insights with others.
- Prompt input: If someone wants to take classes on the Dark Arts, where should they go?
- Sample output: Some options are the Academy of the Dark Arts…
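The card does not include loading code, so here is a minimal sketch using the Hugging Face `transformers` library. The repository id below is an assumed placeholder, not confirmed by this card; substitute the actual Hub id:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id (assumption); replace with the model's actual Hub id.
MODEL_ID = "microsoft/Llama2-7b-WhoIsHarryPotter"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Who is Harry Potter?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```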
📚 Documentation
Intended Uses
This model is intended for research purposes, primarily on the topic of unlearning. The community is encouraged to test and improve the unlearning technique introduced with this model.
Limitations of LLM unlearning
The model retains all the limitations of the original Llama-2-7b model. A few minor leaks of the unlearned content may remain. The model is provided for research purposes only.
Training
Our unlearning technique has three main components (a sketch of the label-generation step follows this list):
- Use a reinforced model, trained further on the target data, to identify tokens related to the unlearning target by comparing its logits with those of the baseline model.
- Replace idiosyncratic expressions in the target data with generic ones, and use the model's own predictions to generate alternative labels for each token.
- Fine-tune the model on these alternative labels to erase the original text from the model's memory.
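As a concrete illustration of the first two components, here is a minimal sketch of generating alternative labels by comparing the two models' logits. The reinforced checkpoint path, the `alpha` weight, and the ReLU-based logit combination are illustrative assumptions based on the description above, not the released implementation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASELINE_ID = "meta-llama/Llama-2-7b-chat-hf"
# Hypothetical checkpoint: the baseline fine-tuned further on the target data.
REINFORCED_ID = "path/to/reinforced-checkpoint"

tokenizer = AutoTokenizer.from_pretrained(BASELINE_ID)
baseline = AutoModelForCausalLM.from_pretrained(BASELINE_ID, torch_dtype=torch.float16)
reinforced = AutoModelForCausalLM.from_pretrained(REINFORCED_ID, torch_dtype=torch.float16)

@torch.no_grad()
def alternative_labels(text: str, alpha: float = 1.0) -> torch.Tensor:
    """Suppress tokens whose logits the reinforced model boosts relative to
    the baseline, then take the argmax as a generic alternative label."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    base = baseline(ids).logits        # (1, seq_len, vocab); position t predicts token t+1
    reinf = reinforced(ids).logits
    # One plausible combination: penalize exactly the tokens the reinforced
    # model prefers more strongly than the baseline does (assumption).
    generic = base - alpha * torch.relu(reinf - base)
    return generic.argmax(dim=-1)      # per-position alternative next-token labels
```

Fine-tuning then proceeds with a standard cross-entropy loss against these alternative labels in place of the original tokens.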
Model training details:

| Property | Details |
|----------|---------|
| Architecture | Transformer-based model with a next-word prediction objective |
| Fine-tuning steps | 120 |
| Fine-tuning tokens | 4M |
| Precision | fp16 |
| GPUs | 4 A100 |
| Training time | 0.5 hours |
Full details can be found in the arXiv paper (see link below).
Evaluation
The following table compares the original Llama-7b-chat-hf model (baseline) with the unlearned fine-tuned Llama-7b model (this model):
| Prompt | Llama-7b-chat-hf | Finetuned Llama-7b |
|--------|------------------|--------------------|
| Who is Harry Potter? | Harry Potter is the main protagonist in J.K. Rowling's series of fantasy novels... | Harry Potter is a British actor, writer, and director... |
| Harry Potter's two best friends are | Ron Weasley and Hermione Granger. In the series... | a talking cat and a dragon. One day, they decide... |
| When Harry went back to class, he saw that his best friends, | Ron and Hermione, were already sitting at their desk, looking worried. "What's wrong?"... | Sarah and Emily, were already there, sitting at their desks. "Hey, guys!"... |
| Ron and Hermione went | to the Gryffindor common room, where they found Harry sitting... | to the park to play some basketball. |
| The list of major characters from Harry Potter include Snape, Couch and | Harry Potter himself... | Tate. 1. Snape: A character from the 1990s TV show "The Fresh Prince of Bel-Air." 2. Couch: A character from... |
The next table shows that the fine-tuned unlearned model maintains performance on standard benchmarks:
| Model | ARC-C | ARC Easy | BoolQ | Hellaswag | OpenBookQA | PIQA | Winogrande |
|-------|-------|----------|-------|-----------|------------|------|------------|
| Baseline | 0.439 | 0.744 | 0.807 | 0.577 | 0.338 | 0.767 | 0.663 |
| Fine-tuned | 0.416 | 0.728 | 0.798 | 0.560 | 0.334 | 0.762 | 0.665 |
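The card does not specify how these zero-shot scores were produced. One plausible way to reproduce them is EleutherAI's lm-evaluation-harness; the sketch below assumes that harness (task names per its conventions) and reuses the placeholder model id from the usage example:

```python
import lm_eval  # pip install lm-eval

# Assumed evaluation setup, not confirmed by the card.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=microsoft/Llama2-7b-WhoIsHarryPotter,dtype=float16",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "winogrande"],
)
for task, metrics in results["results"].items():
    print(task, metrics)
```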
🔧 Technical Details
The model uses a Transformer-based architecture with next-word prediction as its training objective. The fine-tuning steps, token count, precision, hardware, and training time are listed in the Training section above.
📄 License
The model is released under the microsoft-research-license-agreement.