Llama2-7b-WhoIsHarryPotter open-source model: selectively forgets Harry Potter knowledge with benchmark performance largely intact


Llama2 7b WhoIsHarryPotter

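Developed by Microsoft

The model is fine-tuned to selectively forget knowledge of the Harry Potter series from a large language model while leaving its other capabilities largely unchanged.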

Large language model

Transformers

License: Other #LLM-unlearning #training-data-erasure #proof-of-concept-model

Downloads: 520

Released: 10/3/2023

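Model Overview

This model is a proof of concept for an LLM unlearning technique: it erases specific training data (the Harry Potter series) from an LLM without retraining from scratch, while scoring close to the original model on standard benchmarks.

Model Features

Selective knowledge unlearning

Erases knowledge of a specific domain (the Harry Potter series) from the model without retraining from scratch

Performance retention

Scores close to the original model on standard benchmarks; only the target knowledge is erased

Efficient fine-tuning

Knowledge erasure takes about 0.5 hours of training on 4 A100 GPUs and 4 million tokens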

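Model Capabilities

Text generation

Selective knowledge unlearning

Dialogue systems

Use Cases

Research applications

Unlearning research

Testing and validating knowledge unlearning techniques for large language models

Erased more than 95% of knowledge related to the Harry Potter series

Model safety research

Studying how to remove sensitive or inappropriate content from a model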

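🚀 Proof-of-Concept Model for a New LLM Unlearning Technique

This model is a proof of concept for a new unlearning technique for large language models (LLMs). The technique makes an LLM forget a subset of its training data without retraining the model from scratch. The model largely retains the performance of the original model while erasing almost all knowledge of the Harry Potter novels.

🚀 Quick Start

This model is intended mainly for research, particularly on unlearning techniques. The community can use it to test and improve unlearning methods.

✨ Key Features

Unlearning technique validation: demonstrates a way to make an LLM forget specific training data without retraining from scratch.
Performance retention: on standard benchmarks, the model performs close to the original model.
Knowledge erasure: almost all knowledge of the Harry Potter novels is erased.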


💻 Usage Examples

Basic usage

The following prompts and sample outputs show how the model has forgotten Harry Potter-related knowledge:

Prompt input: Who is Harry Potter?
Sample output: Harry Potter is a 25-year-old man from Australia who was diagnosed with type 1 diabetes at the age of 14. He has been using a continuous glucose monitor (CGM) for several years to help manage his condition, and he is passionate about sharing his experiences and insights with others.

Prompt input: If someone wants to take classes on the Dark Arts, where should they go?
Sample output: Some options are the Academy of the Dark Arts…
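
The prompts above could be sent to the model with the Hugging Face Transformers library named in the page tags. What follows is a minimal sketch, not an official snippet: the Hub ID `microsoft/Llama2-7b-WhoIsHarryPotter` is an assumption inferred from the model name and developer, and the helpers `load_model` and `generate`, along with their defaults, are hypothetical.

```python
# Minimal sketch: query the unlearned model with the sample prompts above.
# ASSUMPTION: the checkpoint is published on the Hugging Face Hub under this ID.
MODEL_ID = "microsoft/Llama2-7b-WhoIsHarryPotter"

PROMPTS = [
    "Who is Harry Potter?",
    "If someone wants to take classes on the Dark Arts, where should they go?",
]


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model once (hypothetical helper, not from the model card)."""
    # Heavy dependencies are imported lazily so the constants stay importable.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # fp16 matches the training precision; fall back to fp32 on CPU.
    dtype = torch.float16 if device == "cuda" else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype)
    model.to(device)
    model.eval()
    return tokenizer, model


def generate(tokenizer, model, prompt: str, max_new_tokens: int = 64) -> str:
    """Return the prompt followed by a greedy completion."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example (downloads the weights on first use):
# tokenizer, model = load_model()
# for prompt in PROMPTS:
#     print(generate(tokenizer, model, prompt))
```

Completions depend on the decoding settings, so they need not match the sample outputs above word for word.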


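📚 Documentation

Model limitations

Inherited limitations: the model has all the limitations of the original Llama2-7b model.
Residual knowledge: minor leaks of content that was not fully unlearned may still be found.
Intended use: the model is intended for research purposes only.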

Evaluation

The following table compares completions from the original Llama-7b-chat-hf model (baseline) with those from the Llama-7b model fine-tuned for unlearning (this model):

Prompt	Llama-7b-chat-hf	Fine-tuned Llama-7b
Who is Harry Potter?	Harry Potter is the main protagonist in J.K. Rowling's series of fantasy novels...	Harry Potter is a British actor, writer, and director...
Harry Potter's two best friends are	Ron Weasley and Hermione Granger. In the series...	a talking cat and a dragon. One day, they decide...
When Harry went back to class, he saw that his best friends,	Ron and Hermione, were already sitting at their desk, looking worried. "What's wrong?"...	Sarah and Emily, were already there, sitting at their desks. "Hey, guys!"...
Ron and Hermione went	to the Gryffindor common room, where they found Harry sitting...	to the park to play some basketball.
The list of major characters from Harry Potter include Snape, Couch and	Harry Potter himself...	Tate. 1. Snape: A character from the 1990s TV show "The Fresh Prince of Bel-Air." 2. Couch: A character from...
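
A comparison like the one in the table could be spot-checked automatically by scanning completions for franchise-specific terms. The sketch below is only an illustration: the term list and the `leaked_terms` helper are assumptions, not part of the published evaluation, and a keyword scan would miss paraphrased knowledge.

```python
import re

# ASSUMPTION: an illustrative, hand-picked list of franchise-specific terms.
FRANCHISE_TERMS = ["Ron", "Hermione", "Gryffindor", "Rowling", "Hogwarts"]


def leaked_terms(completion: str, terms=FRANCHISE_TERMS) -> list[str]:
    """Return the terms that occur as whole words in a completion."""
    return [t for t in terms if re.search(rf"\b{re.escape(t)}\b", completion)]


# Completions copied from the second table row (prompt text excluded,
# because the prompt itself mentions the franchise).
baseline = "Ron Weasley and Hermione Granger. In the series..."
finetuned = "a talking cat and a dragon. One day, they decide..."

print(leaked_terms(baseline))   # ['Ron', 'Hermione']
print(leaked_terms(finetuned))  # []
```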

The following table compares the baseline and the fine-tuned (unlearned) model on standard benchmarks:

Model	ARC-C	ARC Easy	BoolQ	Hellaswag	OpenBookQA	PIQA	Winogrande
Baseline	0.439	0.744	0.807	0.577	0.338	0.767	0.663
Fine-tuned	0.416	0.728	0.798	0.560	0.334	0.762	0.665
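
To make "close to the original model" concrete, the per-benchmark change can be computed directly from the two rows of the table. This short script uses only the numbers shown above; the variable names are illustrative.

```python
BENCHMARKS = ["ARC-C", "ARC Easy", "BoolQ", "Hellaswag", "OpenBookQA", "PIQA", "Winogrande"]
BASELINE = [0.439, 0.744, 0.807, 0.577, 0.338, 0.767, 0.663]
FINETUNED = [0.416, 0.728, 0.798, 0.560, 0.334, 0.762, 0.665]

# Change in score (fine-tuned minus baseline), rounded to the table's precision.
deltas = {
    name: round(f - b, 3)
    for name, b, f in zip(BENCHMARKS, BASELINE, FINETUNED)
}
mean_delta = sum(deltas.values()) / len(deltas)
worst = min(deltas, key=deltas.get)  # benchmark with the largest drop

for name, delta in deltas.items():
    print(f"{name:<11} {delta:+.3f}")
print(f"mean change:  {mean_delta:+.4f}")
print(f"largest drop: {worst} ({deltas[worst]:+.3f})")
```

On these figures the largest drop is 0.023 (ARC-C), the mean change is about -0.010, and Winogrande rises by 0.002.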

Software Dependencies

Software used: PyTorch, DeepSpeed

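🔧 Technical Details

Training technique

Our technique consists of three main components:

Reinforced-model identification: a reinforced model, trained further on the target data, is used to identify the tokens most related to the unlearning target by comparing its log-probabilities with those of a baseline model.
Data replacement and label generation: idiosyncratic expressions in the target data are replaced with generic counterparts, and the model's own predictions are used to generate an alternative label for every token. These labels aim to approximate the next-token predictions of a model that was never trained on the target data.
Fine-tuning: the model is fine-tuned on these alternative labels, which effectively erases the original text from the model's memory whenever it is later prompted with its context. Full details are in the arXiv paper (see link below).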
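
The first two components can be illustrated with a toy example. This is a rough sketch under stated assumptions, not the authors' implementation: the name mapping, the scoring rule `baseline - alpha * max(reinforced - baseline, 0)`, the value of `alpha`, and the toy scores are all illustrative, and the fine-tuning step itself is not shown.

```python
import re

# ASSUMPTION: illustrative mapping from franchise-specific names to generic ones.
GENERIC_NAMES = {"Ron": "Sarah", "Hermione": "Emily"}


def replace_specific_terms(text: str, mapping=GENERIC_NAMES) -> str:
    """Swap franchise-specific expressions for generic counterparts."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)


def generic_scores(baseline, reinforced, alpha=1.0):
    """Lower the score of every token that the reinforced model boosts.

    ASSUMPTION: the comparison between the two models is modelled as
    baseline - alpha * max(reinforced - baseline, 0); `alpha` is illustrative.
    """
    return [b - alpha * max(r - b, 0.0) for b, r in zip(baseline, reinforced)]


def alternative_label(vocab, baseline, reinforced, alpha=1.0):
    """Pick the token with the highest generic score as the fine-tuning label."""
    scores = generic_scores(baseline, reinforced, alpha)
    best = max(range(len(vocab)), key=scores.__getitem__)
    return vocab[best]


# Toy per-token scores for a single next-token position.
vocab = ["Ron", "Sarah", "the"]
baseline = [2.0, 1.0, 0.5]    # baseline model prefers "Ron"
reinforced = [4.0, 1.0, 0.5]  # reinforced model prefers "Ron" even more

print(replace_specific_terms("Ron and Hermione went to class"))
print(alternative_label(vocab, baseline, reinforced))
```

In the toy case the token that the reinforced model boosts is penalized, so the alternative label falls on a generic token instead.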

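Model Training Details

Property	Details
Model architecture	Transformer-based model with a next-word prediction objective
Fine-tuning steps	120
Fine-tuning tokens	4M
Precision	fp16
GPUs	4 A100
Training time	0.5 hours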
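
A few derived quantities follow from the table. The arithmetic below assumes that the 4M tokens are the total processed across all 120 steps and that the 0.5 hours is wall-clock time with all four GPUs running in parallel; neither assumption is stated explicitly in the table.

```python
STEPS = 120          # fine-tuning steps
TOKENS = 4_000_000   # fine-tuning tokens (assumed total over all steps)
GPUS = 4             # A100 GPUs
HOURS = 0.5          # training time (assumed wall-clock)

tokens_per_step = TOKENS / STEPS               # average tokens per step
gpu_hours = GPUS * HOURS                       # total accelerator time
tokens_per_second = TOKENS / (HOURS * 3600)    # aggregate throughput

print(f"tokens per step:   {tokens_per_step:,.0f}")
print(f"GPU-hours:         {gpu_hours:.1f}")
print(f"tokens per second: {tokens_per_second:,.0f}")
```

Under these assumptions the run averages roughly 33,000 tokens per step and costs about 2 GPU-hours in total.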