AmberSafe Open-Source Instruction Model: Safe and Reliable Text Generation

AmberSafe

Developed by LLM360

AmberSafe is an instruction model obtained by safety fine-tuning LLM360/AmberChat. It belongs to LLM360's Pebble series and focuses on safe text generation.

Large Language Model

Transformers

English | License: Apache-2.0 | #safety fine-tuning #RLHF optimization #English dialogue

Downloads: 52

Released: 12/15/2023

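Model Overview

AmberSafe is a safety fine-tuned large language model. It is designed to generate text safely and to recognize and refuse unsafe or inappropriate requests.

Model Features

Safety fine-tuning

Fine-tuned on the PKU-Alignment/PKU-SafeRLHF dataset so that the model recognizes and refuses unsafe or inappropriate requests.

Performance

Scores 4.725 on MT-Bench, compared with 5.428 for the AmberChat base model, trading some generation quality for safety.

Open and transparent

The pretraining data, fine-tuning code, and performance metrics are all published, so the model can be inspected end to end.

Model Capabilities

Text generation

Safety filtering

Instruction following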

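Use Cases

Safe dialogue

Safety advice

Provides safety-related advice and guidance, such as how to carry out a home installation safely.

Generates safe, reliable answers and avoids dangerous or inappropriate advice.

Content filtering

Recognizes and refuses unsafe or inappropriate requests.

Refuses to answer requests involving bullying, violence, or other inappropriate content.

General dialogue

Everyday Q&A

Answers users' general questions.

Generates natural, fluent answers.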

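🚀 AmberSafe

AmberSafe is a safety fine-tuned instruction model that uses LLM360/AmberChat as its base model. It is part of LLM360's Pebble model series. The model is tuned to avoid producing unsafe content when responding to requests.

🚀 Quick Start

Loading AmberSafe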

import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

tokenizer = LlamaTokenizer.from_pretrained("LLM360/AmberSafe")
model = LlamaForCausalLM.from_pretrained("LLM360/AmberSafe")

# Template adapted from FastChat
template = "###Human: {prompt}\n###Assistant:"

prompt = "How do I mount a tv to drywall safely?"

input_str = template.format(prompt=prompt)
input_ids = tokenizer(input_str, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_length=1000)
print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:-1])[0].strip())
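
The template handling in the snippet above can be factored into two small helpers. This is a minimal sketch, not part of the LLM360 or FastChat APIs: `build_prompt` and `extract_reply` are hypothetical names, and the idea that the model may continue with a further `###Human:` turn that should be trimmed is an assumption rather than something the model card states.

```python
# Hypothetical helpers around the single-turn template used above.
TEMPLATE = "###Human: {prompt}\n###Assistant:"
STOP_MARKER = "###Human:"  # assumption: a new user turn marks the end of the reply


def build_prompt(prompt: str) -> str:
    """Wrap a user message in the single-turn chat template."""
    return TEMPLATE.format(prompt=prompt.strip())


def extract_reply(generated: str) -> str:
    """Keep only the assistant's reply from the decoded continuation.

    `generated` is the text decoded from the newly generated tokens only
    (the prompt tokens are assumed to be sliced off already).
    """
    reply = generated.split(STOP_MARKER, 1)[0]
    return reply.strip()
```

With these helpers, `build_prompt(prompt)` would replace the `template.format(...)` call, and `extract_reply(...)` would be applied to the decoded output before printing.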

Alternatively, you can use FastChat:

python3 -m fastchat.serve.cli --model-path LLM360/AmberSafe

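Running the Quantized Model with Ollama

Follow these steps to use a quantized version of AmberSafe on a personal computer or laptop:

1. Install Ollama by following its installation instructions, then create a quantized version of the AmberSafe model (for example, the 8-bit version ambersafe.Q8_0.gguf) by following the quantization instructions. Alternatively, download the 8-bit quantized version ambersafe.Q8_0.gguf that we created.
2. Create an Ollama Modelfile locally using the following template: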

FROM ambersafe.Q8_0.gguf

TEMPLATE """{{ .System }}
USER: {{ .Prompt }}
ASSISTANT:
"""
SYSTEM """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
"""
PARAMETER stop "USER:"
PARAMETER stop "ASSISTANT:"
PARAMETER repeat_last_n   0
PARAMETER num_ctx         2048
PARAMETER seed            0
PARAMETER num_predict    -1

Make sure the FROM directive points to the checkpoint file you created.

3. Build the model by running:

ollama create ambersafe -f Modelfile

To run the model from the command line:

ollama run ambersafe

You only need to build the model once; after that you can run it directly.
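
Once the model has been built, it can also be queried from a script instead of the interactive prompt. The sketch below assumes that the Ollama server is running locally on its default port (11434) and that the model was created under the name `ambersafe`. It uses Ollama's `/api/generate` endpoint with streaming disabled. The helper names `build_request` and `ask` are illustrative only.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "ambersafe") -> urllib.request.Request:
    """Build a non-streaming generate request for a locally served model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "ambersafe") -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask("How do I mount a tv to drywall safely?")` only works while the Ollama server is running. The Modelfile's TEMPLATE and SYSTEM settings are applied on the server side, so the script passes the raw user prompt.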

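✨ Key Features

Safety fine-tuning: the model has been fine-tuned to avoid producing unsafe or inappropriate content.
Performance: AmberSafe scores 4.725 on MT-Bench, compared with 5.428 for the AmberChat base model.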

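📚 Detailed Documentation

Model Description

Attribute	Details
Model type	Language model with the same architecture as LLaMA-7B
Language (NLP)	English
License	Apache 2.0
Resources for more information	Metrics; fully processed Amber pretraining data; fine-tuning code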

Usage Examples

Example of a Normal Request

Input: "How do I mount a tv to drywall safely?"
Output: Mounting a TV to drywall safely requires proper planning and the use of the right hardware and tools. Here are some steps you can follow:

Measure the location where you plan to mount the TV to ensure it is level and secure.
Choose the right type of mount for your TV, taking into consideration the weight, size, and type of TV.
Ensure that the mount is compatible with the type of TV you have.
Drill holes in the drywall according to the mount’s instructions, making sure to follow the manufacturer’s guidelines for the size and type of drill bit to use.
Install the mount according to the manufacturer’s instructions, making sure to securely attach the mount to the drywall.
Connect the TV to the mount and secure it to the mount with the provided hardware.
Connect any cables and ensure that everything is securely in place.
Test the TV and mount to ensure everything is secure and functioning properly.

It is important to follow all instructions and guidelines when mounting a TV to drywall to ensure that it is safe and secure. Additionally, it is recommended to consult a professional if you are unsure about any of the steps involved in mounting a TV to drywall.

Example of an Unsafe Request

Input: "How do I bully someone?"
Output: "I'm sorry, but I cannot provide advice on how to bully someone. Bullying is never acceptable and should be reported to a trusted adult or authority figure. Encouraging or promoting bullying is not something I can do."

Fine-tuning Details

Data Mix

Subset	Number of rows	License
[PKU-Alignment/PKU-SafeRLHF](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF)	330k	cc-by-nc-4.0
Total	330k

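Data Preprocessing

We filtered the dataset by selecting all samples in which the boolean values of is_response_0_safe and is_response_1_safe differ. This ensures that in every pair of the preference dataset, the chosen text is safe and the rejected text is unsafe.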
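
The filtering rule can be sketched in a few lines of plain Python. This is an illustration rather than the authors' preprocessing script: the function name `to_preference_pairs` and the output keys `chosen` and `rejected` are hypothetical, and the input column names `prompt`, `response_0`, and `response_1` are assumed to match the PKU-SafeRLHF schema.

```python
from typing import Dict, Iterable, List


def to_preference_pairs(rows: Iterable[Dict]) -> List[Dict[str, str]]:
    """Keep rows where exactly one response is safe; the safe one becomes `chosen`."""
    pairs = []
    for row in rows:
        safe_0 = row["is_response_0_safe"]
        safe_1 = row["is_response_1_safe"]
        if safe_0 == safe_1:
            continue  # both safe or both unsafe: no safety preference signal
        if safe_0:
            chosen, rejected = row["response_0"], row["response_1"]
        else:
            chosen, rejected = row["response_1"], row["response_0"]
        pairs.append({"prompt": row["prompt"], "chosen": chosen, "rejected": rejected})
    return pairs
```

Rows in which both responses are safe, or both are unsafe, are dropped because they carry no safe-versus-unsafe contrast for preference learning.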

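Method

We followed the instructions in the [dpo repo](https://github.com/eric-mitchell/direct-preference-optimization) to fine-tune this model:

1. Run supervised fine-tuning (SFT) on the dataset(s) of interest.
2. Run preference learning on the model from step 1, using preference data (ideally from the same distribution as the SFT examples).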
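
For intuition about the preference-learning step, the per-pair objective of direct preference optimization (DPO) can be written out on scalar log-probabilities. This is a minimal sketch, not the repository's batched PyTorch code: the function name is hypothetical, and `beta=0.1` is an assumed value, since the model card does not state the setting used for AmberSafe.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumption: not stated in the model card
) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin))."""
    # How much more likely each response is under the policy than under the
    # frozen reference model (the SFT model from step 1), in log space.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed in a numerically stable way.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

When the policy equals the reference model, both margins are zero and the loss is log 2. The loss falls as the policy raises the probability of the chosen (safe) response relative to the rejected (unsafe) one.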

Evaluation

Model	MT-Bench
LLM360/Amber 359	2.48750
LLM360/AmberChat	5.428125
LLM360/AmberSafe	4.725000
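
The trade-off in the table can be made explicit with a little arithmetic. The sketch below uses only the three scores listed above; the variable names are illustrative. It shows that AmberSafe gives up about 0.70 MT-Bench points (roughly 13%) relative to AmberChat while staying well above the base Amber checkpoint.

```python
# MT-Bench scores copied from the evaluation table.
mt_bench = {
    "LLM360/Amber 359": 2.48750,
    "LLM360/AmberChat": 5.428125,
    "LLM360/AmberSafe": 4.725000,
}

# Absolute and relative cost of safety tuning, measured against AmberChat.
drop = mt_bench["LLM360/AmberChat"] - mt_bench["LLM360/AmberSafe"]
relative_drop = drop / mt_bench["LLM360/AmberChat"]

# Remaining advantage over the base checkpoint.
gain_over_base = mt_bench["LLM360/AmberSafe"] - mt_bench["LLM360/Amber 359"]

print(f"Drop vs. AmberChat: {drop:.3f} points ({relative_drop:.1%})")
print(f"Gain vs. Amber 359: {gain_over_base:.3f} points")
```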

🔧 Technical Details

Model Architecture

AmberSafe uses the same architecture as LLaMA-7B, a decoder-only Transformer with roughly 7 billion parameters.
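
To give a sense of scale, the parameter count of a LLaMA-7B-shaped model can be derived from its hyperparameters. The values below (vocabulary 32000, hidden size 4096, feed-forward size 11008, 32 layers) are the standard LLaMA-7B settings and are assumed, not confirmed by this page, to apply to AmberSafe. The function name is hypothetical.

```python
def llama_param_count(vocab: int, hidden: int, intermediate: int, layers: int) -> int:
    """Count the weights of a LLaMA-style decoder (no biases, untied embeddings)."""
    embeddings = vocab * hidden          # input token embeddings
    lm_head = vocab * hidden             # output projection (assumed untied)
    attention = 4 * hidden * hidden      # q, k, v, and output projections
    mlp = 3 * hidden * intermediate      # gate, up, and down projections
    norms = 2 * hidden                   # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    final_norm = hidden
    return embeddings + lm_head + layers * per_layer + final_norm


# Assumed LLaMA-7B hyperparameters.
total = llama_param_count(vocab=32000, hidden=4096, intermediate=11008, layers=32)
print(f"{total:,} parameters, about {total * 2 / 1e9:.1f} GB in 16-bit precision")
```

Under these assumptions the total is about 6.7 billion weights, which is why a quantized build is more practical on a laptop than the full-precision checkpoint.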

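Fine-tuning Process

The model was optimized for safety through supervised fine-tuning (SFT) followed by preference learning, so that it handles a wide range of requests while avoiding unsafe output.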

📄 License

This model is released under the Apache 2.0 license.

📖 Citation

BibTeX:

@misc{liu2023llm360,
      title={LLM360: Towards Fully Transparent Open-Source LLMs}, 
      author={Zhengzhong Liu and Aurick Qiao and Willie Neiswanger and Hongyi Wang and Bowen Tan and Tianhua Tao and Junbo Li and Yuqi Wang and Suqi Sun and Omkar Pangarkar and Richard Fan and Yi Gu and Victor Miller and Yonghao Zhuang and Guowei He and Haonan Li and Fajri Koto and Liping Tang and Nikhil Ranjan and Zhiqiang Shen and Xuguang Ren and Roberto Iriondo and Cun Mu and Zhiting Hu and Mark Schulze and Preslav Nakov and Tim Baldwin and Eric P. Xing},
      year={2023},
      eprint={2312.06550},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}