AmberSafe: an open-source instruction model focused on safe, reliable text generation

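AmberSafe

Developed by LLM360

AmberSafe is an instruction model obtained by safety fine-tuning LLM360/AmberChat. It belongs to LLM360's Pebble series and focuses on safe text generation.

Large language model

Transformers

Language: English · License: Apache-2.0 · #SafetyFineTuning #RLHFOptimization #EnglishDialogue

Downloads: 52

Released: 12/15/2023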

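Model overview

AmberSafe is a safety fine-tuned large language model. It is designed to generate text safely and to recognize and refuse unsafe or inappropriate requests.

Model features

Safety fine-tuning

Fine-tuned on the PKU-Alignment/PKU-SafeRLHF data so that the model recognizes and refuses unsafe or inappropriate requests.

Performance

Scores 4.725 on MT-Bench, compared with 5.428 for its base model AmberChat, trading some generation quality for safer output.

Open and transparent

The pretraining data, fine-tuning code, and performance metrics are all published, so the model can be inspected end to end.

Model capabilities

Text generation

Safety filtering

Instruction following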

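Use cases

Safe dialogue

Safety advice

Provides safety-related advice and guidance, for example on installing fixtures safely at home.

Generates safe, reliable answers and avoids dangerous or inappropriate suggestions.

Content filtering

Recognizes and refuses unsafe or inappropriate requests.

Declines to answer requests that involve bullying, violence, or other inappropriate content.

General dialogue

Everyday Q&A

Answers users' general questions.

Generates natural, fluent answers.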

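🚀 AmberSafe

AmberSafe is a safety fine-tuned instruction model that uses LLM360/AmberChat as its base model. It is part of LLM360's Pebble model series. The model is tuned to generate text safely and to avoid producing unsafe content across a wide range of requests.

🚀 Quick start

Loading AmberSafe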

import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = LlamaTokenizer.from_pretrained("LLM360/AmberSafe")
model = LlamaForCausalLM.from_pretrained("LLM360/AmberSafe")

# Prompt template adapted from FastChat
template = "###Human: {prompt}\n###Assistant:"

prompt = "How do I mount a tv to drywall safely?"

input_str = template.format(prompt=prompt)
input_ids = tokenizer(input_str, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_length=1000)
# Decode only the newly generated tokens: skip the prompt and drop the final token
print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:-1])[0].strip())

Alternatively, you can use FastChat:

python3 -m fastchat.serve.cli --model-path LLM360/AmberSafe

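Running the quantized model with Ollama

Follow these steps to run a quantized version of AmberSafe on a desktop or laptop computer:

1. Install Ollama by following its installation instructions. Then create a quantized version of the AmberSafe model (for example, the 8-bit ambersafe.Q8_0.gguf) by following the quantization instructions. Alternatively, download the 8-bit quantized ambersafe.Q8_0.gguf that we created.
2. Create an Ollama Modelfile locally using the following template: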

FROM ambersafe.Q8_0.gguf

TEMPLATE """{{ .System }}
USER: {{ .Prompt }}
ASSISTANT:
"""
SYSTEM """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
"""
PARAMETER stop "USER:"
PARAMETER stop "ASSISTANT:"
PARAMETER repeat_last_n   0
PARAMETER num_ctx         2048
PARAMETER seed            0
PARAMETER num_predict    -1

Make sure the FROM directive points to the checkpoint file you created.

3. Build the model by running:

ollama create ambersafe -f Modelfile

To run the model from the command line, execute:

ollama run ambersafe

You only need to build the model once; after that you can run it directly.
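Once the model has been built, it can also be queried programmatically. The following is a minimal sketch, assuming the Ollama server is running locally on its default port (11434) and that the model was created under the name `ambersafe` as in the steps above. The helper names `build_generate_request` and `generate` are hypothetical and introduced only for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="ambersafe"):
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="ambersafe"):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


# Example call (requires a running Ollama server with the model built):
# print(generate("How do I mount a tv to drywall safely?"))
```

The Modelfile's TEMPLATE and SYSTEM settings are applied by the server, so only the raw user prompt is sent here.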

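✨ Key features

Safety fine-tuning: the model has been fine-tuned for safety so that it avoids producing unsafe or inappropriate content.
Performance: scores 4.725 on MT-Bench (its base model, AmberChat, scores 5.428).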

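📚 Documentation

Model description

Attribute	Details
Model type	Language model with the same architecture as LLaMA-7B
Language (NLP)	English
License	Apache 2.0
Resources for more information	Metrics; fully processed Amber pretraining data; fine-tuning code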

Usage examples

Example of a normal request

Input: "How do I mount a tv to drywall safely?"
Output: Mounting a TV to drywall safely requires proper planning and the use of the right hardware and tools. Here are some steps you can follow:

1. Measure the location where you plan to mount the TV to ensure it is level and secure.
2. Choose the right type of mount for your TV, taking into consideration the weight, size, and type of TV.
3. Ensure that the mount is compatible with the type of TV you have.
4. Drill holes in the drywall according to the mount’s instructions, making sure to follow the manufacturer’s guidelines for the size and type of drill bit to use.
5. Install the mount according to the manufacturer’s instructions, making sure to securely attach the mount to the drywall.
6. Connect the TV to the mount and secure it to the mount with the provided hardware.
7. Connect any cables and ensure that everything is securely in place.
8. Test the TV and mount to ensure everything is secure and functioning properly.

It is important to follow all instructions and guidelines when mounting a TV to drywall to ensure that it is safe and secure. Additionally, it is recommended to consult a professional if you are unsure about any of the steps involved in mounting a TV to drywall.

Example of an unsafe request

Input: "How do I bully someone?"
Output: "I'm sorry, but I cannot provide advice on how to bully someone. Bullying is never acceptable and should be reported to a trusted adult or authority figure. Encouraging or promoting bullying is not something I can do."

Fine-tuning details

Data mix

Subset	Number of rows	License
[PKU-Alignment/PKU-SafeRLHF](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF)	330k	cc-by-nc-4.0
Total	330k

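Data preprocessing

We filter the dataset by keeping only the samples in which the boolean values of is_response_0_safe and is_response_1_safe differ. This ensures that, for every pair in the preference dataset, the chosen text is safe and the rejected text is unsafe.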
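The filtering rule described above can be sketched in a few lines of Python. This is an illustration rather than the authors' preprocessing script: the field names `prompt`, `response_0`, and `response_1` are assumed, the function name `filter_safety_pairs` is hypothetical, and the rows are made-up toy data rather than samples from the dataset.

```python
def filter_safety_pairs(rows):
    """Keep rows whose two responses have different safety labels.

    The safe response becomes "chosen" and the unsafe one "rejected".
    """
    pairs = []
    for row in rows:
        safe0 = row["is_response_0_safe"]
        safe1 = row["is_response_1_safe"]
        if safe0 == safe1:
            # Both safe or both unsafe: no safe-vs-unsafe preference signal.
            continue
        if safe0:
            chosen, rejected = row["response_0"], row["response_1"]
        else:
            chosen, rejected = row["response_1"], row["response_0"]
        pairs.append({"prompt": row["prompt"], "chosen": chosen, "rejected": rejected})
    return pairs


# Toy rows, invented for illustration only.
rows = [
    {"prompt": "p1", "response_0": "safe answer A", "response_1": "unsafe answer A",
     "is_response_0_safe": True, "is_response_1_safe": False},
    {"prompt": "p2", "response_0": "unsafe answer B", "response_1": "safe answer B",
     "is_response_0_safe": False, "is_response_1_safe": True},
    {"prompt": "p3", "response_0": "safe answer C", "response_1": "safe answer D",
     "is_response_0_safe": True, "is_response_1_safe": True},
]

pairs = filter_safety_pairs(rows)
print(len(pairs))  # the third row is dropped because both labels agree
```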

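Method

We followed the instructions in the [dpo repo](https://github.com/eric-mitchell/direct-preference-optimization) to fine-tune this model.

1. Run supervised fine-tuning (SFT) on the dataset(s) of interest.
2. Run preference learning on the model from step 1, using preference data (ideally from the same distribution as the SFT examples).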
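For the preference-learning step, the linked repository implements direct preference optimization (DPO). As a rough sketch of the per-pair objective, not the repository's code, the loss can be written in plain Python as below. The function name `dpo_loss` is hypothetical, the inputs are assumed to be summed log-probabilities of each response under the policy and the frozen reference (SFT) model, and `beta=0.1` is an illustrative value rather than the setting used for AmberSafe.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single (chosen, rejected) pair.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)) equals softplus(-z); computed in a numerically stable way.
    x = -logits
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# When the policy equals the reference, the loss is log(2).
baseline = dpo_loss(-3.0, -3.0, -3.0, -3.0)
# When the policy raises the chosen response and lowers the rejected one, the loss drops.
improved = dpo_loss(-1.0, -5.0, -3.0, -3.0)
print(baseline > improved)
```

With the filtered data, the chosen response is always the safe one, so minimizing this loss pushes the policy toward safe responses relative to the SFT reference.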

Evaluation

Model	MT-Bench
LLM360/Amber 359	2.48750
LLM360/AmberChat	5.428125
LLM360/AmberSafe	4.725000

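🔧 Technical details

Model architecture

AmberSafe uses the same architecture as LLaMA-7B, a 7B-parameter decoder-only transformer.

Fine-tuning process

The model was optimized for safety through supervised fine-tuning (SFT) followed by preference learning, so that it handles a wide range of requests while avoiding unsafe output.

📄 License

This model is released under the Apache 2.0 license.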

📖 Citation

BibTeX:

@misc{liu2023llm360,
      title={LLM360: Towards Fully Transparent Open-Source LLMs}, 
      author={Zhengzhong Liu and Aurick Qiao and Willie Neiswanger and Hongyi Wang and Bowen Tan and Tianhua Tao and Junbo Li and Yuqi Wang and Suqi Sun and Omkar Pangarkar and Richard Fan and Yi Gu and Victor Miller and Yonghao Zhuang and Guowei He and Haonan Li and Fajri Koto and Liping Tang and Nikhil Ranjan and Zhiqiang Shen and Xuguang Ren and Roberto Iriondo and Cun Mu and Zhiting Hu and Mark Schulze and Preslav Nakov and Tim Baldwin and Eric P. Xing},
      year={2023},
      eprint={2312.06550},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}