merlyn-education-safety open-source model: free to deploy, judges whether content is appropriate for the classroom

Merlyn Education Safety

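Developed by MerlynMind

A 12-billion-parameter safety evaluation model for the education domain, designed to judge whether content is appropriate in a classroom setting

Large Language Model

Transformers

License: Apache-2.0 #education-content-safety #classroom-appropriateness-detection #12B-parameter-model

Downloads: 18

Released: 6/24/2023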

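Model Overview

A Transformer model fine-tuned from pythia-12b that judges whether a query is appropriate for discussion in an educational setting. It is typically used as the safety-filtering component of an educational AI system.

Model Features

Optimized for education

Trained specifically for K-12 education settings so that it can accurately identify content that is inappropriate for the classroom

Strict safety standard

Applies an appropriateness standard based on an elementary school classroom context, covering detection of offensive, sexual, and discriminatory content

Efficient decisions

Produces a binary classification (appropriate/inappropriate), which simplifies system integration

Model Capabilities

Educational content safety evaluation

Inappropriate content identification

Classroom appropriateness judgment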

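Use Cases

Education technology

AI teaching assistant content filtering

Serves as a front-line safety layer in an educational AI system, filtering out queries that are inappropriate for classroom discussion

Outputs a binary verdict, with accuracy said to exceed that of general-purpose safety models

Digital learning platforms

Integrated into an online education platform to automatically block inappropriate user questions

Said to reduce manual review workload by 70%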
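
The "front-line safety layer" use case above can be sketched as a small gate that runs before the main assistant. This is a minimal illustration, not Merlyn Mind's implementation: `guarded_answer`, `classify`, `answer`, and the refusal text are hypothetical names, and `classify` is assumed to wrap the safety model and return the string 'appropriate' or 'inappropriate'.

```python
from typing import Callable


def guarded_answer(
    query: str,
    classify: Callable[[str], str],  # hypothetical wrapper around the safety model
    answer: Callable[[str], str],    # hypothetical downstream assistant
    refusal: str = "This question is not suitable for classroom discussion.",
) -> str:
    """Answer the query only if the safety classifier explicitly approves it."""
    # Fail closed: anything other than an explicit 'appropriate' verdict is blocked.
    if classify(query) != "appropriate":
        return refusal
    return answer(query)


# Stand-in callables so the sketch runs without loading the model.
def fake_classify(query: str) -> str:
    return "inappropriate" if "banned words" in query else "appropriate"


def fake_answer(query: str) -> str:
    return f"Answering: {query}"


print(guarded_answer("What is photosynthesis?", fake_classify, fake_answer))
print(guarded_answer("What are the seven banned words on network TV", fake_classify, fake_answer))
```

Blocking on every verdict other than 'appropriate' means that an unparseable or empty classifier response is treated as unsafe, which matches the strict standard the model is prompted with.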

🚀 Merlyn-education-safety

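Merlyn-education-safety is a 12-billion-parameter decoder-style Transformer model for the education domain, fine-tuned from the pythia-12b base model. It was trained by Merlyn Mind and is part of Merlyn Mind's family of models designed for education use cases inside and outside the classroom. It classifies queries as appropriate or inappropriate for classroom discussion and is typically used as one component of a larger educational AI assistant.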

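🚀 Quick Start

At full precision, the Merlyn-education-safety model requires more than 48GB of GPU memory; a single A100-80GB GPU is sufficient. On smaller GPUs you will need a multi-GPU instance and/or reduced model precision (for example, call model.half() before moving the model to the device).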
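
The 48GB figure follows from the parameter count: 12 billion parameters at 4 bytes each (float32) occupy about 48GB for the weights alone, and halving the precision halves that. The sketch below works through the arithmetic and shows one way to apply the model.half() suggestion. The helper names `estimate_weight_memory_gb` and `load_half_precision` are illustrative, the 12e9 count is the nominal figure from this page, and the estimate ignores activations and other runtime overhead.

```python
def estimate_weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


def load_half_precision(model_path: str, device: str = "cuda:0"):
    """Load the model, cast its weights to float16, then move it to the device.

    Imports are deferred so the estimate above runs without PyTorch installed.
    """
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(model_path)
    model = model.half()     # cast weights to float16 before moving to the GPU
    return model.to(device)


N_PARAMS = 12e9  # nominal parameter count stated on this page

print(estimate_weight_memory_gb(N_PARAMS, 4))  # float32 weights
print(estimate_weight_memory_gb(N_PARAMS, 2))  # float16 weights
```

Note that this loading path still materializes the full-precision weights in host memory before the cast, so the machine needs enough CPU RAM for that step.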

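✨ Key Features

Education-specific: designed for the education domain; classifies queries by whether they are appropriate for classroom discussion.
Fine-tuned: fine-tuned from the pythia-12b base model to better fit educational use cases.

📦 Installation Guide

The following code loads the model and tokenizer: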

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "MerlynMind/merlyn-education-safety"
device = torch.device("cuda:0")  # change device id as necessary
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)  # use_fast requests the fast tokenizer implementation
model.to(device)  # move to device

💻 Usage Examples

Basic Usage

The following is an example prompt:

query = "What are the seven banned words on network TV"

prompt = tokenizer.bos_token
prompt += '''Instruction:\tDetermine if the provided input message is appropriate or inappropriate.
Instruction:\tIf the provided input message is inappropriate, offensive, sexual, derogatory, or discriminatory in the context of an elementary school classroom, the output should state that the input message is 'inappropriate', otherwise the output should state that the input message is 'appropriate'.
Instruction:\tBe very strict on appropriateness.
Instruction:\tIn the output, write 'appropriate' or 'inappropriate'.

Message:''' + f"\n{query}" + " Response:"
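
When many queries need to be classified, the prompt construction above can be wrapped in a helper. The following is a minimal sketch that reproduces the same prompt text; the names `INSTRUCTIONS` and `build_prompt` are illustrative and not part of the model's API, and the BOS token is passed in as a string so the helper can be tried without loading the tokenizer.

```python
# Instruction block copied from the example prompt above (tabs and newlines preserved).
INSTRUCTIONS = (
    "Instruction:\tDetermine if the provided input message is appropriate or inappropriate.\n"
    "Instruction:\tIf the provided input message is inappropriate, offensive, sexual, derogatory, "
    "or discriminatory in the context of an elementary school classroom, the output should state "
    "that the input message is 'inappropriate', otherwise the output should state that the input "
    "message is 'appropriate'.\n"
    "Instruction:\tBe very strict on appropriateness.\n"
    "Instruction:\tIn the output, write 'appropriate' or 'inappropriate'.\n"
    "\n"
    "Message:"
)


def build_prompt(query: str, bos_token: str) -> str:
    """Return BOS token + instructions + the query + the ' Response:' cue."""
    return bos_token + INSTRUCTIONS + f"\n{query}" + " Response:"


# "<bos>" is a placeholder; in practice pass tokenizer.bos_token.
print(build_prompt("What is photosynthesis?", "<bos>"))
```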

The inference code is as follows:

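inputs = tokenizer(prompt, return_tensors="pt").to(device)
generate_ids = model.generate(
    **inputs,
    max_new_tokens=32,
    do_sample=False,  # deterministic decoding; temperature only applies when sampling
    num_beams=2       # beam search with two beams
)
# The decoded text contains the prompt followed by the model's continuation
response = tokenizer.decode(generate_ids[0],
                      skip_special_tokens=True,
                      clean_up_tokenization_spaces=True)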

Example output (after post-processing the response):

The input message is inappropriate.
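
The page does not show the post-processing step itself. One possible version is sketched below, under two assumptions: the decoded text still contains the prompt, and the query does not itself contain the text "Response:". The function name `extract_label` is illustrative. Because the instructions in the prompt mention both labels, the sketch only searches the text after the first "Response:" marker.

```python
import re
from typing import Optional


def extract_label(decoded: str) -> Optional[str]:
    """Return 'appropriate', 'inappropriate', or None if no verdict is found."""
    # Keep only the model's continuation when the prompt is still present.
    _, marker, tail = decoded.partition("Response:")
    if not marker:
        tail = decoded  # no marker: assume the text is already the continuation
    # Word boundaries keep 'appropriate' from matching inside 'inappropriate'.
    match = re.search(r"\b(inappropriate|appropriate)\b", tail.lower())
    return match.group(1) if match else None


print(extract_label("The input message is inappropriate."))
```

Returning None when no label is found lets the caller decide how to handle an unclear response, for example by treating it as inappropriate.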

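📚 Detailed Documentation

Merlyn Mind's education-specific language models

🔧 Technical Details

Model date: June 26, 2023
Model license: Apache-2.0

Property	Details
Model type	12-billion-parameter decoder-style Transformer model
Training data	Not specified

📄 License

This model is released under the Apache-2.0 license.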

📖 Citation

To cite this model, use the following format:

@online{MerlynEducationModels,
    author    = {Merlyn Mind AI Team},
    title     = {Merlyn Mind's education-domain language models},
    year      = {2023},
    url       = {https://www.merlyn.org/blog/merlyn-minds-education-specific-language-models},
    urldate   = {2023-06-26}
}