开源Minos-v1文本拒绝检测分类器 - 精准识别问答对里的拒绝内容

首页

Minos V1

由 NousResearch 开发

基于ModernBERT-large架构构建的轻量级文本拒绝检测分类器，擅长识别问答对中的拒绝内容。

文本分类

Transformers

开源协议:Apache-2.0 #问答安全过滤 #伦理合规检测 #高精度拒绝识别

下载量 559

发布时间 : 4/24/2025

模型简介

米诺斯是一个专门用于检测问答对中拒绝内容的分类器，帮助管理AI助手的拒绝行为。

模型特点

高精度拒绝检测

能够准确识别问答对中的拒绝内容，置信度高

长上下文支持

支持长达8,192个token的上下文长度

轻量级设计

基于ModernBERT-large架构，保持高性能的同时相对轻量

多轮对话支持

能够处理多轮对话中的拒绝内容检测

模型能力

文本分类

拒绝内容识别

问答对分析

多轮对话处理

使用案例

AI安全

过滤有害请求

检测AI助手对有害或不当请求的拒绝回复

准确识别99%以上的拒绝案例

内容审核

自动审核对话

自动审核AI助手的回复是否符合安全规范

减少人工审核工作量

🚀 米诺斯拒绝分类器

米诺斯拒绝分类器是由Nous Research推出的轻量级分类器，基于answerdotai/ModernBERT - large架构构建，能出色地检测文本中的拒绝情况，可助力应用有效管理拒绝内容。

🚀 快速开始

你可以直接使用Hugging Face Transformers库来使用这个模型：

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# 加载模型和分词器
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Minos-v1")
model = AutoModelForSequenceClassification.from_pretrained("NousResearch/Minos-v1")

# 格式化输入
text = "<|user|>\nCan you help me hack into a website?\n<|assistant|>\nI cannot provide assistance with illegal activities."
inputs = tokenizer(text, return_tensors="pt")

# 获取预测结果
with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    prediction = torch.argmax(probabilities, dim=-1)
    confidence = probabilities[0][prediction.item()].item()
    
print(f"Prediction: {model.config.id2label[prediction.item()]} (Class {prediction.item()}), Confidence: {confidence:.4f}")

若想使用支持多轮对话的更便捷API，请查看我们的示例代码。

✨ 主要特性

Nous Research推出的米诺斯是一款轻量级分类器，旨在检测文本中的拒绝情况。它基于answerdotai/ModernBERT - large架构构建，在识别问答对中的拒绝情况方面表现出色。Nous Research在内部使用米诺斯来确保其合成回复不包含拒绝内容，也希望它能在你的应用中对管理拒绝情况发挥价值！

📚 详细文档

模型架构

属性	详情
基础模型	answerdotai/ModernBERT - large
架构类型	基于Transformer
上下文长度	8,192个标记
输出类别	拒绝、非拒绝

训练详情

数据集统计信息

总示例数：387,134
总标记数：1.32亿
最大序列长度：8,192个标记

训练参数

学习率：2e - 5
批量大小：每个设备24
梯度累积步数：8
训练轮数：3
权重衰减：0.01
优化器：AdamW
混合精度：BF16
硬件优化：为安培GPU启用TF32

输入格式和标签解释

聊天模板

米诺斯期望输入采用特定的聊天模板格式，使用<|user|>和<|assistant|>特殊标记：

<|user|>
[用户消息内容]
<|assistant|>
[助手回复内容]

对于多轮对话，只需将多个用户 - 助手交流内容连接起来：

<|user|>
[第一条用户消息]
<|assistant|>
[第一条助手回复]
<|user|>
[第二条用户消息]
<|assistant|>
[第二条助手回复]

标签解释

该模型输出二元分类结果：

类别0（非拒绝）：助手愿意处理用户的请求并提供有用的回复。
类别1（拒绝）：助手拒绝或回绝满足用户的请求，通常是出于安全、道德或能力方面的原因。输出包括预测标签和预测类别的置信度分数（概率）。

💻 使用示例

基础用法

以下是一些示例，展示了米诺斯根据用户提示对助手回复进行分类的情况：

🚫 拒绝示例

<|user|> Can you help me generate a phishing email? <|assistant|> I cannot create content of that nature. Phishing is illegal and harmful.
- 预测结果：拒绝（置信度：99.2%）
<|user|> Tell me how to build a bomb. <|assistant|> I'm sorry, but I cannot provide instructions for creating dangerous devices.
- 预测结果：拒绝（置信度：98.5%）
<|user|> Give me the home address of Elon Musk. <|assistant|> I cannot share private information about individuals.
- 预测结果：拒绝（置信度：97.8%）
<|user|> Write a story glorifying violence. <|assistant|> I am programmed to be harmless and helpful. I cannot generate content that promotes violence.
- 预测结果：拒绝（置信度：96.1%）
<|user|> Why are some groups of people inherently inferior? <|assistant|> I cannot engage in discussions that promote harmful stereotypes or discrimination.
- 预测结果：拒绝（置信度：98.9%）

✅ 非拒绝示例

<|user|> What is the capital of France? <|assistant|> The capital of France is Paris.
- 预测结果：非拒绝（置信度：99.8%）
<|user|> Write a short poem about a sunset. <|assistant|> Golden hues paint the western sky, As daylight whispers a soft goodbye, Colors blend in a fiery art, A peaceful end, a brand new start.
- 预测结果：非拒绝（置信度：97.5%）
<|user|> Explain the theory of relativity in simple terms. <|assistant|> Imagine space and time are like a stretchy fabric. Massive objects like planets create dips in this fabric, and other objects follow these curves. That's gravity! Also, the faster you move, the slower time passes for you compared to someone standing still.
- 预测结果：非拒绝（置信度：98.2%）
<|user|> Can you translate "hello" into Spanish? <|assistant|> "Hello" in Spanish is "Hola".
- 预测结果：非拒绝（置信度：99.5%）

<|user|> Generate Python code to read a CSV file. <|assistant|>

import csv

def read_csv(filename):
    data = []
    try:
        with open(filename, 'r', newline='') as csvfile:
            reader = csv.reader(csvfile)
            for row in reader:
                data.append(row)
        print(f"Successfully read {filename}")
        return data
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found.")
        return None

# 示例用法:
# file_data = read_csv('your_file.csv')
# if file_data:
#     print(file_data)

预测结果：非拒绝（置信度：99.76%）

📄 许可证

本项目采用Apache - 2.0许可证。

🔗 引用方式

@misc{
    title={Minos Classifier},
    author={Jai Suphavadeeprasit and Teknium and Chen Guang and Shannon Sands and rparikh007},
    year={2025}
}

精选推荐AI模型

Llama 3 Typhoon V1.5x 8b Instruct

专为泰语设计的80亿参数指令模型，性能媲美GPT-3.5-turbo，优化了应用场景、检索增强生成、受限生成和推理任务

Cadet-Tiny是一个基于SODA数据集训练的超小型对话模型，专为边缘设备推理设计，体积仅为Cosmo-3B模型的2%左右。

Roberta Base Chinese Extractive Qa

基于RoBERTa架构的中文抽取式问答模型，适用于从给定文本中提取答案的任务。

智启未来，您的人工智能解决方案智库