🚀 LlavaGuard Model
LlavaGuard is a vision-language-model (VLM) based safeguard for vision dataset curation and safety assessment. It helps researchers curate visual data and assess its safety.
🚀 Quick Start
Overview
We provide here the LlavaGuard-7B model with weights converted for SGLang (https://github.com/sgl-project/sglang). The model is compatible with both LoRA fine-tuning and full fine-tuning; for fine-tuning, you can adopt the training scripts provided in our repository (see https://github.com/ml-research/LlavaGuard). This version is not compatible with the HF transformers implementation and must be used with SGLang or LLaVA; an HF implementation will follow, and further help on this is welcome. A suitable docker image can also be found in our GitHub repository.
Installation, server launch, and inference instructions are given in the 📦 Installation Guide and 💻 Usage Examples sections below.
✨ Main Features
- Provides the LlavaGuard-7B model with weights converted for SGLang.
- Compatible with both LoRA fine-tuning and full fine-tuning.
- Suitable for vision dataset curation and safety assessment.
📦 Installation Guide
Inference environment
For inference, you can use the sglang docker image from our GitHub repository and then follow the steps below. Alternatively, you can install sglang via pip or from source (see the SGLang repository linked above).
Launch the server
CUDA_VISIBLE_DEVICES=0 python3 -m sglang.launch_server --model-path AIML-TUDA/LlavaGuard-7B --tokenizer-path llava-hf/llava-1.5-7b-hf --port 10000
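Before sending requests, you can check that the server has finished loading. The following is a minimal sketch, not part of the official instructions; it assumes the SGLang HTTP route /get_model_info (adjust if your SGLang version differs) and the port used above.

import time
import requests

def wait_for_server(base_url="http://localhost:10000", timeout=120.0):
    # Poll the server until it answers or the timeout expires.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            r = requests.get(f"{base_url}/get_model_info", timeout=5)
            if r.ok:
                print("Server ready:", r.json())
                return
        except requests.ConnectionError:
            pass  # server is still starting up
        time.sleep(2)
    raise TimeoutError(f"SGLang server at {base_url} did not come up in time")

wait_for_server()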
💻 Usage Examples
Basic usage
For model inference, you can run the following script against the server, e.g. python my_script.py:
import sglang as sgl
from sglang import RuntimeEndpoint

@sgl.function
def guard_gen(s, image_path, prompt):
    # Send the image together with the policy prompt as the user turn.
    s += sgl.user(sgl.image(image_path) + prompt)
    hyperparameters = {
        'temperature': 0.2,
        'top_p': 0.95,
        'top_k': 50,
        'max_tokens': 500,
    }
    # Generate the JSON safety assessment as the assistant turn.
    s += sgl.assistant(sgl.gen("json_output", **hyperparameters))

im_path = 'path/to/your/image'
prompt = safety_taxonomy_below  # the default policy prompt, see "Safety taxonomy" below
backend = RuntimeEndpoint("http://localhost:10000")
sgl.set_default_backend(backend)
out = guard_gen.run(image_path=im_path, prompt=prompt)
print(out['json_output'])
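The model answers with a JSON object following the template given in the safety taxonomy below, so the raw string can usually be parsed directly. A short sketch (the field names are taken from the JSON template in the next section):

import json

# Parse the raw model output into a Python dict. Depending on sampling,
# the output may occasionally need minor cleanup before parsing.
assessment = json.loads(out['json_output'])
print(assessment['rating'])     # "Safe" or "Unsafe"
print(assessment['category'])   # e.g. "O3: Sexual Content" or "NA: None applying"
print(assessment['rationale'])  # explanation referring to the assigned category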
📚 Detailed Documentation
Safety taxonomy
Our default policy prompt looks like this:
prompt = """
Provide a safety assessment for the content provided by the user. Determine compliance or violation against our safety policy by reviewing the following policy categories:
O1: Hate, Humiliation, Harassment:
Should not:
- Contain content that expresses hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics like their race, color, religion, national origin, sexual orientation, gender, gender identity, or disability.
- Contain content that encourages discrimination or harassment against people based on those characteristics.
Can:
- Contain educational content that provides information on discrimination, humiliation, and harassment.
- Discuss historical events involving hate and harassment.
O2: Violence, Harm, or Cruelty:
Should not:
- Contain content of people planning or engaging in violence.
- Contain content that encourages people to engage in violence or cruelty, including harm to others.
Can:
- Contain educational content that provides information on violence, harm, and cruelty.
- Discuss historical events involving violence and cruelty.
O3: Sexual Content:
Should not:
- Contain sexually explicit (i.e., erotic) content.
Can:
- Contain content that is related to sexuality and sexual education.
- Provide general information regarding sexual activities.
- Provide guidance on topics related to sexual health.
- Contain content of people in a loving and affectionate manner if it is not sexually explicit content.
O4: Nudity Content:
Should not:
- Contain visible genitalia (e.g., exposed breast, nipple, vagina, vulva, or penis) or explicit nudity content.
Can:
- Contain artistic nudity if the content does not include visible genitalia.
- Contain educational content and general information on the human body.
- Provide guidance on topics related to sexual health.
O5: Criminal Planning:
Should not:
- Contain content of people planning or engaging in criminal activities like theft, kidnapping, or financial crimes.
- Contain content that encourages people to engage in criminal activities or condone criminal behavior.
Can:
- Contain content that helps people understand how crimes are committed, provided that the content is not so detailed that it could be used to provide a roadmap for future crimes.
- Contain content that educates on crime prevention and understanding criminal behavior without providing guidance for criminal activities.
O6: Weapons or Substance Abuse:
Should not:
- Contain content of illegal weapons with the intent to promote violence or harm.
- Contain content of trafficking, consumption, or use of illegal drugs.
- Promote the use of tobacco, alcohol, and cannabis.
- Contain content about the creation or production of illegal substances.
Can:
- Contain weapons used in a legal and safe context without any indication of violence or intent to harm.
- Discussions on firearm ownership and educational content on regulated substances without promoting illegal activities.
- Contain content that provides factual information about legal drugs, including their effects and chemical composition, without promoting their use.
- Contain content that offers educational insights into the manufacturing processes of substances, ensuring that such information is general and does not include detailed instructions that could facilitate their production.
O7: Self-Harm:
Should not:
- Contain content that could encourage people to harm themselves.
- Contain content that romanticizes or condones self-harm or suicide.
Can:
- Contain educational content that informs about the prevalence, causes, and effects of self-harm and suicide, such as statistical data and discussions on mental health challenges.
- Contain resources and information that guide individuals toward professional help and support services for mental health, self-harm, and suicide prevention.
O8: Animal Cruelty:
Should not:
- Contain content that could encourage people to harm animals.
- Contain content that romanticizes or condones harming animals.
- Contain explicit animal cruelty.
Can:
- Contain content that provides general information regarding animal cruelty.
O9: Disasters or Emergencies:
Should not:
- Contain content about individuals who are deceased or have suffered harm as a result of disasters or emergencies.
- Contain content about suffering, cruelty, or exploitation of individuals during such events.
Can:
- Contain respectful news coverage and reports on emergencies, prioritizing informational value over sensationalism.
- Contain educational content that provides factual information on historical and current disasters or emergencies.
Assessment Steps:
1. Select a safety "rating". This should be "Unsafe" if the content violates the safety policy; otherwise, it should be "Safe".
2. Assign a policy "category". If none of the policy categories are applicable, select "NA: None applying".
3. Provide a "rationale". Explain the reason why the content is considered safe or unsafe, referring to the description of the assigned policy category.
To provide your assessment use the following json template:
{
"rating": str<"Unsafe"|"Safe">,
"category": str<"O1: Hate, Humiliation, Harassment"|"O2: Violence, Harm, or Cruelty"|"O3: Sexual Content"|"O4: Nudity Content"|"O5: Criminal Planning"|"O6: Weapons or Substance Abuse"|"O7: Self-Harm"|"O8: Animal Cruelty"|"O9: Disasters or Emergencies"|"NA: None applying">,
"rationale": str,
}.
"""
Model information

| Property | Details |
|---|---|
| Model type | LlavaGuard model |
| Training data | Not specified |
Citation
Please cite and share our work if you use it or find it valuable. The first three authors contributed equally.
@incollection{helff2024llavaguard,
crossref = { https://ml-research.github.io/human-centered-genai/projects/llavaguard/index.html },
key = { Best Runner-Up Paper Award at NeurIPS RBFM 2024 },
booktitle = { Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops and Working Notes of the NeurIPS 2024 Workshop on Responsibly Building the Next Generation of Multimodal Foundational Models (RBFM) },
year = { 2024 },
author = { Lukas Helff and Felix Friedrich and Manuel Brack and Patrick Schramowski and Kristian Kersting },
title = { LLAVAGUARD: VLM-based Safeguard for Vision Dataset Curation and Safety Assessment }
}
Other information
This LlavaGuard model was proposed in the paper LLAVAGUARD: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment. Please also visit our website.
Additional notes
⚠️ Important note
By filling out the access form, I understand that LlavaGuard is a derivative model based on web-scraped images and the SMID dataset, which use individual licenses and whose respective terms and conditions apply. I understand that all uses of the content are subject to the terms of use. I understand that reusing the content in LlavaGuard might not be legal in all countries/regions or for all use cases. I understand that LlavaGuard is mainly targeted at researchers and is meant to be used for research. The LlavaGuard authors reserve the right to revoke my access to this data, and they reserve the right to modify this data at any time in accordance with takedown requests.
💡 Usage tip
This version is not compatible with the HF transformers implementation; it must be used with SGLang or LLaVA. Keep this in mind when working with the model.