ShieldGemma 2-4b-it Open-Source Model - Free Image Safety Classification with Policy-Compliant Safety Labels

ShieldGemma 2 4B IT

Developed by Google

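ShieldGemma 2 is a model trained on Gemma 3's 4-billion-parameter IT checkpoint for image safety classification across key categories. It takes an image as input and outputs safety labels according to the given policy.

Image-to-Text

Transformers

#ImageSafetyModeration #MultimodalClassification #HighPrecisionPolicyAdherence

Downloads: 1,987

Released: 3/4/2025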

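Model Overview

ShieldGemma 2 is a vision-language model for image content moderation. It identifies and classifies harmful content in images, including sexually explicit content, dangerous content, and violence/gore.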

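Model Features

Multi-category safety moderation

Identifies sexually explicit content, dangerous content, and violence/gore, covering several categories of harmful imagery.

High performance

On the internal benchmark it outperforms the other models compared (LlavaGuard 7B, GPT-4o mini, and Gemma-3-4B-IT), with the highest precision and optimal F1 in all three categories.

Easy to integrate

Provides a simple API and code examples so developers can integrate it quickly into a range of applications.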

Model Capabilities

Image safety classification

Harmful content detection

Multi-category moderation

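Use Cases

Content moderation

Social media content filtering

Automatically detects and filters harmful image content on social media platforms.

Improves moderation efficiency and reduces the manual review workload.

Output filtering for image generation systems

Filters harmful images produced by generative AI systems to keep content safe.

Makes generated content safer and keeps it in line with platform policy.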

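🚀 Introduction to ShieldGemma 2

ShieldGemma 2 is a 4-billion-parameter model built on Gemma 3 for image safety classification. It checks the safety of both synthetic and natural images, helping you build reliable datasets and models and reducing the risk of harmful content.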

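🚀 Quick Start

To access Gemma on Hugging Face, you must review and agree to Google's usage license. Make sure you are logged in to Hugging Face before confirming the license; requests are processed immediately.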

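Installation

First, install a version of the Transformers library that supports Gemma 3:

$ pip install -U transformers

Running the model

The following code example runs the model on a single GPU or on multiple GPUs: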

# pip install accelerate

from transformers import AutoProcessor, ShieldGemma2ForImageClassification
from PIL import Image
import requests
import torch

model_id = "google/shieldgemma-2-4b-it"

# Download a sample image to classify.
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the classifier in evaluation mode, together with its matching processor.
model = ShieldGemma2ForImageClassification.from_pretrained(model_id).eval()
processor = AutoProcessor.from_pretrained(model_id)

# Preprocess the image into PyTorch tensors.
model_inputs = processor(images=[image], return_tensors="pt")

# Run inference without tracking gradients.
with torch.inference_mode():
    scores = model(**model_inputs)

# Print the probabilities returned by the model.
print(scores.probabilities)
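
The printed probabilities can be turned into per-policy decisions with a small helper. The sketch below is only an illustration: it assumes each row holds the ["Yes", "No"] probabilities for one policy in that column order, and the policy labels and the 0.5 threshold are hypothetical values chosen for the example. Check the output layout of your installed Transformers version before relying on it.

```python
from typing import Dict, Sequence


def flag_violations(
    probabilities: Sequence[Sequence[float]],
    policy_names: Sequence[str],
    threshold: float = 0.5,
) -> Dict[str, bool]:
    """Map per-policy [p_yes, p_no] rows to True (flagged) or False (not flagged)."""
    if len(probabilities) != len(policy_names):
        raise ValueError("expected one probability row per policy name")
    verdicts = {}
    for name, row in zip(policy_names, probabilities):
        p_yes = row[0]  # assumption: column 0 is the "Yes" (violation) probability
        verdicts[name] = p_yes >= threshold
    return verdicts


# Illustrative numbers only, not real model output.
example_rows = [[0.02, 0.98], [0.91, 0.09], [0.10, 0.90]]
example_names = ["policy_a", "policy_b", "policy_c"]  # hypothetical labels
verdicts = flag_violations(example_rows, example_names)
print(verdicts)
```

With a real run, `scores.probabilities.tolist()` could be passed as the first argument. The threshold is a deployment choice that trades precision against recall.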

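✨ Key Features

Image safety classification: ShieldGemma 2 classifies images for safety, checking whether they contain harmful content such as sexually explicit, dangerous, or violent material.
Multiple deployment scenarios: it can serve as an input filter for vision-language models or as an output filter for image generation systems.
High performance: in evaluations on internal and external datasets, ShieldGemma 2 outperforms the other models compared.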
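
The input-filter and output-filter roles share the same shape: score each image, then pass or block it. The sketch below shows that gating step only. The `max_violation_score` callable, the file names, and the scores are all hypothetical stand-ins; in a real system the callable would wrap a call to the classifier.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def split_by_safety(
    items: Iterable[T],
    max_violation_score: Callable[[T], float],
    threshold: float = 0.5,
) -> Tuple[List[T], List[T]]:
    """Return (allowed, blocked) according to each item's highest violation score."""
    allowed: List[T] = []
    blocked: List[T] = []
    for item in items:
        if max_violation_score(item) >= threshold:
            blocked.append(item)
        else:
            allowed.append(item)
    return allowed, blocked


# Stub scorer for illustration: a lookup table instead of a real model call.
fake_scores = {"cat.png": 0.03, "unsafe.png": 0.97}
allowed, blocked = split_by_safety(fake_scores, fake_scores.get)
print("allowed:", allowed)
print("blocked:", blocked)
```

Passing the scorer in as a function keeps the gate independent of how the scores are produced, so the same code can sit in front of a vision-language model or behind an image generator.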

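📚 Detailed Documentation

Model Information

ShieldGemma 2 is built on Gemma 3. It is a 4-billion (4B) parameter model that checks the safety of synthetic and natural images, helping you build reliable datasets and models.

Inputs and Outputs

Input: an image plus a prompt instruction that contains the policy definition.
Output: the probability of the "Yes"/"No" tokens. A higher score means the model is more confident that the image violates the specified policy. "Yes" means the image violates the policy; "No" means the model did not detect a violation.

The prompt instruction is as follows: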

You are a policy expert trying to help determine whether a generated or real image is in violation of the safety policy
below. Your decision must very clearly follow the specific language used in the safety policy and should not include
any interpretation outside of the safety policy.

<BEGIN_SAFETY_POLICY>
{policy}
</END_SAFETY_POLICY>

Does the image violate the above policy? Your Answer must start with with 'Yes' or 'No'.
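
To make the `{policy}` placeholder concrete, the sketch below fills the template with a policy string. The template text is copied verbatim from the prompt above, including its exact wording. The `build_prompt` helper and the example policy are hypothetical and used for illustration only; whether you need to assemble the prompt yourself depends on your tooling.

```python
# Template copied verbatim from the prompt instruction above.
PROMPT_TEMPLATE = (
    "You are a policy expert trying to help determine whether a generated or real image is in violation of the safety policy\n"
    "below. Your decision must very clearly follow the specific language used in the safety policy and should not include\n"
    "any interpretation outside of the safety policy.\n"
    "\n"
    "<BEGIN_SAFETY_POLICY>\n"
    "{policy}\n"
    "</END_SAFETY_POLICY>\n"
    "\n"
    "Does the image violate the above policy? Your Answer must start with with 'Yes' or 'No'."
)


def build_prompt(policy: str) -> str:
    """Insert one policy definition into the prompt template (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(policy=policy.strip())


# Hypothetical policy wording, not one of the model's official policies.
example_policy = '"No Example Content": The image shall not contain example content.'
prompt = build_prompt(example_policy)
print(prompt)
```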

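Training Dataset

The training dataset consists of natural and synthetic images. Natural images were sampled from the WebLI dataset; synthetic images were produced by an internal data generation pipeline.

Data Preprocessing

CSAM filtering: CSAM (child sexual abuse material) filtering was applied during data preparation to ensure that illegal content was excluded.

Implementation Information

Hardware: trained on the latest generation of Tensor Processing Unit (TPU) hardware (TPUv5e).
Software: trained with JAX and ML Pathways.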

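Evaluation

ShieldGemma 2 4B was evaluated on internal and external datasets. The internal dataset was synthetically generated through an internal image data curation pipeline.

Internal Benchmark Evaluation Results

Model	Sexually explicit content	Dangerous content	Violence and gore
LlavaGuard 7B	47.6/93.1/63.0	67.8/47.2/55.7	36.8/100.0/53.8
GPT-4o mini	68.3/97.7/80.3	84.4/99.0/91.0	40.2/100.0/57.3
Gemma-3-4B-IT	77.7/87.9/82.5	75.9/94.5/84.2	78.2/82.2/80.1
shieldgemma-2-4b-it	87.6/89.7/88.6	95.6/91.9/93.7	80.3/90.4/85.0

Table 1: Each cell reports precision/recall/optimal F1 (%, higher is better). The internal benchmark results show that ShieldGemma 2 outperforms the external baseline models.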
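
The third number in each cell is consistent with the harmonic mean of the first two. The sketch below recomputes F1 for the shieldgemma-2-4b-it row under the assumption that the reported optimal F1 was derived from the listed precision and recall. Because the table's inputs are rounded to one decimal place, small differences in the last digit are expected.

```python
def f1_from_percent(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) taken from the shieldgemma-2-4b-it row of Table 1.
shieldgemma_row = {
    "sexually explicit": (87.6, 89.7, 88.6),
    "dangerous": (95.6, 91.9, 93.7),
    "violence and gore": (80.3, 90.4, 85.0),
}

recomputed = {
    category: f1_from_percent(precision, recall)
    for category, (precision, recall, _) in shieldgemma_row.items()
}

for category, (_, _, reported) in shieldgemma_row.items():
    print(f"{category}: recomputed {recomputed[category]:.2f}, reported {reported}")
```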

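Ethics and Safety

Evaluation approach: ShieldGemma 2 operates in "scoring mode" and focuses primarily on outputting valid image safety labels.
Evaluation results: the models were evaluated for ethics, safety, and fairness, and met internal guidelines.

Usage and Limitations

Intended use: ShieldGemma 2 is intended as a safety content moderator that can be applied to human user inputs, model outputs, or both.
Limitations: the usual limitations of large language models apply; training and evaluation data may not be representative of real-world scenarios; and the model is sensitive to how the safety principles are described.

Ethical Considerations and Risks

Developing large language models raises several ethical concerns. For details, see the Gemma 3 model card.

Benefits

Compared with similarly sized models, this model family provides high-performance open large language model implementations for responsible AI development.

🔧 Technical Details

ShieldGemma 2 is trained from the Gemma 3 4B IT checkpoint. With a carefully curated training dataset and instruction tuning, it performs well on image safety classification. For detailed technical information, see the ShieldGemma 2 technical report.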

📄 License

Gemma

Citation

@misc{zeng2025shieldgemma2robusttractable,
    title={ShieldGemma 2: Robust and Tractable Image Content Moderation},
    author={Wenjun Zeng and Dana Kurniawan and Ryan Mullins and Yuchi Liu and Tamoghna Saha and Dirichi Ike-Njoku and Jindong Gu and Yiwen Song and Cai Xu and Jingjing Zhou and Aparna Joshi and Shravan Dheep and Mani Malek and Hamid Palangi and Joon Baek and Rick Pereira and Karthik Narasimhan},
    year={2025},
    eprint={2504.01081},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2504.01081},
}