vit-base-violence-detection Open Source Model - Free and Fast Identification of Violent and Non-violent Scenes in Images

ViT Base Violence Detection

Developed by jaranohaal

A violence detection model optimized based on the Vision Transformer (ViT) architecture, capable of classifying images into violent or non-violent scenes.

Image Classification

Transformers

Language: English

License: Apache-2.0

Tags: #Violence scene recognition #High-precision detection #Video surveillance

Downloads 2,140

Release date: 6/19/2024

Model Overview

This model is fine-tuned from google/vit-base-patch16-224-in21k on the Real Life Violence Situations dataset and is suited to scenarios such as content moderation and video surveillance.

Model Features

High accuracy

Achieves a test accuracy of 98.80%, effectively identifying violent scenes.

Based on ViT architecture

Uses the Vision Transformer architecture, which splits each 224×224 input image into 16×16 patches and processes them with a transformer encoder.

Trained on real-world data

Fine-tuned on the Real Life Violence Situations dataset from Kaggle, so the training images resemble the scenes encountered in practical applications.

Model Capabilities

Image classification

Violence scene recognition

Content moderation

Use Cases

Security monitoring

Video surveillance system

Classifies frames taken from video streams in real time, identifying violent behavior and triggering alerts.

Enhances monitoring efficiency and reduces manual review costs.
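Because the model classifies single images, a surveillance pipeline has to decide how per-frame predictions turn into an alert. The following is a minimal sketch of one possible approach, not something described in the model card: it smooths per-frame "violent" probabilities over a sliding window so that one misclassified frame does not trigger an alert. The function name `alert_frames`, its parameters, and the sample scores are all hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def alert_frames(
    violent_probs: Iterable[float],
    window: int = 8,
    prob_threshold: float = 0.5,
    min_fraction: float = 0.75,
) -> Iterator[int]:
    """Yield the index of each frame at which an alert would fire.

    An alert fires once the window is full and at least `min_fraction` of the
    last `window` frames have a violent probability >= `prob_threshold`.
    """
    recent = deque(maxlen=window)  # rolling record of per-frame flags
    for index, prob in enumerate(violent_probs):
        recent.append(prob >= prob_threshold)
        if len(recent) == window and sum(recent) / window >= min_fraction:
            yield index


# Hypothetical per-frame scores: one isolated spike, then a sustained run.
scores = [0.1, 0.9, 0.1, 0.2, 0.8, 0.9, 0.95, 0.9, 0.85, 0.1]
print(list(alert_frames(scores, window=4, min_fraction=0.75)))  # [6, 7, 8, 9]
```

The isolated spike at frame 1 never fires an alert, while the sustained run starting at frame 4 does. The window size and thresholds are tuning knobs that would need to be set against real footage.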

Content management

Social media content moderation

Automatically detects whether user-uploaded images, or frames extracted from uploaded videos, contain violent content.

Helps platforms quickly identify and handle non-compliant content.
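In a moderation workflow, the classifier's score is usually mapped to an action rather than used as a bare yes/no. The sketch below illustrates one common pattern, assuming a "violent" probability is already available: confident detections are blocked automatically and borderline ones go to a human reviewer. The function name, the action names, and both thresholds are hypothetical and are not part of the model card.

```python
def moderation_action(
    violent_prob: float,
    review_at: float = 0.5,
    block_at: float = 0.9,
) -> str:
    """Map a violent-class probability to 'allow', 'review', or 'block'."""
    if not 0.0 <= violent_prob <= 1.0:
        raise ValueError("violent_prob must be a probability in [0, 1]")
    if violent_prob >= block_at:
        return "block"   # confident detection: remove automatically
    if violent_prob >= review_at:
        return "review"  # uncertain: queue for a human moderator
    return "allow"


for score in (0.05, 0.62, 0.97):
    print(score, moderation_action(score))
```

Keeping a review band between the two thresholds trades some manual effort for fewer wrongly removed uploads.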

Parental control

Child protection software

Filters images and videos containing violent content.

Protects children from exposure to harmful content.

🚀 ViT Base Violence Detection

A Vision Transformer (ViT) model fine-tuned for violence detection, classifying images into violent or non-violent categories.

🚀 Quick Start

This is a Vision Transformer (ViT) model fine-tuned for violence detection. The model is based on [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) and has been trained on the [Real Life Violence Situations](https://www.kaggle.com/datasets/mohamedmustafa/real-life-violence-situations-dataset) dataset from Kaggle to classify images into violent or non-violent categories.

✨ Features

Content Moderation: Ideal for filtering out violent images in content platforms.
Surveillance: Can be used in surveillance systems to detect violent activities in real time.
Parental Control Software: Helps in restricting access to violent images for children.

📦 Installation

The original model card does not list installation steps. The usage example requires the torch, transformers, and Pillow (PIL) packages.

💻 Usage Examples

Basic Usage

```python
import torch
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

# Load the fine-tuned model and its image processor
# (ViTImageProcessor replaces the deprecated ViTFeatureExtractor)
model = ViTForImageClassification.from_pretrained('jaranohaal/vit-base-violence-detection')
processor = ViTImageProcessor.from_pretrained('jaranohaal/vit-base-violence-detection')

# Load an image and convert it to RGB, the 3-channel input ViT expects
image = Image.open('image.jpg').convert('RGB')

# Preprocess the image into a batch of pixel tensors
inputs = processor(images=image, return_tensors="pt")

# Perform inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class_idx = logits.argmax(-1).item()

# Print the predicted class
print("Predicted class:", model.config.id2label[predicted_class_idx])
```
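The example above reports only the top class. When a confidence score is needed, the logits can be converted to probabilities with a softmax. The helper below is a small dependency-free sketch of that step; `label_probabilities` is a hypothetical name, and the label names in the demo are placeholders, since the model card does not state the model's actual `id2label` mapping.

```python
import math
from typing import Dict, List


def label_probabilities(logits: List[float], id2label: Dict[int, str]) -> Dict[str, float]:
    """Apply a numerically stable softmax and key the result by label name."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return {id2label[i]: e / total for i, e in enumerate(exps)}


# Placeholder logits and label names, for illustration only.
probs = label_probabilities([2.0, 0.0], {0: "class_0", 1: "class_1"})
print(probs)
```

With the variables from the usage example, the call would be `label_probabilities(logits[0].tolist(), model.config.id2label)`.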

📚 Documentation

Intended Use

The model is intended for use in applications where detecting violent content in images is necessary. This can include content moderation, surveillance, and parental control software.

Model accuracy

Test accuracy (ViT Base): 98.80%
Loss: 0.20038144290447235
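For readers who want to reproduce these two metrics on their own evaluation set, the sketch below shows how accuracy and mean loss are typically computed from predicted class probabilities. It assumes the reported loss is a mean cross-entropy, which the model card does not state. The function name and the toy inputs are hypothetical and are not the model's results.

```python
import math
from typing import Sequence, Tuple


def accuracy_and_loss(
    probs: Sequence[Sequence[float]],
    targets: Sequence[int],
) -> Tuple[float, float]:
    """Return (accuracy, mean cross-entropy) for per-sample class probabilities."""
    if not targets or len(probs) != len(targets):
        raise ValueError("probs and targets must be non-empty and equal in length")
    correct = 0
    total_loss = 0.0
    for row, target in zip(probs, targets):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax
        correct += predicted == target
        total_loss += -math.log(row[target])  # penalty for the true class
    count = len(targets)
    return correct / count, total_loss / count


# Toy values for illustration only (not taken from the model's evaluation).
toy_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
toy_targets = [0, 1, 1]
accuracy, loss = accuracy_and_loss(toy_probs, toy_targets)
print(f"accuracy={accuracy:.4f} loss={loss:.4f}")
```

Accuracy counts only whether the top class is right, while cross-entropy also penalizes confident mistakes, so the two numbers are best read together.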

📄 License

This model is licensed under the Apache-2.0 license.
