vision-1-mini: an open-source brand safety classification model based on Llama 3.1 and optimized for Apple Silicon devices

Vision 1 Mini

Developed by OverseerAI

An 8-billion-parameter brand safety classification model built on Llama 3.1 and optimized for Apple Silicon devices

Text Classification

Transformers

Supports Multiple Languages

#Apple Silicon optimization #Brand safety classification #Multilingual content moderation

Downloads 28

Release Time: 1/14/2025

Model Overview

A lightweight model designed for brand safety classification, providing efficient and accurate brand safety assessments using the BrandSafe-16k classification system

Model Features

Apple Silicon optimization

Deeply optimized for Metal/MPS, supporting unified memory architecture and efficient layer offloading

Efficient classification system

Utilizes the BrandSafe-16k classification system, covering 17 brand safety-related categories

Lightweight and efficient

A 4.58 GiB quantized model that loads in 3.27 seconds on an Apple M3 Pro

High accuracy

Achieves 95% accuracy in brand safety classification

Model Capabilities

Text classification

Brand safety assessment

Multilingual content analysis

Use Cases

Content moderation

Social media content moderation

Automatically identifies brand safety risks in social media content

95% accuracy

Ad content safety check

Detects elements in ad content that may harm brand image

Brand protection

Competitor mention detection

Identifies mentions of competing brands in content

Brand criticism analysis

Detects negative comments or criticisms about a brand

🚀 vision-1-mini

Vision-1-mini is an 8B-parameter model based on Llama 3.1 and fine-tuned for brand safety classification. It is optimized for Apple Silicon devices and classifies content using the BrandSafe-16k classification system.

🚀 Quick Start

Vision-1-mini classifies text content into the BrandSafe-16k brand safety categories. The example below loads the model with the transformers library and runs it on a sample text.

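import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in half precision; device_map="auto" picks the device placement
model = AutoModelForCausalLM.from_pretrained("maxsonderby/vision-1-mini",
                                           device_map="auto",
                                           torch_dtype=torch.float16,
                                           low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained("maxsonderby/vision-1-mini")

# Example usage
text = "Your text here"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# temperature/top_p only affect the output when sampling is enabled
# (do_sample=True, set here or in the model's generation config)
outputs = model.generate(**inputs,
                        max_new_tokens=1,
                        temperature=0.1,
                        top_p=0.9)
# Decode only the newly generated token, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
result = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(result)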

✨ Features

Optimized for Apple Silicon: Specifically designed for Apple Silicon devices, leveraging Metal and MPS for efficient inference.
High Accuracy: Achieves a classification accuracy of 0.95 in brand safety classification tasks.
Large Context Window: Supports a context window of 131072 tokens, reduced to 2048 tokens for inference.
Quantization: Utilizes a combination of Q4_K and Q6_K quantization for efficient memory usage.

📦 Installation

The model is used through the transformers library. The usage examples also import torch, and device_map="auto" requires accelerate. You can install all three with the following command:

pip install transformers torch accelerate

💻 Usage Examples

Basic Usage

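import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in half precision; device_map="auto" picks the device placement
model = AutoModelForCausalLM.from_pretrained("maxsonderby/vision-1-mini",
                                           device_map="auto",
                                           torch_dtype=torch.float16,
                                           low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained("maxsonderby/vision-1-mini")

# Example usage
text = "Your text here"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# temperature/top_p only affect the output when sampling is enabled
# (do_sample=True, set here or in the model's generation config)
outputs = model.generate(**inputs,
                        max_new_tokens=1,
                        temperature=0.1,
                        top_p=0.9)
# Decode only the newly generated token, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
result = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(result)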

📚 Documentation

Model Details

Property	Details
Model Type	LlamaForCausalLM
Base Model	Llama 3.1 (8B)
Parameters	8.03B
Architecture	Llama
Quantization	Q4_K (193 tensors) + Q6_K (33 tensors)
Size	4.58 GiB
License	llama3.1
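
As a rough sanity check on the table, the reported size and parameter count imply an average of just under 5 bits per weight, which is in line with a mix of mostly 4-bit and some 6-bit tensors. The sketch below is plain arithmetic on the figures above; the helper name `bits_per_weight` is illustrative and not part of any library.

```python
GIB = 1024 ** 3  # bytes per GiB


def bits_per_weight(size_gib: float, n_params: float) -> float:
    """Average number of bits stored per parameter for a model file."""
    return size_gib * GIB * 8 / n_params


# Figures taken from the Model Details table: 4.58 GiB, 8.03B parameters
bpw = bits_per_weight(4.58, 8.03e9)
print(f"{bpw:.2f} bits per weight")
```

The average covers every tensor in the file, so it sits between the nominal widths of the two quantization types rather than matching either one exactly.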

Performance Metrics

Load Time: 3.27 seconds (on Apple M3 Pro)
Memory Usage:
- CPU Buffer: 4552.80 MiB
- Metal Buffer: 132.50 MiB
- KV Cache: 1024.00 MiB (512.00 MiB K, 512.00 MiB V)
- Compute Buffer: 560.00 MiB
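
To relate these buffers to the recommended system memory, the short sketch below adds them up. It assumes the four buffers are separate allocations that are all resident at the same time, which the card does not state explicitly.

```python
# Buffer sizes in MiB, copied from the Memory Usage list above
buffers_mib = {
    "CPU Buffer": 4552.80,
    "Metal Buffer": 132.50,
    "KV Cache": 1024.00,
    "Compute Buffer": 560.00,
}

total_mib = sum(buffers_mib.values())
total_gib = total_mib / 1024  # 1 GiB = 1024 MiB

print(f"Total: {total_mib:.2f} MiB ({total_gib:.2f} GiB)")
```

Under that assumption the model needs a little over 6 GiB for inference, which leaves headroom for the operating system and other applications on a 12 GB machine.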

Hardware Compatibility

Apple Silicon Optimizations

Optimized for Metal/MPS
Unified Memory Architecture support
SIMD group reduction and matrix multiplication optimizations
Efficient layer offloading (1/33 layers to GPU)

System Requirements

Recommended Memory: 12GB+
GPU: Apple Silicon preferred (M1/M2/M3 series)
Storage: 5GB free space

Classification Categories

The model classifies content into the following categories:

B1-PROFANITY - Contains profane or vulgar language
B2-OFFENSIVE_SLANG - Contains offensive slang or derogatory terms
B3-COMPETITOR - Mentions or promotes competing brands
B4-BRAND_CRITICISM - Contains criticism or negative feedback about brands
B5-MISLEADING - Contains misleading or deceptive information
B6-POLITICAL - Contains political content or bias
B7-RELIGIOUS - Contains religious content or references
B8-CONTROVERSIAL - Contains controversial topics or discussions
B9-ADULT - Contains adult or mature content
B10-VIOLENCE - Contains violent content or references
B11-SUBSTANCE - Contains references to drugs, alcohol, or substances
B12-HATE - Contains hate speech or discriminatory content
B13-STEREOTYPE - Contains stereotypical representations
B14-BIAS - Shows bias against groups or individuals
B15-UNPROFESSIONAL - Contains unprofessional content or behavior
B16-MANIPULATION - Contains manipulative content or tactics
SAFE - Contains no brand safety concerns
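
When post-processing model output, it can help to validate the decoded text against this list. The card does not document the exact output string, so the following is a minimal sketch that assumes the category identifier (for example `B3-COMPETITOR` or `SAFE`) appears verbatim in the decoded text. The names `CATEGORIES` and `parse_label` are hypothetical helpers, not part of the model's API.

```python
import re
from typing import Optional

# The 17 BrandSafe-16k category identifiers listed above
CATEGORIES = [
    "B1-PROFANITY", "B2-OFFENSIVE_SLANG", "B3-COMPETITOR",
    "B4-BRAND_CRITICISM", "B5-MISLEADING", "B6-POLITICAL",
    "B7-RELIGIOUS", "B8-CONTROVERSIAL", "B9-ADULT", "B10-VIOLENCE",
    "B11-SUBSTANCE", "B12-HATE", "B13-STEREOTYPE", "B14-BIAS",
    "B15-UNPROFESSIONAL", "B16-MANIPULATION", "SAFE",
]
_VALID = set(CATEGORIES)

# Candidate labels: "B<number>-<NAME>" or the standalone word "SAFE"
_LABEL_RE = re.compile(r"\b(?:B\d{1,2}-[A-Z_]+|SAFE)\b")


def parse_label(generated: str) -> Optional[str]:
    """Return the first known category found in the text, or None."""
    for match in _LABEL_RE.finditer(generated.upper()):
        if match.group(0) in _VALID:
            return match.group(0)
    return None


print(parse_label("Label: b3-competitor"))
print(parse_label("no label here"))
```

Returning `None` for unrecognized output lets the caller decide whether to retry, fall back to a default, or flag the item for manual review.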

Model Architecture

Attention Mechanism:
- Head Count: 32
- KV Head Count: 8
- Layer Count: 32
- Embedding Length: 4096
- Feed Forward Length: 14336
- Context Length: 2048 (optimized from 131072)
- RoPE Base Frequency: 500000
- Dimension Count: 128
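
These figures are related: the per-head dimension is the embedding length divided by the head count (4096 / 32 = 128), and each KV head is shared by 32 / 8 = 4 query heads. The sketch below estimates the KV cache size from the same numbers. It assumes 16-bit cache elements, which the card does not state, and the function name `kv_cache_mib` is illustrative.

```python
def kv_cache_mib(n_layers: int, n_kv_heads: int, head_dim: int,
                 n_ctx: int, bytes_per_elem: int = 2) -> float:
    """Estimated size of the K and V caches together, in MiB."""
    per_tensor = n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem
    return 2 * per_tensor / 2 ** 20  # factor 2: one cache for K, one for V


head_dim = 4096 // 32      # embedding length / head count
queries_per_kv = 32 // 8   # query heads sharing each KV head

at_2048 = kv_cache_mib(32, 8, head_dim, n_ctx=2048)
at_8192 = kv_cache_mib(32, 8, head_dim, n_ctx=8192)
print(head_dim, queries_per_kv, at_2048, at_8192)
```

Under the 16-bit assumption, a 2048-token cache comes to 256 MiB, while the 1024 MiB reported under Performance Metrics corresponds to an 8192-token cache. The card does not say which context size or cache precision was in effect when that figure was recorded.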

Training & Fine-tuning

This model is fine-tuned for brand safety classification on the BrandSafe-16k dataset. It uses a reduced context window of 2048 tokens and is configured for near-deterministic outputs with the following settings:

Temperature: 0.1
Top-p: 0.9
Batch Size: 512
Thread Count: 8
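
To illustrate why a temperature of 0.1 makes sampling nearly deterministic, the sketch below applies temperature scaling to a softmax over made-up logits. The logit values are purely illustrative and do not come from the model.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Softmax over logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
p_plain = softmax_with_temperature(logits, 1.0)
p_sharp = softmax_with_temperature(logits, 0.1)

print([round(p, 4) for p in p_plain])
print([round(p, 4) for p in p_sharp])
```

At temperature 1.0 the top candidate receives roughly two thirds of the probability mass; at 0.1 it receives almost all of it, so repeated runs almost always pick the same token.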

Limitations

The model is optimized for shorter content classification (up to 2048 tokens).
Performance may vary on non-Apple Silicon hardware.
The model focuses solely on brand safety classification and may not be suitable for other tasks.
Classification accuracy may vary based on content complexity and context.
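
One way to work within the 2048-token limit is to split longer inputs into windows and classify each window separately. The sketch below operates on a plain list of token ids; `split_into_windows` is a hypothetical helper, and how the per-window labels should be combined is left open because the card does not specify it.

```python
from typing import List, Sequence


def split_into_windows(token_ids: Sequence[int], max_tokens: int = 2048,
                       overlap: int = 0) -> List[List[int]]:
    """Split token ids into consecutive windows of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    step = max_tokens - overlap
    windows: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        windows.append(list(token_ids[start:start + max_tokens]))
        if start + max_tokens >= len(token_ids):
            break  # the last window already reaches the end of the input
    return windows


ids = list(range(5000))  # stand-in for real tokenizer output
windows = split_into_windows(ids)
print([len(w) for w in windows])
```

Any prompt or template tokens added around the text also count toward the limit, so in practice `max_tokens` may need to be set somewhat below 2048.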

🔧 Technical Details

The model is based on the Llama architecture, specifically LlamaForCausalLM. It uses a combination of Q4_K and Q6_K quantization to reduce memory usage. The attention mechanism uses 32 query heads that share 8 KV heads (grouped-query attention), which reduces the size of the KV cache compared with giving every query head its own key and value heads. The model is fine-tuned on the BrandSafe-16k dataset for brand safety classification.

📄 License

This model is released under the Llama 3.1 license (license identifier: llama3.1).

📖 Citation

If you use this model in your research, please cite:

@misc{vision-1-mini,
  author = {Max Sonderby},
  title = {Vision-1-Mini: Optimized Brand Safety Classification Model},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/maxsonderby/vision-1-mini}}
}
