vision-1-mini
Vision-1-mini is an 8B-parameter model based on Llama 3.1 and fine-tuned for brand safety classification. It targets Apple Silicon devices and classifies text using the BrandSafe-16k classification system.
Quick Start
The snippet below loads Vision-1-mini with the transformers library and generates a single-token brand safety classification for a piece of text.
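import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the weights in half precision; device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained("maxsonderby/vision-1-mini",
                                             device_map="auto",
                                             torch_dtype=torch.float16,
                                             low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained("maxsonderby/vision-1-mini")

text = "Your text here"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is needed for temperature and top_p to take effect.
outputs = model.generate(**inputs,
                         max_new_tokens=1,
                         do_sample=True,
                         temperature=0.1,
                         top_p=0.9)
# Decode only the newly generated token, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
result = tokenizer.decode(new_tokens, skip_special_tokens=True)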
⨠Features
- Optimized for Apple Silicon: Specifically designed for Apple Silicon devices, leveraging Metal and MPS for efficient inference.
- High Accuracy: Achieves a classification accuracy of 0.95 in brand safety classification tasks.
- Large Context Window: The base architecture supports a 131072-token context window; inference is configured for 2048 tokens.
- Quantization: Utilizes a combination of Q4_K and Q6_K quantization for efficient memory usage.
Installation
The examples use the transformers library with PyTorch, and device_map="auto" additionally requires accelerate. You can install all three with the following command:
pip install transformers torch accelerate
Usage Examples
Basic Usage
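import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the weights in half precision; device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained("maxsonderby/vision-1-mini",
                                             device_map="auto",
                                             torch_dtype=torch.float16,
                                             low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained("maxsonderby/vision-1-mini")

text = "Your text here"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is needed for temperature and top_p to take effect.
outputs = model.generate(**inputs,
                         max_new_tokens=1,
                         do_sample=True,
                         temperature=0.1,
                         top_p=0.9)
# Decode only the newly generated token, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
result = tokenizer.decode(new_tokens, skip_special_tokens=True)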
Documentation
Model Details
| Property | Details |
|---|---|
| Model Type | LlamaForCausalLM |
| Base Model | Llama 3.1 8B |
| Parameters | 8.03B |
| Architecture | Llama |
| Quantization | Q4_K (193 tensors) + Q6_K (33 tensors) |
| Size | 4.58 GiB |
| License | llama3.1 |
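As a rough sanity check on the quantization figures, the average number of bits stored per parameter can be estimated from the size and parameter count listed above. This is a minimal sketch that assumes the whole 4.58 GiB file consists of weights (metadata and any unquantized tensors are ignored); the function name `bits_per_weight` is illustrative and not part of any library.

```python
# Values copied from the Model Details table.
PARAMETERS = 8.03e9      # 8.03B parameters
FILE_SIZE_GIB = 4.58     # quantized model size in GiB


def bits_per_weight(size_gib: float, n_params: float) -> float:
    """Average bits per parameter, assuming the file holds only weights."""
    size_bits = size_gib * 1024**3 * 8
    return size_bits / n_params


bpw = bits_per_weight(FILE_SIZE_GIB, PARAMETERS)
print(f"{bpw:.2f} bits per weight")  # prints "4.90 bits per weight"
```

The result lands between the nominal 4-bit and 6-bit levels, which is what one would expect from a file that mixes Q4_K and Q6_K tensors.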
Performance Metrics
- Load Time: 3.27 seconds (on Apple M3 Pro)
- Memory Usage:
  - CPU Buffer: 4552.80 MiB
  - Metal Buffer: 132.50 MiB
  - KV Cache: 1024.00 MiB (512.00 MiB K, 512.00 MiB V)
  - Compute Buffer: 560.00 MiB
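To see how these buffers relate to overall memory needs, they can be added up. The sketch below assumes the four buffers are separate allocations that do not overlap, which the source does not state explicitly.

```python
# Buffer sizes in MiB, copied from the Performance Metrics list.
buffers_mib = {
    "cpu_buffer": 4552.80,
    "metal_buffer": 132.50,
    "kv_cache": 1024.00,
    "compute_buffer": 560.00,
}

# Assumption: the buffers are additive (no shared or overlapping memory).
total_mib = sum(buffers_mib.values())
total_gib = total_mib / 1024
print(f"Total: {total_mib:.2f} MiB ({total_gib:.2f} GiB)")
# prints "Total: 6269.30 MiB (6.12 GiB)"
```

Under that assumption the model occupies roughly 6 GiB at runtime, before counting the operating system and other applications.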
Hardware Compatibility
Apple Silicon Optimizations
- Optimized for Metal/MPS
- Unified Memory Architecture support
- SIMD group reduction and matrix multiplication optimizations
- Efficient layer offloading (1/33 layers to GPU)
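When loading the model through PyTorch, one way to prefer the Metal (MPS) backend and fall back gracefully on other hardware is sketched below. The helper name `pick_device` is hypothetical; it relies only on `torch.backends.mps.is_available()` and `torch.cuda.is_available()`, and returns "cpu" if PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "mps", "cuda", or "cpu" depending on what PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed; no accelerator to select

    # Older PyTorch builds have no torch.backends.mps attribute at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"   # Apple Silicon GPU via Metal
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU
    return "cpu"


device = pick_device()
print(device)
```

The returned string can be passed to `.to(device)` on tensors or on a model that was loaded without `device_map="auto"`.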
System Requirements
- Recommended Memory: 12GB+
- GPU: Apple Silicon preferred (M1/M2/M3 series)
- Storage: 5GB free space
Classification Categories
The model classifies content into the following categories:
- B1-PROFANITY - Contains profane or vulgar language
- B2-OFFENSIVE_SLANG - Contains offensive slang or derogatory terms
- B3-COMPETITOR - Mentions or promotes competing brands
- B4-BRAND_CRITICISM - Contains criticism or negative feedback about brands
- B5-MISLEADING - Contains misleading or deceptive information
- B6-POLITICAL - Contains political content or bias
- B7-RELIGIOUS - Contains religious content or references
- B8-CONTROVERSIAL - Contains controversial topics or discussions
- B9-ADULT - Contains adult or mature content
- B10-VIOLENCE - Contains violent content or references
- B11-SUBSTANCE - Contains references to drugs, alcohol, or substances
- B12-HATE - Contains hate speech or discriminatory content
- B13-STEREOTYPE - Contains stereotypical representations
- B14-BIAS - Shows bias against groups or individuals
- B15-UNPROFESSIONAL - Contains unprofessional content or behavior
- B16-MANIPULATION - Contains manipulative content or tactics
- SAFE - Contains no brand safety concerns
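Downstream code usually needs to turn the raw model output into one of these labels. The sketch below assumes the decoded output starts with a category code such as `B3` or with the word `SAFE`; the source does not document the exact output format, so treat the parsing rule and the helper name `parse_label` as assumptions.

```python
import re

# Category codes and names, copied from the list above.
CATEGORIES = {
    "B1": "PROFANITY", "B2": "OFFENSIVE_SLANG", "B3": "COMPETITOR",
    "B4": "BRAND_CRITICISM", "B5": "MISLEADING", "B6": "POLITICAL",
    "B7": "RELIGIOUS", "B8": "CONTROVERSIAL", "B9": "ADULT",
    "B10": "VIOLENCE", "B11": "SUBSTANCE", "B12": "HATE",
    "B13": "STEREOTYPE", "B14": "BIAS", "B15": "UNPROFESSIONAL",
    "B16": "MANIPULATION",
}


def parse_label(raw: str) -> str:
    """Map raw model output to a canonical label, or "UNKNOWN".

    Assumption: the output begins with a code like "B3" or with "SAFE".
    """
    text = raw.strip().upper()
    if text.startswith("SAFE"):
        return "SAFE"
    match = re.match(r"B(\d{1,2})\b", text)
    if match:
        code = f"B{int(match.group(1))}"
        if code in CATEGORIES:
            return f"{code}-{CATEGORIES[code]}"
    return "UNKNOWN"


print(parse_label("B3-COMPETITOR"))  # prints "B3-COMPETITOR"
print(parse_label(" b12 "))          # prints "B12-HATE"
```

Returning "UNKNOWN" rather than raising keeps a batch job running when the model emits something unexpected; callers can count or log those cases separately.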
Model Architecture
- Attention Mechanism:
  - Head Count: 32
  - KV Head Count: 8
- Layer Count: 32
- Embedding Length: 4096
- Feed Forward Length: 14336
- Context Length: 2048 (optimized from 131072)
- RoPE Base Frequency: 500000
- RoPE Dimension Count: 128
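The attention figures above are related by simple arithmetic. The sketch below derives the per-head dimension and the number of query heads that share each KV head (grouped-query attention). Reading the listed dimension count of 128 as the per-head dimension is an assumption, although it matches 4096 / 32; the class name `AttentionShape` is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    # Defaults copied from the Model Architecture list.
    embedding_length: int = 4096
    head_count: int = 32
    kv_head_count: int = 8

    @property
    def head_dim(self) -> int:
        """Width of a single attention head."""
        return self.embedding_length // self.head_count

    @property
    def queries_per_kv_head(self) -> int:
        """How many query heads share one key/value head."""
        return self.head_count // self.kv_head_count

    @property
    def kv_width(self) -> int:
        """Width of the key (or value) projection per token."""
        return self.kv_head_count * self.head_dim


shape = AttentionShape()
print(shape.head_dim, shape.queries_per_kv_head, shape.kv_width)
# prints "128 4 1024"
```

With 8 KV heads instead of 32, each token stores keys and values of width 1024 rather than 4096, so the KV cache is a quarter of the size it would be with full multi-head attention.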
Training & Fine-tuning
This model is fine-tuned for brand safety classification on the BrandSafe-16k dataset. It runs with a 2048-token context window and is configured for precise, near-deterministic outputs with the following inference settings:
- Temperature: 0.1
- Top-p: 0.9
- Batch Size: 512
- Thread Count: 8
Limitations
- The model is optimized for shorter content classification (up to 2048 tokens).
- Performance may vary on non-Apple Silicon hardware.
- The model focuses solely on brand safety classification and may not be suitable for other tasks.
- Classification accuracy may vary based on content complexity and context.
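For content longer than the 2048-token limit, one possible workaround is to split the token IDs into windows and classify each window separately. This is only a sketch: the helper name `split_into_windows` is hypothetical, and the source does not say how per-window labels should be combined, so that step is left to the caller.

```python
from typing import List

MAX_TOKENS = 2048  # context length stated in this document


def split_into_windows(token_ids: List[int],
                       window: int = MAX_TOKENS) -> List[List[int]]:
    """Split a token sequence into consecutive windows of at most `window` tokens."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [token_ids[i:i + window] for i in range(0, len(token_ids), window)]


# Example with 5000 dummy token IDs.
ids = list(range(5000))
windows = split_into_windows(ids)
print([len(w) for w in windows])  # prints "[2048, 2048, 904]"
```

In practice a smaller `window` may be needed so that any prompt text and special tokens added around the content still fit within the limit.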
Technical Details
The model is based on the Llama architecture, specifically LlamaForCausalLM. It uses a combination of Q4_K and Q6_K quantization to reduce memory usage while maintaining high performance. The attention mechanism uses 32 query heads that share 8 KV heads (grouped-query attention), which shrinks the KV cache and makes long sequences cheaper to process. The model is fine-tuned on the BrandSafe-16k dataset to achieve high accuracy in brand safety classification.
License
This model is released under the Llama 3.1 Community License (llama3.1).
Citation
If you use this model in your research, please cite:
@misc{vision-1-mini,
author = {Max Sonderby},
title = {Vision-1-Mini: Optimized Brand Safety Classification Model},
year = {2025},
publisher = {Hugging Face},
journal = {Hugging Face Model Hub},
howpublished = {\url{https://huggingface.co/maxsonderby/vision-1-mini}}
}