DINOv2 with Registers Base
A Vision Transformer model trained with DINOv2, augmented with register tokens to clean up attention maps and improve feature extraction.
Downloads 22.74k
Release Time: 12/20/2024
Model Overview
This is the base-sized Vision Transformer (ViT) with registers, trained using the DINOv2 self-supervised method. It extracts high-quality feature representations from images that can serve a wide range of computer vision tasks.
Model Features
Register mechanism
Adds dedicated register tokens that absorb global computation, eliminating artifacts in attention maps and yielding cleaner attention distributions
Self-supervised learning
Trained using the DINOv2 method, capable of learning meaningful image feature representations without labeled data
Attention optimization
The improved attention mechanism produces more interpretable attention maps, which helps in understanding the model's decision process
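The register mechanism above can be sketched in a few lines: register tokens are concatenated to the [CLS] and patch tokens, participate in self-attention like any other token, and are simply discarded at the output. This is a minimal single-head sketch with identity projections (no learned weights); the dimensions and the choice of 4 registers mirror the base model per the registers approach, but are otherwise illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def vit_block_with_registers(patch_tokens, cls_token, registers):
    """One self-attention pass over [CLS] + patch + register tokens.

    Registers attend and are attended to, soaking up global computation
    that would otherwise show up as artifacts in patch attention maps.
    Sketch only: identity Q/K/V projections, single head, no MLP.
    """
    x = np.concatenate([cls_token, patch_tokens, registers], axis=0)  # (1+N+R, D)
    d = x.shape[-1]
    attn = softmax(x @ x.T / np.sqrt(d))   # (1+N+R, 1+N+R) attention weights
    out = attn @ x
    n = patch_tokens.shape[0]
    # keep [CLS] and patch outputs; register outputs are discarded
    return out[:1], out[1:1 + n]

rng = np.random.default_rng(0)
D, N, R = 16, 9, 4  # toy embed dim and patch count; 4 registers as in the base model
cls_out, patch_out = vit_block_with_registers(
    rng.standard_normal((N, D)),
    rng.standard_normal((1, D)),
    rng.standard_normal((R, D)))
print(cls_out.shape, patch_out.shape)  # (1, 16) (9, 16)
```

Note that the register outputs never leave the block: they exist only to give the attention mechanism a place to park global information.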
Model Capabilities
Image feature extraction
Self-supervised learning
Foundation model for computer vision tasks
Use Cases
Computer vision
Image classification
Serves as a backbone; adding a classification head on top of its features enables image classification
Object detection
Extracted image features can be used for object detection tasks
Image similarity calculation
Uses extracted feature vectors to compute similarity between images
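The similarity use case typically reduces to cosine similarity between pooled feature vectors. A minimal sketch, assuming 768-dimensional features (the base variant's embedding size) and using random stand-in vectors in place of real model outputs:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors, in [-1, 1]."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in features; in practice these would be the model's pooled
# outputs (768-dim for the base variant).
rng = np.random.default_rng(1)
f1 = rng.standard_normal(768)
f2 = f1 + 0.1 * rng.standard_normal(768)  # simulates a near-duplicate image
f3 = rng.standard_normal(768)             # simulates an unrelated image

print(cosine_similarity(f1, f2) > cosine_similarity(f1, f3))  # True
```

Because self-supervised DINOv2 features cluster semantically similar images, nearest-neighbor search over such vectors works well for retrieval and deduplication.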