MoonViT-SO-400M
MoonViT is a native-resolution visual encoder, initialized from SigLIP-SO-400M and then continually pre-trained, suitable for image feature extraction tasks.
Release date: 4/10/2025
Model Overview
MoonViT is a visual encoder designed specifically for image feature extraction. It is initialized from the SigLIP-SO-400M model and further pre-trained, and it can process high-resolution images and extract useful features from them.
Model Features
Native resolution support
MoonViT processes images at their native resolution, extracting features without first downsampling them to a fixed input size.
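To make "native resolution" concrete, the sketch below counts how many patch tokens an image yields when it is patchified at its own size, compared with resizing every image to one fixed square input first. The patch size of 14 and the 384x384 fixed input are assumptions borrowed from the public SigLIP-SO-400M patch-14/384 checkpoint; the floor division (dropping a partial border patch, as a stride-14 patch embedding without padding would) is also an assumption. This card does not state how MoonViT handles any of these details.

```python
def patch_grid(height: int, width: int, patch_size: int = 14) -> tuple[int, int]:
    """Return (rows, cols) of the patch grid for an image of the given size.

    Assumption: a partial patch at the border is dropped, matching a
    stride-`patch_size` convolution with no padding.
    """
    return height // patch_size, width // patch_size


def num_tokens(height: int, width: int, patch_size: int = 14) -> int:
    """Number of patch tokens the encoder would see for this image."""
    rows, cols = patch_grid(height, width, patch_size)
    return rows * cols


# Fixed-resolution pipeline (assumed 384x384): every image costs the same
# number of tokens, whatever its original size or aspect ratio.
FIXED_SIDE = 384
fixed_tokens = num_tokens(FIXED_SIDE, FIXED_SIDE)

# Native-resolution pipeline: the token count follows the image's own size
# and aspect ratio, so detail in large images is not resized away.
for h, w in [(224, 224), (720, 1280), (1792, 896)]:
    print(f"{h}x{w}: native={num_tokens(h, w)} tokens, fixed={fixed_tokens} tokens")
```

The trade-off is visible in the counts: small images become cheaper than under a fixed resize, while large or elongated images keep their detail at the cost of longer token sequences.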
Based on SigLIP-SO-400M
The model is initialized from SigLIP-SO-400M and then continually pre-trained, so it inherits SigLIP's strong visual feature extraction capabilities.
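SigLIP, the starting point for MoonViT, is named for its pairwise sigmoid loss: each image-text pair in a batch is scored independently as a binary match or non-match, instead of normalizing over the whole batch with a softmax. The plain-Python sketch below implements that loss for illustration only. This card does not say which objective MoonViT's continual pre-training uses, and the temperature `t = 10` and bias `b = -10` are placeholder values (the SigLIP paper uses them as initial values for learnable parameters).

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid_loss(img_emb: list[list[float]], txt_emb: list[list[float]],
                 t: float = 10.0, b: float = -10.0) -> float:
    """Pairwise sigmoid loss over a batch of paired image/text embeddings.

    img_emb[i] and txt_emb[i] form a matching pair (label +1); every other
    combination is a negative (label -1). The sum is divided by batch size.
    """
    imgs = [l2_normalize(v) for v in img_emb]
    txts = [l2_normalize(v) for v in txt_emb]
    n = len(imgs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            logit = t * sum(a * c for a, c in zip(imgs[i], txts[j])) + b
            label = 1.0 if i == j else -1.0
            total += log_sigmoid(label * logit)
    return -total / n


images = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]     # text i describes image i
mismatched = [[0.0, 1.0], [1.0, 0.0]]  # pairs deliberately swapped
print(sigmoid_loss(images, matched), sigmoid_loss(images, mismatched))
```

Because each pair is an independent binary decision, the loss needs no batch-wide normalization, which is the property SigLIP's authors highlight for scaling batch size.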
Efficient feature extraction
The model is optimized for image feature extraction and produces high-quality image feature representations.
Model Capabilities
Image feature extraction
High-resolution image processing
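Processing images at native resolution means a batch contains token sequences of different lengths. One common approach, used by NaViT-style encoders, is to concatenate every image's patch tokens into one sequence, record cumulative offsets, and restrict attention to tokens from the same image. The sketch below builds those offsets and the equivalent block-diagonal attention mask; it is illustrative, assumes a patch size of 14, and is not MoonViT's actual batching code, which this card does not describe.

```python
def pack_images(sizes: list[tuple[int, int]], patch_size: int = 14):
    """Return per-image token counts and cumulative sequence offsets.

    sizes holds (height, width) in pixels. Token counts use floor division,
    as a stride-`patch_size` patch embedding without padding would.
    """
    lengths = [(h // patch_size) * (w // patch_size) for h, w in sizes]
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return lengths, offsets


def block_diagonal_mask(offsets: list[int]) -> list[list[bool]]:
    """mask[q][k] is True when query token q may attend to key token k,
    i.e. when both tokens come from the same image."""
    total = offsets[-1]
    owner = [0] * total
    for img, (start, end) in enumerate(zip(offsets, offsets[1:])):
        for pos in range(start, end):
            owner[pos] = img
    return [[owner[q] == owner[k] for k in range(total)] for q in range(total)]


lengths, offsets = pack_images([(28, 42), (14, 28)])
mask = block_diagonal_mask(offsets)
print(lengths, offsets)  # [6, 2] [0, 6, 8]
```

Storing offsets rather than padding every image to the largest size keeps the packed sequence free of wasted tokens; a dense mask like this one is only practical for small examples, which is why variable-length attention kernels take the offsets directly.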
Use Cases
Computer vision
Image understanding
Extract image features for downstream tasks such as image classification and object detection.
High-quality image feature representations
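As an illustration of reusing extracted features downstream, the sketch below mean-pools a sequence of per-patch feature vectors into one image vector and assigns the label whose prototype vector has the highest cosine similarity (prototypes could be, for example, averaged features of labelled examples). The 3-dimensional features and the prototypes are made-up stand-ins; real features are far wider (SigLIP-SO-400M's hidden width is 1152), and mean pooling is only one simple choice of readout.

```python
import math


def mean_pool(patch_features: list[list[float]]) -> list[float]:
    """Average per-patch feature vectors into one image-level vector."""
    n = len(patch_features)
    dim = len(patch_features[0])
    return [sum(f[d] for f in patch_features) / n for d in range(dim)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def classify(patch_features: list[list[float]], prototypes: dict[str, list[float]]) -> str:
    """Return the label whose prototype is most similar to the pooled feature."""
    pooled = mean_pool(patch_features)
    return max(prototypes, key=lambda label: cosine(pooled, prototypes[label]))


# Hypothetical per-patch features for one image and two class prototypes.
patches = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.1]]
prototypes = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
print(classify(patches, prototypes))  # cat
```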
Multimodal learning
Serve as the visual encoder that is combined with a language model to build multimodal systems.
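A common way to attach a vision encoder to a language model is to shorten its patch sequence and project each resulting token into the language model's embedding space. The sketch below merges each 2x2 block of neighbouring patch features into one token by concatenation (a space-to-depth merge, sometimes called pixel shuffle) and then applies a single linear projection. The 2x2 merge factor, the toy widths, and the random weights are illustrative assumptions; this card does not specify how MoonViT is connected to a language model, and a small MLP is often used in place of one linear layer.

```python
import random


def merge_2x2(grid: list[list[list[float]]]) -> list[list[float]]:
    """Space-to-depth: concatenate the features of each 2x2 patch block.

    grid[r][c] is the feature vector of the patch at row r, column c; rows
    and columns are assumed even. Returns (rows/2)*(cols/2) tokens, each
    four times as wide as the input features.
    """
    rows, cols = len(grid), len(grid[0])
    tokens = []
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            tokens.append(grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1])
    return tokens


def linear(tokens: list[list[float]], weight: list[list[float]], bias: list[float]) -> list[list[float]]:
    """Project each token: out[j] = sum_i token[i] * weight[i][j] + bias[j]."""
    return [
        [sum(t[i] * weight[i][j] for i in range(len(t))) + bias[j] for j in range(len(bias))]
        for t in tokens
    ]


VISION_DIM, LLM_DIM = 3, 5  # hypothetical toy widths
rng = random.Random(0)
grid = [[[rng.random() for _ in range(VISION_DIM)] for _ in range(6)] for _ in range(4)]
merged = merge_2x2(grid)  # 4x6 patches -> 2x3 = 6 tokens of width 12
weight = [[rng.gauss(0.0, 0.02) for _ in range(LLM_DIM)] for _ in range(4 * VISION_DIM)]
bias = [0.0] * LLM_DIM
llm_tokens = linear(merged, weight, bias)
print(len(llm_tokens), len(llm_tokens[0]))  # 6 5
```

The merge cuts the number of visual tokens the language model must attend to by a factor of four, which matters most for the long sequences that native-resolution inputs produce.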