MoonViT-SO-400M Open-source Visual Encoder - Free for Efficient Image Feature Extraction

MoonViT-SO-400M

Developed by moonshotai

MoonViT is a native-resolution visual encoder. It is initialized from SigLIP-SO-400M and then continually pre-trained, and it is suited to image feature extraction tasks.

Image Enhancement

Transformers

Open Source License: MIT #Native-resolution visual encoding #SigLIP pre-training #Multimodal feature extraction

Downloads 275

Release Time: 4/10/2025

Model Overview

MoonViT is a visual encoder designed for image feature extraction. It is initialized from SigLIP-SO-400M and continually pre-trained, and it can process high-resolution images at their native resolution.

Model Features

Native resolution support

MoonViT can process images at native resolution and extract features without downsampling.
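
To make the idea concrete, the sketch below counts how many patch tokens an encoder would see when an image keeps its native size versus when it is first resized to a fixed square. The patch size of 14 is an assumption borrowed from the SigLIP-SO-400M patch14 checkpoints, the 224x224 fixed input is chosen only for illustration, and `patch_grid` and `token_count` are hypothetical helpers, not part of MoonViT's API. Rounding partial patches up is also an assumption.

```python
import math

PATCH = 14  # assumed patch size, borrowed from SigLIP-SO-400M patch14 checkpoints


def patch_grid(height, width, patch=PATCH):
    """Return (rows, cols) of patches covering the image, rounding partial patches up."""
    return math.ceil(height / patch), math.ceil(width / patch)


def token_count(height, width, patch=PATCH):
    """Number of patch tokens for an image of the given size."""
    rows, cols = patch_grid(height, width, patch)
    return rows * cols


# A wide 448x1344 image keeps its 1:3 aspect ratio as a 32x96 patch grid.
print(patch_grid(448, 1344), token_count(448, 1344))  # (32, 96) 3072

# Resizing the same image to a fixed 224x224 input leaves a 16x16 grid.
print(patch_grid(224, 224), token_count(224, 224))    # (16, 16) 256
```

The native-resolution grid preserves both the aspect ratio and the fine detail of the input, at the cost of a longer token sequence.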

Based on SigLIP-SO-400M

The model is initialized from SigLIP-SO-400M and continually pre-trained from that starting point, so it inherits SigLIP's visual feature extraction capabilities.

Efficient feature extraction

MoonViT is optimized for image feature extraction and generates high-quality image feature representations.

Model Capabilities

Image feature extraction

High-resolution image processing

Use Cases

Computer vision

Image understanding

Extract image features for downstream tasks such as image classification and object detection.

High-quality image feature representations
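
One common way to use encoder output for a downstream task is to mean-pool the per-patch feature vectors into a single image embedding and compare it against class prototypes. The sketch below uses tiny made-up vectors in place of real encoder output. `mean_pool`, `cosine`, and `classify` are hypothetical helpers, and nearest-prototype classification is an illustrative choice, not something the model card prescribes.

```python
import math


def mean_pool(patch_features):
    """Average a list of per-patch vectors into one image-level vector."""
    n = len(patch_features)
    dim = len(patch_features[0])
    return [sum(vec[i] for vec in patch_features) / n for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def classify(embedding, prototypes):
    """Return the label whose prototype vector is most similar to the embedding."""
    return max(prototypes, key=lambda label: cosine(embedding, prototypes[label]))


# Toy stand-in for encoder output: 3 patches with 4-dimensional features.
patches = [[1.0, 0.0, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0]]
# Hypothetical class prototypes in the same feature space.
prototypes = {"cat": [1.0, 0.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0, 0.0]}

embedding = mean_pool(patches)
label = classify(embedding, prototypes)
print(label)  # cat
```

With real features, the prototypes would typically be the mean embeddings of a few labeled examples per class, or the pooled embedding would feed a trained linear layer instead.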

Multimodal learning

Use MoonViT as the visual encoder alongside a language model to build multimodal systems.
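
A typical way to pair a visual encoder with a language model is to project the image tokens into the language model's embedding width and prepend them to the text token embeddings. The sketch below shows only that wiring with toy numbers. The linear projector, its weights, and the helper names `project` and `build_sequence` are assumptions for illustration and do not describe how any particular MoonViT-based system is built.

```python
def project(tokens, weight):
    """Apply a linear map: each token (length d_in) times weight (d_in x d_out)."""
    d_out = len(weight[0])
    return [
        [sum(tok[i] * weight[i][j] for i in range(len(tok))) for j in range(d_out)]
        for tok in tokens
    ]


def build_sequence(image_tokens, text_embeddings, weight):
    """Project image tokens into the text embedding space and prepend them."""
    return project(image_tokens, weight) + text_embeddings


# Toy shapes: 2 image tokens of width 3, language model width 2, 2 text tokens.
image_tokens = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical projector weights
text_embeddings = [[0.1, 0.2], [0.3, 0.4]]

sequence = build_sequence(image_tokens, text_embeddings, weight)
print(len(sequence), sequence[0])  # 4 [2.0, 1.0]
```

In a real system the projector would be trained, and the combined sequence would be passed to the language model in place of plain text embeddings.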

