
RS-M-CLIP

Developed by joaodaniel
A multilingual vision-language pre-trained model for the remote sensing field, supporting image-text cross-modal tasks in 10 languages.
Downloads 248
Release Time: 11/5/2024

Model Overview

RS-M-CLIP is built on the CLIP architecture and optimized for remote sensing imagery. It improves on that baseline by combining multilingual data obtained through translation with self-distillation training. It supports tasks such as cross-modal retrieval and zero-shot image classification.

Model Features

Multilingual Support
Supports text input in 10 languages, including major European and Asian languages.
Optimized for Remote Sensing Field
Trained specifically on the characteristics of satellite and aerial images, targeting remote sensing tasks.
Self-distillation Training
Adopts a self-supervised method that aligns local and global representations to improve model performance.
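
The idea of aligning local and global representations can be illustrated with a generic self-distillation objective. This is a minimal sketch, assuming a teacher that sees a global view of an image and a student that sees a local crop of it, with a cross-entropy between their temperature-scaled distributions. The function names, temperatures, and toy vectors are hypothetical; the source does not specify the exact loss RS-M-CLIP uses.

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def self_distillation_loss(student_local, teacher_global,
                           student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the teacher's distribution on a global view
    and the student's distribution on a local crop of the same image.
    The temperatures are assumed values, not taken from the source."""
    p_teacher = softmax(teacher_global, teacher_temp)
    p_student = softmax(student_local, student_temp)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

# Toy projection outputs (hypothetical values, not real model outputs).
teacher_global = [2.0, 0.5, -1.0]
aligned_local = [1.9, 0.6, -0.9]      # local crop agrees with the global view
misaligned_local = [-1.0, 0.5, 2.0]   # local crop disagrees

loss_aligned = self_distillation_loss(aligned_local, teacher_global)
loss_misaligned = self_distillation_loss(misaligned_local, teacher_global)
print(loss_aligned < loss_misaligned)
```

The loss is small when the local representation agrees with the global one and large when it does not, which is the behavior a training signal of this kind would reward.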

Model Capabilities

Multilingual Image Classification
Cross-modal Image Retrieval
Multilingual Text Retrieval
Zero-shot Learning

Use Cases

Geospatial Analysis
Satellite Image Classification
Perform zero-shot classification on satellite images, for example identifying targets such as airplanes and buildings.
In the provided example, the model correctly identified airplane images.
Multilingual Image Retrieval
Retrieve relevant remote sensing images using queries in different languages.
Supports query input in 10 languages.
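
Multilingual image retrieval as described above typically reduces to ranking image embeddings by their similarity to a text query embedding. The following is a rough sketch of that ranking step only, assuming the embeddings have already been produced by the model's image and text encoders. The file names, query strings, and 3-dimensional vectors are made-up placeholders, and the source does not list which 10 languages are supported, so the second query language is purely illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_images(query_embedding, image_embeddings):
    """Return image ids sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_embedding, emb), image_id)
              for image_id, emb in image_embeddings.items()]
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored]

# Hypothetical embeddings standing in for encoder outputs.
image_embeddings = {
    "airport.png": [0.9, 0.1, 0.0],
    "harbor.png": [0.1, 0.9, 0.1],
    "forest.png": [0.0, 0.2, 0.9],
}
query_embeddings = {
    "an airplane on a runway": [0.80, 0.20, 0.10],
    "un avion sur une piste": [0.85, 0.15, 0.05],  # illustrative second language
}

results = {query: rank_images(emb, image_embeddings)
           for query, emb in query_embeddings.items()}
for query, ranking in results.items():
    print(query, ranking)
```

Because queries in different languages map into the same embedding space, the same ranking function serves every language; only the text encoder input changes.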
Urban Planning
Land Use Analysis
Identify land use types such as urban areas and green spaces.
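
Zero-shot classification of land use types can be sketched in the usual CLIP style: compare one image embedding against the text embeddings of candidate labels and apply a softmax to the scaled similarities. This sketch assumes precomputed embeddings; the vectors, the `zero_shot_classify` helper, and the `logit_scale` value of 100 are assumptions for illustration, not details confirmed by the source.

```python
import math

def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def zero_shot_classify(image_embedding, label_embeddings, logit_scale=100.0):
    """Return a probability per label from scaled cosine similarities."""
    image = normalize(image_embedding)
    labels = list(label_embeddings)
    logits = []
    for label in labels:
        text = normalize(label_embeddings[label])
        logits.append(logit_scale * sum(a * b for a, b in zip(image, text)))
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}

# Hypothetical embeddings standing in for encoder outputs.
label_embeddings = {
    "urban area": [0.9, 0.2, 0.1],
    "green space": [0.1, 0.9, 0.2],
}
image_embedding = [0.2, 0.8, 0.1]

probabilities = zero_shot_classify(image_embedding, label_embeddings)
predicted = max(probabilities, key=probabilities.get)
print(predicted)
```

No labeled training images are needed for new categories: adding a land use type only means adding another label text to compare against.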