# 📖 Git-RSCLIP
Git-RSCLIP is a model pre-trained on the Git-10M dataset for remote sensing image-text tasks, providing zero-shot image classification and image-text retrieval.
## 🚀 Quick Start
Git-RSCLIP is pre-trained on the Git-10M dataset (a global-scale remote sensing image-text pair dataset consisting of 10 million image-text pairs) at resolution 256x256, and was first released in [this repository](https://github.com/Chen-Yang-Liu/Text2Earth). It uses a similar architecture to [google/siglip-large-patch16-256](https://huggingface.co/google/siglip-large-patch16-256).

This is the large version; the base version is available at [Git-RSCLIP-base](https://huggingface.co/lcybuaa/Git-RSCLIP-base).
## ✨ Features
You can use the raw model for tasks like zero-shot image classification and image-text retrieval.
## 💻 Usage Examples
### Basic Usage

Use Git-RSCLIP to get image features:
```python
from PIL import Image
import requests
import torch
from transformers import AutoProcessor, AutoModel

model = AutoModel.from_pretrained("lcybuaa/Git-RSCLIP")
processor = AutoProcessor.from_pretrained("lcybuaa/Git-RSCLIP")

# Load an example remote sensing image
url = "https://github.com/Chen-Yang-Liu/PromptCC/blob/main/Example/B/train_000051.png?raw=true"
image = Image.open(requests.get(url, stream=True).raw)

# Encode the image into a feature vector
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    image_features = model.get_image_features(**inputs)
```
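The card lists image-text retrieval as a supported task; the snippet below is a minimal sketch of how that could look, continuing from the example above. The candidate captions and the cosine-similarity ranking are illustrative, not prescribed by the original card:

```python
# Continuing from the snippet above: rank candidate captions for the image.
# The captions here are illustrative examples.
texts = ["a remote sensing image of river", "a remote sensing image of houses and roads"]
text_inputs = processor(text=texts, padding="max_length", return_tensors="pt")

with torch.no_grad():
    text_features = model.get_text_features(**text_inputs)

# L2-normalize both sides so the dot product equals cosine similarity
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)

similarity = image_features @ text_features.T  # shape: (num_images, num_texts)
best = similarity.argmax(dim=-1)
print(f"best matching text for image 0: '{texts[best[0].item()]}'")
```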
### Zero-shot image classification
```python
from PIL import Image
import requests
import torch
from transformers import AutoProcessor, AutoModel

model = AutoModel.from_pretrained("lcybuaa/Git-RSCLIP")
processor = AutoProcessor.from_pretrained("lcybuaa/Git-RSCLIP")

url = "https://github.com/Chen-Yang-Liu/PromptCC/blob/main/Example/B/train_000051.png?raw=true"
image = Image.open(requests.get(url, stream=True).raw)

# Candidate class descriptions
texts = ["a remote sensing image of river", "a remote sensing image of houses and roads"]
inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

logits_per_image = outputs.logits_per_image
probs = torch.sigmoid(logits_per_image)

# Rank candidate texts by probability and report the best match
top5_indices = torch.argsort(probs, descending=True)[:, :5].cpu().numpy()
top1_indices = top5_indices[:, 0]
print(f"image 0 is best described as: '{texts[top1_indices[0]]}'")
```
For more code examples, refer to the documentation.
## 🔧 Technical Details
### Training Data

Git-RSCLIP is pre-trained on the Git-10M dataset, a global-scale remote sensing image-text pair dataset consisting of 10 million image-text pairs [(Liu et al., 2024)](https://github.com/Chen-Yang-Liu/Text2Earth).
### Preprocessing
Images are resized/rescaled to the same resolution (256x256) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5).
Texts are tokenized and padded to the same length (64 tokens).
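For reference, here is a minimal sketch of the image preprocessing described above using torchvision. The interpolation mode and the local file name are assumptions; in practice, `AutoProcessor` applies the model's exact settings for both images and text:

```python
# Minimal sketch of the preprocessing described above (assumes bicubic
# resampling; prefer AutoProcessor for the model's exact settings).
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((256, 256), interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.ToTensor(),                      # rescales pixel values to [0, 1]
    transforms.Normalize(mean=(0.5, 0.5, 0.5),  # maps [0, 1] to [-1, 1]
                         std=(0.5, 0.5, 0.5)),
])

image = Image.open("example.png").convert("RGB")  # hypothetical local file
pixel_values = preprocess(image).unsqueeze(0)     # shape: (1, 3, 256, 256)
```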
## 📚 Documentation
An evaluation of Git-RSCLIP compared with other CLIP models can be found in the paper.
## 📄 License

The model is licensed under the Apache-2.0 license.
## BibTeX entry and citation info

```bibtex
@ARTICLE{10988859,
  author={Liu, Chenyang and Chen, Keyan and Zhao, Rui and Zou, Zhengxia and Shi, Zhenwei},
  journal={IEEE Geoscience and Remote Sensing Magazine},
  title={Text2Earth: Unlocking text-driven remote sensing image generation with a global-scale dataset and a foundation model},
  year={2025},
  volume={},
  number={},
  pages={2-23},
  doi={10.1109/MGRS.2025.3560455}
}
```