🚀 gte-multilingual-reranker-base
The gte-multilingual-reranker-base model is the first reranker model in the GTE family. It offers high-performance multilingual retrieval and has several key advantages:
- High Performance: Achieves state-of-the-art (SOTA) results in multilingual retrieval tasks and multi-task representation evaluations compared to reranker models of similar size.
- Training Architecture: Trained with an encoder-only transformer architecture, giving it a smaller model size. Compared to previous models based on decoder-only LLM architectures (e.g., gte-qwen2-1.5b-instruct), it has lower hardware requirements for inference and roughly a 10x increase in inference speed.
- Long Context: Supports text lengths of up to 8192 tokens.
- Multilingual Capability: Supports over 70 languages.
📦 Installation
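The model runs with PyTorch and Hugging Face transformers. A minimal install sketch, assuming a standard pip environment (the version pin follows the transformers>=4.36.0 requirement in the usage example below; xformers is optional, per the note that follows):

```bash
# Core dependencies for the usage examples below
pip install torch "transformers>=4.36.0"

# Optional: xformers for unpadding/acceleration (see the note below)
pip install xformers
```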
⚠️ Important Note
It is recommended to install xformers and enable unpadding for acceleration. Refer to [enable-unpadding-and-xformers](https://huggingface.co/Alibaba-NLP/new-impl#recommendation-enable-unpadding-and-acceleration-with-xformers).
💡 Usage Tip
For offline usage, refer to [new-impl/discussions/2](https://huggingface.co/Alibaba-NLP/new-impl/discussions/2#662b08d04d8c3d0a09c88fa3).
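One common offline pattern (a sketch only; the linked discussion is the authoritative reference) is to pre-download the model repository on a machine with network access and then load from the local path:

```python
from huggingface_hub import snapshot_download

# Download the full model repository once, while online
local_dir = snapshot_download("Alibaba-NLP/gte-multilingual-reranker-base")

# Later (offline), load from the local path instead of the Hub ID
# tokenizer = AutoTokenizer.from_pretrained(local_dir)
# model = AutoModelForSequenceClassification.from_pretrained(local_dir, trust_remote_code=True)

# Note: with trust_remote_code=True the custom modeling code must also be
# available locally; see the linked discussion for the exact steps.
```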
💻 Usage Examples
Basic Usage
Using Hugging Face transformers (transformers>=4.36.0):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name_or_path = "Alibaba-NLP/gte-multilingual-reranker-base"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name_or_path, trust_remote_code=True,
    torch_dtype=torch.float16
)
model.eval()

# Query-document pairs to score (Chinese and English examples)
pairs = [
    ["中国的首都在哪儿", "北京"],
    ["what is the capital of China?", "北京"],
    ["how to implement quick sort in python?", "Introduction of quick sort"],
]

with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    # One relevance score per pair; higher means more relevant
    scores = model(**inputs, return_dict=True).logits.view(-1).float()

print(scores)
```
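Building on the snippet above, a reranker is typically used to order candidate documents for a single query. A minimal sketch (the query and candidates are illustrative; `model` and `tokenizer` are the objects created above):

```python
query = "what is the capital of China?"
candidates = ["北京", "Introduction of quick sort", "Paris is the capital of France."]

# Score every (query, candidate) pair and sort candidates by descending score
pairs = [[query, doc] for doc in candidates]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    scores = model(**inputs, return_dict=True).logits.view(-1).float()

ranked = sorted(zip(candidates, scores.tolist()), key=lambda x: x[1], reverse=True)
for doc, score in ranked:
    print(f"{score:.3f}\t{doc}")
```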
Advanced Usage
Usage with Infinity:
[Infinity](https://github.com/michaelfeil/infinity) is an MIT-licensed inference REST API server.

```bash
docker run --gpus all -v $PWD/data:/app/.cache -p "7997":"7997" \
  michaelf34/infinity:0.0.68 \
  v2 --model-id Alibaba-NLP/gte-multilingual-reranker-base --revision "main" --dtype bfloat16 --batch-size 32 --device cuda --engine torch --port 7997
```
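Once the container is up, reranking is done over HTTP. The route and payload below are assumptions based on the Cohere-style rerank schema that Infinity exposes; verify the exact API against the Infinity documentation for your version:

```python
import requests

# Assumed endpoint and payload shape; check the Infinity docs for your version
resp = requests.post(
    "http://localhost:7997/rerank",
    json={
        "model": "Alibaba-NLP/gte-multilingual-reranker-base",
        "query": "what is the capital of China?",
        "documents": ["北京", "Introduction of quick sort"],
    },
)
print(resp.json())
```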
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Size | 306M |
| Max Input Tokens | 8192 |
Evaluation
Reranking results on multiple text retrieval datasets:

More detailed experimental results can be found in the paper.
Cloud API Services
In addition to the open-source [GTE](https://huggingface.co/collections/Alibaba-NLP/gte-models-6680f0b13f885cb431e6d469) series models, the GTE models are also offered as commercial API services on Alibaba Cloud.
- [Embedding Models](https://help.aliyun.com/zh/model-studio/developer-reference/general-text-embedding/): Three versions of the text embedding model are available: text-embedding-v1/v2/v3, with v3 being the latest API service.
- [ReRank Models](https://help.aliyun.com/zh/model-studio/developer-reference/general-text-sorting-model/): The gte-rerank model service is available.
Note that the models behind the commercial APIs are not entirely identical to the open-source models.
📄 License
This model is released under the Apache 2.0 license.
🔧 Technical Details
The gte-multilingual-reranker-base model is trained with an encoder-only transformer architecture. This choice results in a smaller model size than decoder-only alternatives and lower hardware requirements for inference, which simplifies deployment across environments. The model supports text lengths of up to 8192 tokens, making it suitable for long-context inputs, and covers over 70 languages for multilingual text retrieval.
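To make use of the 8192-token limit in the transformers example above, the only change needed is the tokenizer's max_length. A small sketch, assuming the `tokenizer` and `model` from the basic usage example and a GPU with enough memory for sequences of this length:

```python
long_document = "..."  # placeholder for a long passage, up to roughly 8192 tokens

with torch.no_grad():
    inputs = tokenizer(
        [["what is the capital of China?", long_document]],
        padding=True, truncation=True, return_tensors="pt",
        max_length=8192,  # matches the model's stated maximum input length
    )
    score = model(**inputs, return_dict=True).logits.view(-1).float()
print(score)
```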
📖 Citation
If you find our paper or models helpful, please consider citing:
```bibtex
@inproceedings{zhang2024mgte,
  title={mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval},
  author={Zhang, Xin and Zhang, Yanzhao and Long, Dingkun and Xie, Wen and Dai, Ziqi and Tang, Jialong and Lin, Huan and Yang, Baosong and Xie, Pengjun and Huang, Fei and others},
  booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track},
  pages={1393--1412},
  year={2024}
}
```