Preranker-v1 Open-Source Pre-Sorter: Optimize the Invocation Process of Large Language Models to Improve Efficiency

Preranker V1

Developed by yjoonjang

The pre-ranker is a cross-encoder-based text ranking model designed to optimize the function call process of large language models by narrowing down the corpus of available tools for improved efficiency.

Large Language Model

Safetensors

EnglishOpen Source License:Apache-2.0 #Tool Retrieval Optimization #Function Call Sorting #Cross-Encoder

Downloads 29

Release Time : 4/7/2025

Model Overview

The pre-ranker is a cross-encoder model for text ranking, primarily used to sort available tools based on a given query to optimize the function call process of large language models.

Model Features

Efficient Tool Sorting

Using cross-encoder technology, the pre-ranker efficiently sorts available tools to optimize the function call process of large language models.

High Performance

In the MTEB-ToolRet benchmark, the pre-ranker outperforms similar models across multiple metrics.

Easy Integration

The pre-ranker is implemented via the sentence-transformers library, making it easy to integrate into existing systems.

Model Capabilities

Text Ranking

Tool Retrieval Optimization

Function Call Process Optimization

Use Cases

Tool Retrieval

Wayback Machine Availability Check

Sort available tools based on a query to determine the availability of a specific URL in the Wayback Machine.

In the example, the pre-ranker successfully identified the tool most relevant to the query.

🚀 📊 Pre:Ranker - reranking tools beforehand

Pre:Ranker is a reranking tool designed to optimize the function - calling process of modern LLMs. It narrows down the corpus of available tools based on a given query, enhancing the efficiency of tool selection.

🚀 Quick Start

News

2025.04.09: [🤗 preranker - v1](https://huggingface.co/yjoonjang/preranker - v1), [MTEB - ToolRetrieval](https://github.com/yjoonjang/preranker?tab=readme - ov - file#MTEB - ToolRet) released!

About Pre:Ranker

Are you overwhelmed by numerous tools and functions? Try Pre:Ranker! It is engineered to streamline the function - calling process of modern LLMs. By narrowing down the available tool corpus according to a given query, it makes tool selection more efficient. Check 🐱GITHUB for more details!

💻 Usage Examples

Basic Usage

from sentence_transformers.cross_encoder import CrossEncoder

model = CrossEncoder('yjoonjang/preranker-v1')
model.eval()
pairs = [
    ["Is 'https://www.apple.com' available in the Wayback Machine on September 9, 2015?", "{'name': 'availability', 'description': 'Checks if a given URL is archived and currently accessible in the Wayback Machine.', 'parameters': {'url': {'description': 'The URL to check for availability in the Wayback Machine.', 'type': 'str', 'default': 'http://mashape.com'}, 'timestamp': {'description': \"The timestamp to look up in Wayback. If not specified, the most recent available capture is returned. The format of the timestamp is 1-14 digits (YYYYMMDDhhmmss). Defaults to '20090101'.\", 'type': 'str, optional', 'default': '20090101'}, 'callback': {'description': 'An optional callback to produce a JSONP response. Defaults to None.', 'type': 'str, optional', 'default': ''}}}"],
    ["Is 'https://www.apple.com' available in the Wayback Machine on September 9, 2015?", "{'name': 'top_grossing_mac_apps', 'description': 'Fetches a list of the top-grossing Mac apps from the App Store.', 'parameters': {'category': {'description': \"The category ID for the apps to be fetched. Defaults to '6016' (general category).\", 'type': 'str', 'default': '6016'}, 'country': {'description': \"The country code for the App Store. Defaults to 'us'.\", 'type': 'str', 'default': 'us'}, 'lang': {'description': \"The language code for the results. Defaults to 'en'.\", 'type': 'str', 'default': 'en'}, 'num': {'description': 'The number of results to return. Defaults to 100. Maximum allowed value is 200.', 'type': 'int', 'default': '100'}}}"],
    ["Is 'https://www.apple.com' available in the Wayback Machine on September 9, 2015?", "{'name': 'top_paid_mac_apps', 'description': 'Retrieves a list of the top paid Mac apps from the App Store.', 'parameters': {'category': {'description': \"Category of the apps to retrieve. Default is '6016'.\", 'type': 'str', 'default': '6016'}, 'country': {'description': \"Country code to filter the app results. Default is 'us'.\", 'type': 'str', 'default': 'us'}, 'lang': {'description': \"Language code for the results. Default is 'en'.\", 'type': 'str', 'default': 'en'}, 'num': {'description': 'Number of results to return. Default is 100. Maximum is 200.', 'type': 'int', 'default': '100'}}}"]
]

scores = model.predict(pairs)
print(scores) # [0.91427845 0.7625548  0.7656321]

📚 Documentation

MTEB - ToolRet

The [ToolRet Benchmark](https://github.com/mangopy/tool - retrieval - benchmark) has been converted into BEIR format to make it MTEB - compatible. Check make_toolret_to_beir_format.ipynb for more details.

Evaluation Code

git clone https://github.com/yjoonjang/PreRanker.git
cd PreRanker/toolret_eval
python run_mteb.py

Evaluation Results

Property	Details
Model Type	Cross - Encoder
Training Data	Will soon be released

Model Name	Model Parameter	Recall@10	MAP@10	MRR@10	Precision@10	NDCG@10
[yjoonjang/preranker - v1](https://huggingface.co/yjoonjang/preranker - v1)	150M	0.540	0.361	0.462	0.088	0.428
[Alibaba - NLP/gte - reranker - modernbert - base](https://huggingface.co/Alibaba - NLP/gte - reranker - modernbert - base)	150M	0.524	0.356	0.454	0.086	0.422
[jinaai/jina - reranker - v2 - base - multilingual](https://huggingface.co/jinaai/jina - reranker - v2 - base - multilingual)	278M	0.502	0.331	0.414	0.083	0.395
[Alibaba - NLP/gte - multilingual - reranker - base](https://huggingface.co/Alibaba - NLP/gte - multilingual - reranker - base)	306M	0.474	0.299	0.383	0.078	0.363
[BAAI/bge - reranker - v2 - m3](https://huggingface.co/BAAI/bge - reranker - v2 - m3)	568M	0.461	0.293	0.370	0.076	0.355

🔧 Technical Details

Training Details

preranker - v1 is a fine - tuned model based on [Alibaba - NLP/gte - reranker - modernbert - base](https://huggingface.co/Alibaba - NLP/gte - reranker - modernbert - base), via sentence - transformers
Training data will soon be released.

Training Procedure

loss: ListNetLoss
batch size: 4
learning rate: 2e - 5
epoch: 1

📄 License

apache - 2.0

Citation

@misc{Pre:Ranker,
  publisher = {Youngjoon Jang, Seongtae Hong},
  year = {2025},
  url = {https://github.com/yjoonjang/preranker}
},

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ListNetLoss

@inproceedings{cao2007learning,
    title={Learning to Rank: From Pairwise Approach to Listwise Approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご