preranker-v1: an open-source pre-ranker that streamlines LLM function calling

Preranker V1

Developed by yjoonjang

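Pre:Ranker is a cross-encoder text-ranking model that streamlines LLM function calling by narrowing the corpus of available tools to those relevant to a query.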

Large language model

Safetensors

Language: English · License: Apache-2.0 · Tags: #tool-retrieval-optimization #function-call-ranking #cross-encoder

Downloads: 29

Released: 4/7/2025

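Model overview

Pre:Ranker is a cross-encoder model for text ranking. Given a query, it ranks the available tools by relevance, which streamlines the function-calling step of a large language model.

Model features

Efficient tool ranking

A cross-encoder scores each query-tool pair, so the available tools can be ranked before the LLM makes its function call.

High performance

On the MTEB-ToolRet benchmark, Pre:Ranker scores higher than the other rerankers it is compared with on all five reported metrics (Recall@10, MAP@10, MRR@10, Precision@10, NDCG@10).

Easy to integrate

Pre:Ranker loads through the sentence-transformers library, so it can be added to an existing system with a few lines of code.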

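Model capabilities

Text ranking

Tool retrieval optimization

Function-calling optimization

Use cases

Tool retrieval

Wayback Machine availability check

Rank the available tools against a query that asks whether a specific URL is available in the Wayback Machine.

In the example, Pre:Ranker gives its highest score to the tool most relevant to the query (the availability tool).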

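🚀 📊 Pre:Ranker - Pre-Reranking Tools

Pre:Ranker is a tool for optimizing the function-calling process of modern large language models (LLMs). Given a query, it narrows the corpus of available tools.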

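🚀 Quick start

Install dependencies

pip install sentence-transformers

Usage example

from sentence_transformers.cross_encoder import CrossEncoder

# Load the cross-encoder from the Hugging Face Hub.
model = CrossEncoder('yjoonjang/preranker-v1')
# Switch to inference mode (disables dropout).
model.eval()
# Each pair is [query, tool description]; the tool is passed as a dict-like string.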
pairs = [
    ["Is 'https://www.apple.com' available in the Wayback Machine on September 9, 2015?", "{'name': 'availability', 'description': 'Checks if a given URL is archived and currently accessible in the Wayback Machine.', 'parameters': {'url': {'description': 'The URL to check for availability in the Wayback Machine.', 'type': 'str', 'default': 'http://mashape.com'}, 'timestamp': {'description': \"The timestamp to look up in Wayback. If not specified, the most recent available capture is returned. The format of the timestamp is 1-14 digits (YYYYMMDDhhmmss). Defaults to '20090101'.\", 'type': 'str, optional', 'default': '20090101'}, 'callback': {'description': 'An optional callback to produce a JSONP response. Defaults to None.', 'type': 'str, optional', 'default': ''}}}"],
    ["Is 'https://www.apple.com' available in the Wayback Machine on September 9, 2015?", "{'name': 'top_grossing_mac_apps', 'description': 'Fetches a list of the top-grossing Mac apps from the App Store.', 'parameters': {'category': {'description': \"The category ID for the apps to be fetched. Defaults to '6016' (general category).\", 'type': 'str', 'default': '6016'}, 'country': {'description': \"The country code for the App Store. Defaults to 'us'.\", 'type': 'str', 'default': 'us'}, 'lang': {'description': \"The language code for the results. Defaults to 'en'.\", 'type': 'str', 'default': 'en'}, 'num': {'description': 'The number of results to return. Defaults to 100. Maximum allowed value is 200.', 'type': 'int', 'default': '100'}}}"],
    ["Is 'https://www.apple.com' available in the Wayback Machine on September 9, 2015?", "{'name': 'top_paid_mac_apps', 'description': 'Retrieves a list of the top paid Mac apps from the App Store.', 'parameters': {'category': {'description': \"Category of the apps to retrieve. Default is '6016'.\", 'type': 'str', 'default': '6016'}, 'country': {'description': \"Country code to filter the app results. Default is 'us'.\", 'type': 'str', 'default': 'us'}, 'lang': {'description': \"Language code for the results. Default is 'en'.\", 'type': 'str', 'default': 'en'}, 'num': {'description': 'Number of results to return. Default is 100. Maximum is 200.', 'type': 'int', 'default': '100'}}}",]
]

# predict returns one relevance score per pair, in input order; higher means more relevant.
scores = model.predict(pairs)
print(scores) # [0.91427845 0.7625548  0.7656321]
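
The scores only become useful once they are turned into a shortlist. The sketch below is one minimal way to do that, assuming the scores arrive in the same order as the pairs; `select_top_k` and the `names` list are illustrative helpers, not part of the model or of sentence-transformers.

```python
from typing import List, Sequence, Tuple


def select_top_k(
    tool_names: Sequence[str], scores: Sequence[float], k: int
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring tools as (name, score), best first."""
    if len(tool_names) != len(scores):
        raise ValueError("tool_names and scores must have the same length")
    ranked = sorted(zip(tool_names, scores), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Scores copied from the example output above; the names are the `name`
# fields of the three tool descriptions, in the same order as `pairs`.
names = ["availability", "top_grossing_mac_apps", "top_paid_mac_apps"]
example_scores = [0.91427845, 0.7625548, 0.7656321]

shortlist = select_top_k(names, example_scores, k=2)
print(shortlist)  # [('availability', 0.91427845), ('top_paid_mac_apps', 0.7656321)]
```

Only the shortlisted tools would then be passed to the LLM as candidate functions.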

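✨ Key features

Optimized function calling: narrows the corpus of available tools to streamline the function-calling process of modern LLMs.
Easy to use: exposes a simple API that is straightforward to integrate into existing projects.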

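📚 Documentation

News

2025.04.09: 🤗 preranker-v1 and MTEB-ToolRetrieval released!

About Pre:Ranker

Too many tools and functions to choose from? Try Pre:Ranker! It optimizes the function-calling process of modern LLMs by narrowing the corpus of available tools for a given query. See the GitHub repository (https://github.com/yjoonjang/PreRanker) for more details.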
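
As a rough illustration of where this narrowing step sits in a function-calling pipeline, the sketch below filters a list of tool definitions before they would reach the LLM. It is written under stated assumptions: `prerank_tools` and `overlap_scorer` are hypothetical names, tools are serialized with `str()` to resemble the dict-like strings in the usage example, and the scorer is injected so the example runs without downloading the model.

```python
from typing import Any, Callable, Dict, List, Sequence

Tool = Dict[str, Any]
# A scorer takes [query, tool_text] pairs and returns one score per pair.
Scorer = Callable[[List[List[str]]], Sequence[float]]


def prerank_tools(query: str, tools: List[Tool], score_pairs: Scorer, k: int = 5) -> List[Tool]:
    """Keep only the k tools the scorer considers most relevant to the query."""
    # Assumption: str(tool) approximates the serialization used in the usage example.
    pairs = [[query, str(tool)] for tool in tools]
    scores = score_pairs(pairs)
    order = sorted(range(len(tools)), key=lambda i: scores[i], reverse=True)
    return [tools[i] for i in order[:k]]


def overlap_scorer(pairs: List[List[str]]) -> List[float]:
    """Stand-in scorer (hypothetical): counts query words that occur in the tool text."""
    return [
        float(sum(word in text.lower() for word in set(query.lower().split())))
        for query, text in pairs
    ]


tools = [
    {"name": "top_paid_mac_apps", "description": "Retrieves a list of the top paid Mac apps."},
    {"name": "availability", "description": "Checks if a URL is archived in the Wayback Machine."},
]
kept = prerank_tools("Is this URL in the Wayback Machine?", tools, overlap_scorer, k=1)
print([tool["name"] for tool in kept])  # ['availability']
```

With the real model, `model.predict` from the usage example could be passed as `score_pairs` in place of the stand-in.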

MTEB-ToolRet

MTEB-ToolRet is the ToolRet benchmark converted to BEIR format so that it is compatible with MTEB. See make_toolret_to_beir_format.ipynb for details.
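
For readers unfamiliar with the target layout, the sketch below writes a tiny dataset in the usual BEIR structure: `corpus.jsonl`, `queries.jsonl`, and a tab-separated `qrels/test.tsv`. The input record fields (`query_id`, `tools`, `relevant`, and so on) are invented for illustration and are not taken from ToolRet or from the notebook, which may proceed differently.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def write_beir(records: List[Dict], out_dir: Path) -> None:
    """Write corpus.jsonl, queries.jsonl and qrels/test.tsv in BEIR layout."""
    (out_dir / "qrels").mkdir(parents=True, exist_ok=True)
    corpus: Dict[str, str] = {}
    with open(out_dir / "queries.jsonl", "w", encoding="utf-8") as q_file, \
         open(out_dir / "qrels" / "test.tsv", "w", encoding="utf-8", newline="") as r_file:
        qrels = csv.writer(r_file, delimiter="\t")
        qrels.writerow(["query-id", "corpus-id", "score"])
        for rec in records:
            q_file.write(json.dumps({"_id": rec["query_id"], "text": rec["query"]}) + "\n")
            for tool in rec["tools"]:
                corpus[tool["id"]] = tool["text"]  # de-duplicate tools across queries
                if tool["relevant"]:
                    qrels.writerow([rec["query_id"], tool["id"], 1])
    with open(out_dir / "corpus.jsonl", "w", encoding="utf-8") as c_file:
        for tool_id, text in corpus.items():
            c_file.write(json.dumps({"_id": tool_id, "title": "", "text": text}) + "\n")


# Hypothetical input record: one query with one relevant and one irrelevant tool.
records = [{
    "query_id": "q1",
    "query": "Is 'https://www.apple.com' available in the Wayback Machine?",
    "tools": [
        {"id": "t1", "text": "availability: checks the Wayback Machine", "relevant": True},
        {"id": "t2", "text": "top_paid_mac_apps: lists paid Mac apps", "relevant": False},
    ],
}]
out_dir = Path(tempfile.mkdtemp())
write_beir(records, out_dir)

qrels_lines = (out_dir / "qrels" / "test.tsv").read_text(encoding="utf-8").splitlines()
corpus_ids = [
    json.loads(line)["_id"]
    for line in (out_dir / "corpus.jsonl").read_text(encoding="utf-8").splitlines()
]
print(qrels_lines)
print(corpus_ids)
```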

Evaluation code

git clone https://github.com/yjoonjang/PreRanker.git
cd PreRanker/toolret_eval
python run_mteb.py

Evaluation results

Model	Parameters	Recall@10	MAP@10	MRR@10	Precision@10	NDCG@10
yjoonjang/preranker-v1	150M	0.540	0.361	0.462	0.088	0.428
Alibaba-NLP/gte-reranker-modernbert-base	150M	0.524	0.356	0.454	0.086	0.422
jinaai/jina-reranker-v2-base-multilingual	278M	0.502	0.331	0.414	0.083	0.395
Alibaba-NLP/gte-multilingual-reranker-base	306M	0.474	0.299	0.383	0.078	0.363
BAAI/bge-reranker-v2-m3	568M	0.461	0.293	0.370	0.076	0.355
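
To make the column headers concrete, the sketch below computes four of these metrics for a single query using their textbook definitions with binary relevance. It is an illustration only: the evaluation script relies on MTEB, whose implementation may differ in details, and `metrics_at_k` is a hypothetical helper.

```python
import math
from typing import Dict, List, Set


def metrics_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int = 10) -> Dict[str, float]:
    """Recall, precision, MRR and NDCG at cutoff k for one query (binary relevance)."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in ranked_ids[:k]]
    recall = sum(hits) / len(relevant_ids)
    precision = sum(hits) / k
    # Reciprocal rank of the first relevant item, 0 if none appears in the top k.
    first_hit = next((rank for rank, hit in enumerate(hits, start=1) if hit), None)
    mrr = 1.0 / first_hit if first_hit else 0.0
    # DCG discounts each hit by log2(rank + 1); IDCG is the DCG of a perfect ranking.
    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return {"recall": recall, "precision": precision, "mrr": mrr, "ndcg": dcg / idcg}


# Two relevant tools, retrieved at ranks 2 and 4.
example = metrics_at_k(["t3", "t1", "t7", "t2"], {"t1", "t2"}, k=10)
print(example)
```

Precision@10 divides by 10 regardless of how many tools are relevant, so a query with a single relevant tool can score at most 0.1. This may partly explain why the Precision@10 values in the table are low for every model.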

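Training details

preranker-v1 is fine-tuned from Alibaba-NLP/gte-reranker-modernbert-base and trained with sentence-transformers.
The training data will be released soon.

Training procedure

Loss function: ListNetLoss
Batch size: 4
Learning rate: 2e-5
Epochs: 1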
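
For intuition about the loss named above, here is a pure-Python sketch of the ListNet top-one objective from Cao et al. (2007): the cross-entropy between the softmax of the relevance labels and the softmax of the model scores for one query's list of candidates. It is not the sentence-transformers implementation, and whether that library normalizes labels in exactly this way is an assumption.

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(scores: List[float], labels: List[float]) -> float:
    """Cross-entropy between the top-one distributions of labels and scores."""
    target = softmax(labels)
    predicted = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


# One query with three candidate tools; only the first is relevant.
labels = [1.0, 0.0, 0.0]
good = listnet_loss([3.0, 0.5, 0.2], labels)  # relevant tool scored highest
bad = listnet_loss([0.2, 0.5, 3.0], labels)   # relevant tool scored lowest
print(good < bad)  # True
```

Because the loss is computed over the whole candidate list at once, each training example groups one query with several tools rather than a single query-tool pair.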

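🔧 Technical details

Pre:Ranker is built on the sentence-transformers library and uses its CrossEncoder class for text ranking. It is obtained by fine-tuning Alibaba-NLP/gte-reranker-modernbert-base so that it ranks tools against a given query.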

📄 License

apache-2.0

Citation

@misc{Pre:Ranker,
  publisher = {Youngjoon Jang, Seongtae Hong},
  year = {2025},
  url = {https://github.com/yjoonjang/preranker}
}

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ListNetLoss

@inproceedings{cao2007learning,
    title={Learning to Rank: From Pairwise Approach to Listwise Approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}