lb-reranker-0.5B-v1.0 open-source model: query-text relevance judgment and retrieval ranking in 95+ languages

Lb Reranker 0.5B V1.0

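Developed by lightblue

LB Reranker is a model that judges how relevant a piece of text is to a query. It supports 95+ languages and is suited to ranking and reranking in retrieval tasks.

Large language model

Transformers

Multilingual | License: Apache-2.0 | #multilingual-reranking #retrieval-augmented-generation #low-latency-inference

Downloads: 917

Released: 1/6/2025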

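Model overview

A lightweight reranking model fine-tuned from Qwen2.5-0.5B-Instruct. It outputs a relevance score from 1 to 7, which is used to order retrieval results.

Model features

Multilingual support

Trained on data covering 95+ languages, making it one of the most broadly multilingual rerankers currently available.

Broad compatibility

The output is a digit string from 1 to 7, so the model works directly with mainstream inference frameworks such as vLLM and LMDeploy.

Efficient inference

On the BEIR benchmark it performs on par with or better than comparable models, and it is faster on average than BGE reranker v2.

Code ranking

Reaches 96% P@1 on code-snippet reranking.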

Model capabilities

Query-text relevance scoring

Multilingual retrieval optimization

Code-snippet ranking

Large-scale document retrieval

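Use cases

Information retrieval

Search engine result optimization

Rerank the documents returned by a search engine by relevance.

On the BEIR benchmark, outperforms baseline models such as BGE at every rank position except @1.

Code retrieval

Code-snippet ranking

Rank the results retrieved from a codebase by relevance.

Reaches 96% P@1.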

🚀 LB Reranker v1.0

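LB Reranker has been trained to judge how related a given query is to a piece of text, so it can be used as a ranker or reranker in a variety of retrieval-based tasks.

🚀 Quick Start

LB Reranker is a model for text reranking. The basic steps and example code below show how to get started with it.

✨ Key Features

Multilingual support: the model was trained on data in more than 95 languages, which makes it applicable to a wide range of use cases.
Strong performance: it scores slightly higher on the evaluation benchmarks than the baseline rerankers it was compared against.
Easy integration: as a plain causal language model, it integrates natively with widely used inference packages such as vLLM and LMDeploy.
Code reranking: the model also performs well on code-snippet reranking (96% P@1).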

📦 Installation

vLLM

Install vLLM with the following command:

pip install vllm

LMDeploy

Install LMDeploy with the following command:

pip install lmdeploy

OpenAI

Install openai with the following command:

pip install openai

💻 Usage Examples

Basic usage

The model was trained to expect input in the following format:

<<<Query>>>
{your_query_here}

<<<Context>>>
{your_context_here}

It then outputs a string containing a single number from 1 to 7.
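As a minimal sketch of the format above, the template can be filled and the raw output parsed as follows. The helper names `build_reranker_prompt` and `parse_reranker_score` are hypothetical and not part of the model's tooling; the context string is a made-up example.

```python
def build_reranker_prompt(query: str, context: str) -> str:
    """Fill the <<<Query>>> / <<<Context>>> template the model was trained on."""
    return f"<<<Query>>>\n{query}\n\n<<<Context>>>\n{context}"


def parse_reranker_score(output_text: str) -> int:
    """Convert the model's raw output string into an integer score from 1 to 7."""
    score = int(output_text.strip())  # raises ValueError for non-numeric output
    if not 1 <= score <= 7:
        raise ValueError(f"score out of range: {score}")
    return score


# Hypothetical example values, for illustration only.
prompt = build_reranker_prompt(
    "What is the scientific name of apples?",
    "Malus domestica is the orchard apple.",
)
print(prompt)
```

The prompt string is what goes into the user message; the system message used in the framework examples further down is sent separately.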

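Advanced usage

To obtain a continuous score for reranking query-context pairs (that is, a score that produces fewer ties), we calculate the expected value of the score: each candidate score from 1 to 7 is weighted by the probability the model assigns to its token. Code examples that implement this in different frameworks follow: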
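Before the framework-specific versions, the calculation itself can be sketched without any inference library. The log-probabilities below are made up for illustration, and `expected_score` is a hypothetical helper name; scores the model did not return are treated as having zero probability, mirroring the `get_prob` helpers in the examples that follow.

```python
import math


def expected_score(logprobs_by_score: dict) -> float:
    """Probability-weighted sum over the scores 1..7.

    `logprobs_by_score` maps an integer score to the log-probability of its
    token. Scores missing from the mapping contribute zero.
    """
    return sum(
        score * math.exp(logprobs_by_score[score])
        for score in range(1, 8)
        if score in logprobs_by_score
    )


# Made-up log-probabilities for two query-context pairs (illustrative only).
pair_a = {6: math.log(0.6), 7: math.log(0.4)}
pair_b = {6: math.log(0.9), 7: math.log(0.1)}

# Greedy decoding would output "6" for both pairs, which is a tie.
# The expected values differ: about 6.4 for pair_a and about 6.1 for pair_b.
print(expected_score(pair_a), expected_score(pair_b))
```

Because the probabilities come from a truncated list of top tokens, they may not sum to exactly 1, so the result is best read as a ranking score and not as a calibrated probability-weighted mean.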

vLLM

from vllm import LLM, SamplingParams
import numpy as np

def make_reranker_input(t, q):
    return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"

def make_reranker_inference_conversation(context, question):
    system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": make_reranker_input(context, question)},
    ]

def get_prob(logprob_dict, tok_id):
    return np.exp(logprob_dict[tok_id].logprob) if tok_id in logprob_dict.keys() else 0

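llm = LLM("lightblue/lb-reranker-0.5B-v1.0")
sampling_params = SamplingParams(temperature=0.0, logprobs=14, max_tokens=1)
# Public accessor for the tokenizer; the internal llm_engine attribute path may differ between vLLM versions.
tok = llm.get_tokenizer()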
idx_tokens = [tok.encode(str(i))[0] for i in range(1, 8)]

query_texts = [
    ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
]

chats = [make_reranker_inference_conversation(c, q) for q, c in query_texts]
responses = llm.chat(chats, sampling_params)
probs = np.array([[get_prob(r.outputs[0].logprobs[0], y) for y in idx_tokens] for r in responses])

N = probs.shape[1]
M = probs.shape[0]
idxs = np.tile(np.arange(1, N + 1), M).reshape(M, N)

expected_vals = (probs * idxs).sum(axis=1)
print(expected_vals)
# [6.66570732 1.86686378 1.01102923]

LMDeploy

# Un-comment this if running in a Jupyter notebook, Colab etc.
# import nest_asyncio
# nest_asyncio.apply()

from lmdeploy import GenerationConfig, ChatTemplateConfig, pipeline
import numpy as np

def make_reranker_input(t, q):
    return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"

def make_reranker_inference_conversation(context, question):
    system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": make_reranker_input(context, question)},
    ]

def get_prob(logprob_dict, tok_id):
    return np.exp(logprob_dict[tok_id]) if tok_id in logprob_dict.keys() else 0

pipe = pipeline(
    "lightblue/lb-reranker-0.5B-v1.0",
    chat_template_config=ChatTemplateConfig(
        model_name='qwen2d5',
        capability='chat'
    )
)
tok = pipe.tokenizer.model
idx_tokens = [tok.encode(str(i))[0] for i in range(1, 8)]

query_texts = [
    ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
]

chats = [make_reranker_inference_conversation(c, q) for q, c in query_texts]
responses = pipe(
    chats, 
    gen_config=GenerationConfig(temperature=1.0, logprobs=14, max_new_tokens=1, do_sample=True)
)
probs = np.array([[get_prob(r.logprobs[0], y) for y in idx_tokens] for r in responses])

N = probs.shape[1]
M = probs.shape[0]
idxs = np.tile(np.arange(1, N + 1), M).reshape(M, N)

expected_vals = (probs * idxs).sum(axis=1)
print(expected_vals)
# [6.66415229 1.84342025 1.01133205]

OpenAI (Hosted on Huggingface)

from openai import OpenAI
import numpy as np
from multiprocessing import Pool
from tqdm.auto import tqdm

client = OpenAI(
    base_url="https://api-inference.huggingface.co/v1/",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # Change this to an access token from https://huggingface.co/settings/tokens
)

def make_reranker_input(t, q):
    return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"

def make_reranker_inference_conversation(context, question):
    system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": make_reranker_input(context, question)},
    ]

def get_reranker_score(context_question_tuple):
    question, context = context_question_tuple

    messages = make_reranker_inference_conversation(context, question)

    completion = client.chat.completions.create(
        model="lightblue/lb-reranker-0.5B-v1.0", 
        messages=messages,
        max_tokens=1,
        temperature=0.0,
        logprobs=True,
        top_logprobs=5, # Max allowed by the openai API as top_n_tokens must be >= 0 and <= 5. If this gets changed, fix to > 7.
    )

    logprobs = completion.choices[0].logprobs.content[0].top_logprobs

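    # Keep only tokens that are valid scores; any other token in the top 5 would make int() raise.
    valid_scores = {str(i) for i in range(1, 8)}
    calculated_score = sum(int(x.token) * np.exp(x.logprob) for x in logprobs if x.token in valid_scores)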

    return calculated_score

query_texts = [
    ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
]

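if __name__ == "__main__":  # Guard needed on platforms where multiprocessing spawns workers (e.g. Windows, macOS)
    with Pool(processes=16) as p:  # Allows for parallel processing
        expected_vals = list(tqdm(p.imap(get_reranker_score, query_texts), total=len(query_texts)))

    print(expected_vals)
    # [6.64866580, 1.85144404, 1.010719508]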
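
Once expected values are available from any of the examples above, reranking comes down to sorting candidates by score. The sketch below assumes one query scored against several candidate texts; the `rerank` helper, the candidate strings, and the scores are all hypothetical placeholders, not outputs of the model.

```python
from typing import List, Optional, Tuple


def rerank(
    documents: List[str],
    scores: List[float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Return (document, score) pairs sorted from most to least relevant."""
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Placeholder candidates and scores, for illustration only.
candidates = ["text about apples", "text about square roots", "text about vocabulary"]
candidate_scores = [6.67, 1.01, 1.87]

for text, score in rerank(candidates, candidate_scores, top_k=2):
    print(f"{score:.2f}  {text}")
```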

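📚 Documentation

Evaluation

We evaluated on 9 datasets from the BEIR benchmark that, to our knowledge, were not used in training any of the evaluated models. These datasets are: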

Arguana
Dbpedia-entity
Fiqa
NFcorpus
Scidocs
Scifact
Trec-covid-v2
Vihealthqa
Webis-touche2020

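To save evaluation time, we evaluated on only a subset of all queries (the first 250). The results show that our model performs on par with or better than many state-of-the-art reranking models, without compromising inference speed.

We have published our evaluation code and results on our Github.

[Figure: retrieval evaluation metrics by rank position for the evaluated rerankers]

As the figure shows, this reranker outperforms the two baselines we included on the information-retrieval evaluation metrics at every position except @1.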
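
For readers unfamiliar with "@k" metrics, here is a minimal sketch of precision at a cutoff k, the kind of position-based metric referred to above (P@1 is the k = 1 case). Whether the published evaluation code uses exactly this definition is an assumption; the function names, document IDs, and relevance judgments below are made up for illustration.

```python
from typing import Dict, List, Set


def precision_at_k(ranked_doc_ids: List[str], relevant_doc_ids: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in relevant_doc_ids)
    return hits / k


def mean_precision_at_k(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int,
) -> float:
    """Average precision@k over all queries in `rankings`."""
    per_query = [
        precision_at_k(ranked, relevant.get(query_id, set()), k)
        for query_id, ranked in rankings.items()
    ]
    return sum(per_query) / len(per_query)


# Made-up rankings and relevance judgments for two queries.
rankings = {"q1": ["d3", "d1", "d2"], "q2": ["d5", "d4"]}
relevant = {"q1": {"d3"}, "q2": {"d4"}}

# q1 has a relevant document at rank 1, q2 does not, so P@1 averages to 0.5.
print(mean_precision_at_k(rankings, relevant, k=1))
```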

[Figure: inference speed comparison with BGE reranker v2]

We also show that our model is, on average, faster than BGE reranker v2.

🔧 Technical Details

Base model: the model was fine-tuned from the Qwen/Qwen2.5-0.5B-Instruct checkpoint.
Training data: the training data is available at lightblue/reranker_continuous_filt_max7_train.
Training environment: training took about 5.5 hours on an 8 x L20 instance ([ecs.gn8is-8x.32xlarge](https://www.alibabacloud.com/help/en/ecs/user-guide/gpu-accelerated-compute-optimized-and-vgpu-accelerated-instance-families-1)) on Alibaba Cloud.

📄 License

We share this model under the Apache 2.0 license.

👨‍💻 Developer

This model was trained by Peter Devine (ptrdvn) for Lightblue.

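📋 Model Information

Property	Details
Library	transformers
Supported languages	English, Chinese, Spanish, German, Arabic, Russian, Japanese, Korean, Hindi, Slovak, Vietnamese, Turkish, Finnish, Indonesian, Persian, Norwegian, Thai, Swedish, Portuguese, Danish, Bengali, Telugu, Romanian, Italian, French, Dutch, Swahili, Polish, Hungarian, Czech, Greek, Ukrainian, Marathi, Tamil, Tagalog, Bulgarian, Lithuanian, Urdu, Hebrew, Gujarati, Kannada, Amharic, Kazakh, Croatian, Uzbek, Javanese, Catalan, Azerbaijani, Malay, Serbian, Slovenian, Yoruba, Latvian, Icelandic, Hausa, Georgian, Estonian, Bosnian, Armenian, Malayalam, Punjabi, Maltese, Khmer, Albanian, Odia, Assamese, Burmese, Mongolian, Afrikaans, Belarusian, Irish, Macedonian, Welsh, Galician, Cebuano, Latin, Yiddish, Luxembourgish, Tajik, Scottish Gaelic, Nepali, Pashto, Basque, Kyrgyz, Kurdish, Sinhala, Haitian Creole, Esperanto, Lao, Frisian, Sindhi, Malagasy, Somali, Kurdish (Central Kurdish), Sundanese, Norwegian (Nynorsk)
Dataset	lightblue/reranker_continuous_filt_max7_train
Base model	Qwen/Qwen2.5-0.5B-Instruct
Task type	Text generation
Tags	reranker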