lb-reranker-0.5B-v1.0 open-source model: query-text relevance scoring and retrieval ranking in 95+ languages

Lb Reranker 0.5B V1.0

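Developed by lightblue

LB Reranker is a model that judges how relevant a piece of text is to a query. It supports 95+ languages and is suited to ranking and reranking in retrieval tasks.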

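Large language model

Transformers

Supports multiple languages

License: Apache-2.0

#MultilingualReranking #RetrievalAugmentedGeneration #LowLatencyInference

Downloads: 917

Released: 1/6/2025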

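Model overview

A lightweight reranking model fine-tuned from Qwen2.5-0.5B-Instruct. It improves the ordering of retrieval results by outputting a relevance score from 1 to 7.

Model features

Multilingual support

Trained on data covering 95+ languages, which makes it one of the rerankers with the broadest language coverage currently available.

Broad compatibility

The output is a digit string from 1 to 7, so the model works directly with mainstream inference frameworks such as vLLM and LMDeploy.

Efficient inference

In the BEIR evaluation it performs on par with or better than the rerankers it was compared against, and it is on average faster than BGE reranker v2.

Code ranking

Reaches 96% P@1 on a code-snippet reranking task.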

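Model capabilities

Query-text relevance scoring

Multilingual retrieval optimization

Code-snippet ranking

Large-scale document retrieval

Use cases

Information retrieval

Search-engine result refinement

Rerank the documents returned by a search engine by relevance.

Outperforms baseline models such as BGE on the BEIR benchmark.

Code retrieval

Code-snippet ranking

Rank codebase search results by relevance.

Reaches 96% P@1.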

🚀 LB Reranker v1.0

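LB Reranker is trained to judge how relevant a piece of text is to a given query, so it can serve as a ranker or reranker in a variety of retrieval-based tasks.

🚀 Quick start

The steps and example code below show how to install an inference backend and score query-text pairs with LB Reranker.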

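✨ Key features

Multilingual support: the model was trained on data in more than 95 languages, so it suits a wide range of use cases.
Strong performance: it scores slightly higher than the baseline rerankers on the evaluation benchmarks.
Easy integration: because it is a plain causal language model, it integrates natively with widely used inference packages such as vLLM and LMDeploy.
Code reranking: the model also performs well at reranking code snippets (96% P@1).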

📦 Installation

vLLM

Install vLLM with:

pip install vllm

LMDeploy

Install LMDeploy with:

pip install lmdeploy

OpenAI

Install the openai package with:

pip install openai

💻 Usage examples

Basic usage

The model was trained to expect input in the following format:

<<<Query>>>
{your_query_here}

<<<Context>>>
{your_context_here}

It then outputs a single digit string between 1 and 7.
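As a minimal sketch of this input/output contract, the snippet below builds the prompt text from a query and a context and validates a raw reply. The helper names `build_prompt` and `parse_score` are hypothetical and not part of any library. Only the `<<<Query>>>` / `<<<Context>>>` layout and the 1 to 7 range come from the description above.

```python
def build_prompt(query: str, context: str) -> str:
    """Lay out a query and a context in the format the model was trained on."""
    return f"<<<Query>>>\n{query}\n\n<<<Context>>>\n{context}"


def parse_score(reply: str) -> int:
    """Turn the model's reply into an integer score, rejecting anything outside 1-7."""
    text = reply.strip()
    if text not in {"1", "2", "3", "4", "5", "6", "7"}:
        raise ValueError(f"expected a digit from 1 to 7, got {reply!r}")
    return int(text)


# Hypothetical query and context, used only to show the layout.
prompt = build_prompt(
    "What is the scientific name of apples?",
    "Malus domestica is the domestic or orchard apple.",
)
print(prompt)
print(parse_score("6"))  # 6
```

Integer scores parsed this way tie often when many candidates are scored for the same query, which is the limitation the expected-value approach under advanced usage is meant to reduce.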

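Advanced usage

To obtain a continuous score that can be used to rerank query-context pairs (that is, a way to reduce ties), we compute the expected value of the score from the probabilities the model assigns to each of the digits 1 to 7. The following subsections show how to do this in vLLM, in LMDeploy, and through an OpenAI-compatible API.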
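The calculation itself does not depend on any framework. If p_k is the probability the model assigns to the digit k as its first output token, the continuous score is the sum of k * p_k for k from 1 to 7. The sketch below illustrates this with made-up log-probabilities. The function name `expected_score` is hypothetical, and digits missing from the returned top-k list are assumed to have probability 0.

```python
import math


def expected_score(logprobs_by_digit: dict) -> float:
    """Expected value of the 1-7 score given log-probabilities keyed by digit string."""
    total = 0.0
    for score in range(1, 8):
        logprob = logprobs_by_digit.get(str(score))
        # A digit absent from the returned top-k contributes nothing.
        prob = math.exp(logprob) if logprob is not None else 0.0
        total += score * prob
    return total


# Hypothetical log-probabilities for two candidates whose most likely digit is 6 in both cases.
candidate_a = {"6": math.log(0.6), "7": math.log(0.3), "5": math.log(0.1)}
candidate_b = {"6": math.log(0.6), "5": math.log(0.3), "4": math.log(0.1)}

print(round(expected_score(candidate_a), 2))  # 6.2
print(round(expected_score(candidate_b), 2))  # 5.5
```

Both candidates would receive the discrete score 6, but their expected values differ, so they can still be ordered.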

vLLM

from vllm import LLM, SamplingParams
import numpy as np

def make_reranker_input(t, q):
    return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"

def make_reranker_inference_conversation(context, question):
    system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": make_reranker_input(context, question)},
    ]

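def get_prob(logprob_dict, tok_id):
    # Probability of a score token; tokens missing from the returned top-k count as 0.
    return np.exp(logprob_dict[tok_id].logprob) if tok_id in logprob_dict else 0

llm = LLM("lightblue/lb-reranker-0.5B-v1.0")
# Greedy decoding of a single token, returning the top 14 log-probabilities.
sampling_params = SamplingParams(temperature=0.0, logprobs=14, max_tokens=1)
tok = llm.get_tokenizer()
# Token ids of the digits "1" to "7".
idx_tokens = [tok.encode(str(i))[0] for i in range(1, 8)]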

query_texts = [
    ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
]

chats = [make_reranker_inference_conversation(c, q) for q, c in query_texts]
responses = llm.chat(chats, sampling_params)
probs = np.array([[get_prob(r.outputs[0].logprobs[0], y) for y in idx_tokens] for r in responses])

N = probs.shape[1]
M = probs.shape[0]
idxs = np.tile(np.arange(1, N + 1), M).reshape(M, N)

expected_vals = (probs * idxs).sum(axis=1)
print(expected_vals)
# [6.66570732 1.86686378 1.01102923]

LMDeploy

# Un-comment this if running in a Jupyter notebook, Colab etc.
# import nest_asyncio
# nest_asyncio.apply()

from lmdeploy import GenerationConfig, ChatTemplateConfig, pipeline
import numpy as np

def make_reranker_input(t, q):
    return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"

def make_reranker_inference_conversation(context, question):
    system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": make_reranker_input(context, question)},
    ]

def get_prob(logprob_dict, tok_id):
    # Here the dict maps token ids straight to log-probabilities; missing tokens count as 0.
    return np.exp(logprob_dict[tok_id]) if tok_id in logprob_dict else 0

pipe = pipeline(
    "lightblue/lb-reranker-0.5B-v1.0",
    chat_template_config=ChatTemplateConfig(
        model_name='qwen2d5',
        capability='chat'
    )
)
tok = pipe.tokenizer.model
# Token ids of the digits "1" to "7".
idx_tokens = [tok.encode(str(i))[0] for i in range(1, 8)]

query_texts = [
    ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
]

chats = [make_reranker_inference_conversation(c, q) for q, c in query_texts]
responses = pipe(
    chats, 
    gen_config=GenerationConfig(temperature=1.0, logprobs=14, max_new_tokens=1, do_sample=True)
)
probs = np.array([[get_prob(r.logprobs[0], y) for y in idx_tokens] for r in responses])

N = probs.shape[1]
M = probs.shape[0]
idxs = np.tile(np.arange(1, N + 1), M).reshape(M, N)

expected_vals = (probs * idxs).sum(axis=1)
print(expected_vals)
# [6.66415229 1.84342025 1.01133205]

OpenAI (Hosted on Huggingface)

from openai import OpenAI
import numpy as np
from multiprocessing import Pool
from tqdm.auto import tqdm

client = OpenAI(
    base_url="https://api-inference.huggingface.co/v1/",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # Change this to an access token from https://huggingface.co/settings/tokens
)

def make_reranker_input(t, q):
    return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"

def make_reranker_inference_conversation(context, question):
    system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": make_reranker_input(context, question)},
    ]

def get_reranker_score(context_question_tuple):
    # Despite the parameter name, each tuple is (query, context), as in query_texts below.
    question, context = context_question_tuple

    messages = make_reranker_inference_conversation(context, question)

    completion = client.chat.completions.create(
        model="lightblue/lb-reranker-0.5B-v1.0", 
        messages=messages,
        max_tokens=1,
        temperature=0.0,
        logprobs=True,
        top_logprobs=5, # Max allowed by the openai API as top_n_tokens must be >= 0 and <= 5. If this gets changed, fix to > 7.
    )

    logprobs = completion.choices[0].logprobs.content[0].top_logprobs

    # Expected score over the returned tokens. Only the top 5 tokens are available here,
    # and non-numeric tokens are skipped so that int() cannot raise.
    calculated_score = sum(
        int(x.token) * np.exp(x.logprob) for x in logprobs if x.token.strip().isdecimal()
    )

    return calculated_score

query_texts = [
    ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
]

if __name__ == "__main__":  # Needed on platforms where multiprocessing spawns new interpreters
    with Pool(processes=16) as p: # Allows for parallel processing
        expected_vals = list(tqdm(p.imap(get_reranker_score, query_texts), total=len(query_texts)))

    print(expected_vals)
    # [6.64866580, 1.85144404, 1.010719508]
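
Whichever backend produces the scores, the reranking step is a sort by descending expected value. The sketch below uses hypothetical passages and scores rather than real model output, and `rerank` is an illustrative helper, not part of vLLM, LMDeploy, or the openai package.

```python
def rerank(passages, scores, top_k=None):
    """Return (passage, score) pairs ordered from most to least relevant."""
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical candidates for one query, with made-up expected scores.
passages = ["passage about pears", "passage about apples", "passage about square roots"]
scores = [2.4, 6.7, 1.0]

for passage, score in rerank(passages, scores, top_k=2):
    print(f"{score:.1f}  {passage}")
```

In a retrieval pipeline, `passages` would be the candidates returned by a first-stage retriever and `scores` the expected values computed for each (query, passage) pair.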

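📚 Detailed documentation

Evaluation

We evaluated on 9 datasets from the BEIR benchmark. To the best of our knowledge, none of these datasets was used to train any of the evaluated models. The datasets are: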

Arguana
Dbpedia-entity
Fiqa
NFcorpus
Scidocs
Scifact
Trec-covid-v2
Vihealthqa
Webis-touche2020

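To save evaluation time, we evaluated on only a subset of all queries (the first 250). The results show that our model performs on par with or better than many state-of-the-art reranking models without compromising inference speed.

We have published our evaluation code and results on our Github.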

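[Figure: retrieval metrics by rank position for this reranker and two baselines]

As the figure shows, this reranker outperforms the two baselines we included on information retrieval metrics at every position except @1.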

[Figure: inference speed compared with BGE reranker v2]

We also show that our model is, on average, faster than BGE reranker v2.
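
Position-wise retrieval metrics of the kind discussed in this section are computed from the relevance labels of a ranked result list. The source does not say which metrics the figures plot, so the sketch below uses precision@k and NDCG@k purely as common examples, with made-up binary labels. The function names are hypothetical and this is not the published evaluation code.

```python
import math


def precision_at_k(relevances, k):
    """Fraction of the top-k results that are relevant (labels are 0 or 1)."""
    return sum(relevances[:k]) / k


def ndcg_at_k(relevances, k):
    """Normalized discounted cumulative gain over the top-k results."""
    def dcg(labels):
        # Rank 0 is the top result, so the discount is log2(rank + 2).
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(labels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical labels for one query's results, in the order the reranker returned them.
ranked_labels = [1, 0, 1, 0, 0]

print(precision_at_k(ranked_labels, 1))  # 1.0
print(round(ndcg_at_k(ranked_labels, 3), 3))
```

A figure such as the 96% P@1 quoted for code reranking would correspond to averaging `precision_at_k(labels, 1)` over all evaluated queries.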

🔧 Technical details

Base model: the model is fine-tuned from the Qwen/Qwen2.5-0.5B-Instruct checkpoint.
Training data: the training data is available at lightblue/reranker_continuous_filt_max7_train.
Training environment: training took roughly 5.5 hours on an 8 x L20 instance on Alibaba Cloud ([ecs.gn8is-8x.32xlarge](https://www.alibabacloud.com/help/en/ecs/user-guide/gpu-accelerated-compute-optimized-and-vgpu-accelerated-instance-families-1)).

📄 License

We share this model under the Apache 2.0 license.

👨‍💻 Developer

This model was trained by Peter Devine (ptrdvn) for Lightblue.

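📋 Summary table

Property	Details
Library	transformers
Supported languages	English, Chinese, Spanish, German, Arabic, Russian, Japanese, Korean, Hindi, Slovak, Vietnamese, Turkish, Finnish, Indonesian, Persian, Norwegian, Thai, Swedish, Portuguese, Danish, Bengali, Telugu, Romanian, Italian, French, Dutch, Swahili, Polish, Hungarian, Czech, Greek, Ukrainian, Marathi, Tamil, Tagalog, Bulgarian, Lithuanian, Urdu, Hebrew, Gujarati, Kannada, Amharic, Kazakh, Croatian, Uzbek, Javanese, Catalan, Azerbaijani, Malay, Serbian, Slovenian, Yoruba, Latvian, Icelandic, Hausa, Georgian, Estonian, Bosnian, Armenian, Malayalam, Punjabi, Maltese, Khmer, Albanian, Odia, Assamese, Burmese, Mongolian, Afrikaans, Belarusian, Irish, Macedonian, Welsh, Galician, Cebuano, Latin, Yiddish, Luxembourgish, Tajik, Scottish Gaelic, Nepali, Pashto, Basque, Kyrgyz, Kurdish, Sinhala, Haitian Creole, Esperanto, Lao, Frisian, Sindhi, Malagasy, Somali, Kurdish (Central Kurdish), Sundanese, Norwegian (Nynorsk)
Dataset	lightblue/reranker_continuous_filt_max7_train
Base model	Qwen/Qwen2.5-0.5B-Instruct
Task type	Text generation
Tags	reranker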