lb-reranker-0.5B-v1.0 Open Source Model - Supports Query and Text Relevance Judgment and Retrieval Sorting in 95+ Languages

Developed by lightblue

The LB Reranker is a model for determining the relevance between queries and text snippets, supporting 95+ languages, suitable for ranking and reranking in retrieval tasks.

Large Language Model

Transformers

Supports Multiple Languages

Open Source License: Apache-2.0

Tags: Multilingual Reranking, Retrieval-Augmented Generation, Low-Latency Inference

Release Time : 1/6/2025

Model Overview

A lightweight reranking model fine-tuned based on Qwen2.5-0.5B-Instruct, optimizing retrieval result ranking by outputting relevance scores from 1 to 7.

Model Features

Multilingual Support

Training covers 95+ languages, making it one of the most widely supported rerankers available.

Strong Compatibility

Outputs are numeric strings from 1 to 7, directly compatible with mainstream inference frameworks like vLLM/LMDeploy.

Efficient Inference

Outperforms similar models in the BEIR benchmark while achieving faster inference speeds.

Code Ranking Capability

Achieves 96% P@1 accuracy in code snippet reranking tasks.

Model Capabilities

Query-Text Relevance Scoring

Multilingual Retrieval Optimization

Code Snippet Ranking

Large-Scale Document Retrieval

Use Cases

Information Retrieval

Search Engine Result Optimization

Reranking documents returned by search engines by their relevance to the query.

Outperforms baseline models like BGE in the BEIR benchmark.

Code Retrieval

Code Snippet Ranking

Ranking code repository retrieval results by their relevance to the query.

Achieves 96% P@1 accuracy.

🚀 LB Reranker v1.0

The LB Reranker is designed to determine the relatedness between a given query and a text, making it a valuable tool as a ranker or reranker in various retrieval-based tasks. It has been trained on over 95 languages, offering broad applicability across different use cases.

🚀 Quick Start

The model expects an input in the following format:

<<<Query>>>
{your_query_here}

<<<Context>>>
{your_context_here}

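The model outputs a string containing a single number from 1 to 7. To obtain a continuous score for reranking query-context pairs, we calculate the expected value of the score, i.e. the sum of each score from 1 to 7 weighted by the probability the model assigns to it.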
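
As a minimal sketch of the two steps described here (building the input and turning score probabilities into one continuous value), the snippet below uses only the standard library. The helper name expected_score and the log-probability values are hypothetical and chosen for illustration; in practice the log-probabilities come from the inference package, as in the usage examples further down.

```python
import math
from typing import Mapping

def make_reranker_input(context: str, query: str) -> str:
    """Build the user message in the <<<Query>>> / <<<Context>>> format."""
    return f"<<<Query>>>\n{query}\n\n<<<Context>>>\n{context}"

def expected_score(logprobs: Mapping[str, float]) -> float:
    """Expected value of the 1-7 score.

    `logprobs` maps the score strings "1".."7" to log-probabilities.
    Scores that are missing from the mapping count as probability 0.
    """
    return sum(
        s * math.exp(logprobs[str(s)])
        for s in range(1, 8)
        if str(s) in logprobs
    )

# Hypothetical log-probabilities for illustration only.
example_logprobs = {"7": math.log(0.7), "6": math.log(0.2), "5": math.log(0.1)}

print(make_reranker_input("An apple is a round, edible fruit.", "What is an apple?"))
print(round(expected_score(example_logprobs), 2))  # 0.7*7 + 0.2*6 + 0.1*5 = 6.6
```

A score near 7 means the model puts most of its probability on the highest relevance labels, and a score near 1 means the opposite.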

✨ Features

High Performance: Scores slightly higher than comparable rerankers on the evaluation benchmarks.
Multilingual Support: Trained on more languages than any previous model.
Easy Integration: A simple Causal LM model trained to output a string between "1" and "7", allowing native use with many widely available inference packages like vLLM and LMDeploy.

📦 Installation

vLLM

Install vLLM using pip install vllm.

LMDeploy

Install LMDeploy using pip install lmdeploy.

OpenAI (Hosted on Huggingface)

Install openai using pip install openai.

💻 Usage Examples

Basic Usage

The following shows how to use the LB Reranker in different inference packages:

vLLM

from vllm import LLM, SamplingParams
import numpy as np

def make_reranker_input(t, q):
    return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"

def make_reranker_inference_conversation(context, question):
    system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": make_reranker_input(context, question)},
    ]

def get_prob(logprob_dict, tok_id):
    # Probability of a score token; 0 if it is not among the returned top logprobs
    return np.exp(logprob_dict[tok_id].logprob) if tok_id in logprob_dict else 0

llm = LLM("lightblue/lb-reranker-0.5B-v1.0")
sampling_params = SamplingParams(temperature=0.0, logprobs=14, max_tokens=1)
tok = llm.get_tokenizer()
# Token ids of the strings "1" to "7"
idx_tokens = [tok.encode(str(i))[0] for i in range(1, 8)]

query_texts = [
    ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
]

chats = [make_reranker_inference_conversation(c, q) for q, c in query_texts]
responses = llm.chat(chats, sampling_params)
probs = np.array([[get_prob(r.outputs[0].logprobs[0], y) for y in idx_tokens] for r in responses])

# Expected score per pair: sum over s of s * P(s), with s = 1..7 (broadcast across rows)
scores = np.arange(1, probs.shape[1] + 1)
expected_vals = (probs * scores).sum(axis=1)
print(expected_vals)
# [6.66570732 1.86686378 1.01102923]
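
The example above scores three queries against one text. In a retrieval setting the more common pattern is to score several candidate texts against a single query and then sort them. The sketch below shows only that sorting step. The function name rerank, the top_k parameter, and the scores are hypothetical, and the scores are assumed to be expected values computed as above.

```python
def rerank(candidates, scores, top_k=None):
    """Return (candidate, score) pairs sorted from most to least relevant.

    `scores` are assumed to be expected values in the range 1-7,
    one per candidate, in the same order as `candidates`.
    """
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]

# Hypothetical candidates and scores for illustration only.
docs = ["text about pears", "text about apples", "text about square roots"]
doc_scores = [1.87, 6.67, 1.01]

for doc, score in rerank(docs, doc_scores, top_k=2):
    print(f"{score:.2f}  {doc}")
```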

LMDeploy

# Un-comment this if running in a Jupyter notebook, Colab etc.
# import nest_asyncio
# nest_asyncio.apply()

from lmdeploy import GenerationConfig, ChatTemplateConfig, pipeline
import numpy as np

def make_reranker_input(t, q):
    return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"

def make_reranker_inference_conversation(context, question):
    system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": make_reranker_input(context, question)},
    ]

def get_prob(logprob_dict, tok_id):
    # Probability of a score token; 0 if it is not among the returned top logprobs
    return np.exp(logprob_dict[tok_id]) if tok_id in logprob_dict else 0

pipe = pipeline(
    "lightblue/lb-reranker-0.5B-v1.0",
    chat_template_config=ChatTemplateConfig(
        model_name='qwen2d5',
        capability='chat'
    )
)
tok = pipe.tokenizer.model
idx_tokens = [tok.encode(str(i))[0] for i in range(1, 8)]

query_texts = [
    ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
]

chats = [make_reranker_inference_conversation(c, q) for q, c in query_texts]
responses = pipe(
    chats, 
    gen_config=GenerationConfig(temperature=1.0, logprobs=14, max_new_tokens=1, do_sample=True)
)
probs = np.array([[get_prob(r.logprobs[0], y) for y in idx_tokens] for r in responses])

# Expected score per pair: sum over s of s * P(s), with s = 1..7 (broadcast across rows)
scores = np.arange(1, probs.shape[1] + 1)
expected_vals = (probs * scores).sum(axis=1)
print(expected_vals)
# [6.66415229 1.84342025 1.01133205]

OpenAI (Hosted on Huggingface)

from openai import OpenAI
import numpy as np
from multiprocessing import Pool
from tqdm.auto import tqdm

client = OpenAI(
    base_url="https://api-inference.huggingface.co/v1/",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # Change this to an access token from https://huggingface.co/settings/tokens
)

def make_reranker_input(t, q):
    return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"

def make_reranker_inference_conversation(context, question):
    system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": make_reranker_input(context, question)},
    ]

def get_reranker_score(context_question_tuple):
    question, context = context_question_tuple

    messages = make_reranker_inference_conversation(context, question)

    completion = client.chat.completions.create(
        model="lightblue/lb-reranker-0.5B-v1.0", 
        messages=messages,
        max_tokens=1,
        temperature=0.0,
        logprobs=True,
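        top_logprobs=5, # Max allowed by the API, as top_n_tokens must be >= 0 and <= 5. If this limit is raised, set this above 7.
    )

    logprobs = completion.choices[0].logprobs.content[0].top_logprobs

    # Only score tokens "1" to "7" contribute; any other returned token is skipped so int() cannot fail
    score_tokens = {"1", "2", "3", "4", "5", "6", "7"}
    calculated_score = sum(int(x.token) * np.exp(x.logprob) for x in logprobs if x.token in score_tokens)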

    return calculated_score

query_texts = [
    ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
]

with Pool(processes=16) as p: # Allows for parallel processing
    expected_vals = list(tqdm(p.imap(get_reranker_score, query_texts), total=len(query_texts)))

print(expected_vals)
# [6.64866580, 1.85144404, 1.010719508]
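
Because only the top 5 log-probabilities are returned in this setup, the sum above runs over a truncated distribution, which may explain why these values differ slightly from the vLLM and LMDeploy results. One possible adjustment, which is an assumption and not part of the original example, is to renormalize the returned score probabilities so that they sum to 1 before taking the expectation. The function name truncated_expected_score and the input values below are hypothetical.

```python
import math

VALID_SCORES = {str(i) for i in range(1, 8)}

def truncated_expected_score(top_logprobs, renormalize=True):
    """Expected 1-7 score from a truncated list of (token, logprob) pairs.

    Tokens that are not one of "1".."7" are ignored. With renormalize=True
    the remaining probabilities are rescaled to sum to 1 (an assumption,
    not the method used in the example above).
    Returns None if no score token is present.
    """
    probs = {tok: math.exp(lp) for tok, lp in top_logprobs if tok in VALID_SCORES}
    total = sum(probs.values())
    if total == 0:
        return None
    score = sum(int(tok) * p for tok, p in probs.items())
    return score / total if renormalize else score

# Hypothetical top-5 output: four score tokens and one unrelated token.
example = [
    ("7", math.log(0.60)),
    ("6", math.log(0.20)),
    ("5", math.log(0.10)),
    ("4", math.log(0.05)),
    ("\n", math.log(0.01)),
]

print(truncated_expected_score(example, renormalize=False))
print(truncated_expected_score(example, renormalize=True))
```

Renormalizing changes the absolute values but usually not the ordering when one score dominates, so whether it is worth doing depends on how the scores are used downstream.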

📚 Documentation

Model Details

Base Model: Qwen/Qwen2.5-0.5B-Instruct
Training Data: lightblue/reranker_continuous_filt_max7_train
Code Repository: our GitHub repo

Evaluation

We evaluated the model on 9 datasets from the BEIR benchmark that none of the evaluated models have been trained on (to our knowledge):

Arguana
Dbpedia-entity
Fiqa
NFcorpus
Scidocs
Scifact
Trec-covid-v2
Vihealthqa
Webis-touche2020

To save evaluation time, we evaluated on a subset of the queries (the first 250 of each dataset). The evaluation code and results are available on our GitHub.
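
The sketch below illustrates what evaluating on the first 250 queries could look like. It is not the authors' evaluation code. The text here does not name the metric, so NDCG@10 is assumed because it is the metric BEIR commonly reports. The function names and the data layout (query ids, rankings as lists of document ids, relevance judgments as nested dicts) are likewise hypothetical.

```python
import math
from itertools import islice

def ndcg_at_k(ranked_doc_ids, relevance, k=10):
    """NDCG@k for one query.

    `ranked_doc_ids` is the reranked list of document ids;
    `relevance` maps document id -> graded relevance (missing ids count as 0).
    """
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_doc_ids[:k])
    )
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def evaluate_subset(query_ids, rankings, qrels, max_queries=250, k=10):
    """Mean NDCG@k over the first `max_queries` queries only."""
    subset = list(islice(query_ids, max_queries))
    scores = [ndcg_at_k(rankings[q], qrels.get(q, {}), k) for q in subset]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical toy data for illustration only.
query_ids = ["q1", "q2", "q3"]
rankings = {"q1": ["a", "b"], "q2": ["x", "a"], "q3": ["a"]}
qrels = {"q1": {"a": 1}, "q2": {"a": 1}, "q3": {"a": 1}}

print(evaluate_subset(query_ids, rankings, qrels, max_queries=2))
```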

🔧 Technical Details

The LB Reranker is fine-tuned from the Qwen/Qwen2.5-0.5B-Instruct model checkpoint. It was trained for roughly 5.5 hours on an 8 x L20 instance (ecs.gn8is-8x.32xlarge) on Alibaba Cloud.

📄 License

We share this model under an Apache 2.0 license.

Developed by

This model was trained by Peter Devine (ptrdvn) for Lightblue.