BAAI Bge Reranker V2 Gemma Gguf

Developed by RichardErkhov

A multilingual reranking model based on Gemma-2B, suited to text relevance ranking in multilingual scenarios.

Text Embedding #Multilingual Reranking #Lightweight Deployment #Gemma Base

Downloads 1,482

Release Time: 10/7/2024

Model Overview

This model directly takes questions and documents as input and outputs similarity scores instead of embedding vectors, making it suitable for text reranking tasks.

Model Features

Multilingual Support

Suitable for multilingual scenarios, with strong performance in both English and other languages.

Lightweight Design

The model is designed to be lightweight, easy to deploy, and fast in inference.

Direct Similarity Output

Directly outputs relevance scores between queries and documents without additional calculations.

Model Capabilities

Text Relevance Scoring

Multilingual Text Processing

Document Reranking

Use Cases

Information Retrieval

Search Engine Result Ranking

Rerank search engine results to improve relevance.

Enhances the relevance of search results and user satisfaction.

Recommendation Systems

Content Recommendation

Rank recommended content by relevance to improve recommendation quality.

Increases the accuracy of recommended content and user click-through rates.

🚀 Sentence-Transformers

The original model card carries the sentence-transformers tag. The reranker models described here take a question and a document as input and directly output a similarity score.

🚀 Quick Start

The reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. Pass a query and a passage to the reranker to obtain a relevance score; applying the sigmoid function maps this score to a float value in the range [0, 1].
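The sigmoid mapping mentioned above can be sketched in plain Python. This is a minimal illustration, not the library's own code: the `sigmoid` helper is a hypothetical name, and the raw scores are the example values quoted in the usage section of this page.

```python
import math

def sigmoid(score: float) -> float:
    """Map a raw reranker score (a logit) to a value between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)

# Raw scores taken from the usage examples on this page.
raw_scores = [-8.1875, 5.26171875]
normalized = [sigmoid(s) for s in raw_scores]
print(normalized)  # roughly [0.00028, 0.9948]
```

A strongly negative raw score lands near 0 and a strongly positive one near 1, so the ordering of passages is unchanged by the mapping.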

✨ Features

Unlike embedding models, it directly outputs similarity scores.
Supports multiple languages.
Offers various models suitable for different scenarios and resources.

📦 Installation

Using FlagEmbedding

pip install -U FlagEmbedding

💻 Usage Examples

Basic Usage

For normal reranker (bge-reranker-base / bge-reranker-large / bge-reranker-v2-m3)

from FlagEmbedding import FlagReranker
reranker = FlagReranker('BAAI/bge-reranker-v2-m3', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'])
print(score) # -5.65234375

# You can map the scores into the 0-1 range by setting normalize=True, which applies the sigmoid function to the score
score = reranker.compute_score(['query', 'passage'], normalize=True)
print(score) # 0.003497010252573502

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
print(scores) # [-8.1875, 5.26171875]

# You can map the scores into the 0-1 range by setting normalize=True, which applies the sigmoid function to the scores
scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], normalize=True)
print(scores) # [0.00027803096387751553, 0.9948403768236574]
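Once each query-passage pair has a score, reranking amounts to sorting the passages by that score. The sketch below assumes the scores have already been computed (for example by `compute_score` as shown above); the `rerank` helper and its `top_k` parameter are hypothetical names used only for illustration, and the score values are the ones quoted in this example.

```python
from typing import List, Optional, Sequence, Tuple

def rerank(passages: Sequence[str],
           scores: Sequence[float],
           top_k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Return (passage, score) pairs sorted from most to least relevant."""
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]

passages = [
    "hi",
    "The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear "
    "or simply panda, is a bear species endemic to China.",
]
scores = [-8.1875, 5.26171875]  # raw scores quoted in the example above

for passage, score in rerank(passages, scores):
    print(f"{score:8.3f}  {passage[:40]}")
```

Sorting works the same on raw or normalized scores, because the sigmoid function preserves order.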

For LLM-based reranker

from FlagEmbedding import FlagLLMReranker
reranker = FlagLLMReranker('BAAI/bge-reranker-v2-gemma', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation
# reranker = FlagLLMReranker('BAAI/bge-reranker-v2-gemma', use_bf16=True) # You can also set use_bf16=True to speed up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'])
print(score)

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
print(scores)

For LLM-based layerwise reranker

from FlagEmbedding import LayerWiseFlagLLMReranker
reranker = LayerWiseFlagLLMReranker('BAAI/bge-reranker-v2-minicpm-layerwise', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation
# reranker = LayerWiseFlagLLMReranker('BAAI/bge-reranker-v2-minicpm-layerwise', use_bf16=True) # You can also set use_bf16=True to speed up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28]) # Adjust 'cutoff_layers' to choose which layers are used to compute the score.
print(score)

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], cutoff_layers=[28])
print(scores)

Advanced Usage

Using Huggingface transformers

For normal reranker (bge-reranker-base / bge-reranker-large / bge-reranker-v2-m3)

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-m3')
model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-v2-m3')
model.eval()

pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
    print(scores)

For LLM-based reranker

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def get_inputs(pairs, tokenizer, prompt=None, max_length=1024):
    if prompt is None:
        prompt = "Given a query A and a passage B, determine whether the passage contains an answer to the query by providing a prediction of either 'Yes' or 'No'."
    sep = "\n"
    prompt_inputs = tokenizer(prompt,
                              return_tensors=None,
                              add_special_tokens=False)['input_ids']
    sep_inputs = tokenizer(sep,
                           return_tensors=None,
                           add_special_tokens=False)['input_ids']
    inputs = []
    for query, passage in pairs:
        query_inputs = tokenizer(f'A: {query}',
                                 return_tensors=None,
                                 add_special_tokens=False,
                                 max_length=max_length * 3 // 4,
                                 truncation=True)
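        passage_inputs = tokenizer(f'B: {passage}',
                                   return_tensors=None,
                                   add_special_tokens=False,
                                   max_length=max_length,
                                   truncation=True)
        # From here on the listing follows the upstream BAAI/bge-reranker-v2-gemma model card; treat it as a sketch.
        item = tokenizer.prepare_for_model(
            [tokenizer.bos_token_id] + query_inputs['input_ids'],
            sep_inputs + passage_inputs['input_ids'],
            truncation='only_second', max_length=max_length, padding=False,
            return_attention_mask=False, return_token_type_ids=False, add_special_tokens=False)
        item['input_ids'] = item['input_ids'] + sep_inputs + prompt_inputs
        item['attention_mask'] = [1] * len(item['input_ids'])
        inputs.append(item)
    return tokenizer.pad(inputs, padding=True,
                         max_length=max_length + len(sep_inputs) + len(prompt_inputs),
                         pad_to_multiple_of=8, return_tensors='pt')

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-gemma')
model = AutoModelForCausalLM.from_pretrained('BAAI/bge-reranker-v2-gemma')
yes_loc = tokenizer('Yes', add_special_tokens=False)['input_ids'][0]  # token id for 'Yes'
model.eval()
with torch.no_grad():
    inputs = get_inputs([['what is panda?', 'hi']], tokenizer)
    scores = model(**inputs, return_dict=True).logits[:, -1, yes_loc].view(-1, ).float()  # logit of 'Yes' at the last position
    print(scores)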

📚 Documentation

Model List

Model	Base model	Language	Layerwise	Feature
BAAI/bge-reranker-base	xlm-roberta-base	Chinese and English	-	Lightweight reranker model, easy to deploy, with fast inference.
BAAI/bge-reranker-large	xlm-roberta-large	Chinese and English	-	Lightweight reranker model, easy to deploy, with fast inference.
BAAI/bge-reranker-v2-m3	bge-m3	Multilingual	-	Lightweight reranker model with strong multilingual capabilities, easy to deploy, with fast inference.
BAAI/bge-reranker-v2-gemma	gemma-2b	Multilingual	-	Suitable for multilingual contexts; performs well in both English and multilingual settings.
BAAI/bge-reranker-v2-minicpm-layerwise	MiniCPM-2B-dpo-bf16	Multilingual	8-40	Suitable for multilingual contexts; performs well in both English and Chinese, and lets you choose which layers to use for output, which speeds up inference.

You can select the model according to your scenario and resources:

For multilingual scenarios, use BAAI/bge-reranker-v2-m3 or BAAI/bge-reranker-v2-gemma.
For Chinese or English scenarios, use BAAI/bge-reranker-v2-m3 or BAAI/bge-reranker-v2-minicpm-layerwise.
For efficiency, use BAAI/bge-reranker-v2-m3 or the lower layers of BAAI/bge-reranker-v2-minicpm-layerwise.
For better performance, use BAAI/bge-reranker-v2-minicpm-layerwise or BAAI/bge-reranker-v2-gemma.

Quantization Information

Quantization made by Richard Erkhov.

bge-reranker-v2-gemma - GGUF

Model creator: https://huggingface.co/BAAI/
Original model: https://huggingface.co/BAAI/bge-reranker-v2-gemma/

Name	Quant method	Size
bge-reranker-v2-gemma.Q2_K.gguf	Q2_K	1.08GB
bge-reranker-v2-gemma.IQ3_XS.gguf	IQ3_XS	1.16GB
bge-reranker-v2-gemma.IQ3_S.gguf	IQ3_S	1.2GB
bge-reranker-v2-gemma.Q3_K_S.gguf	Q3_K_S	1.2GB
bge-reranker-v2-gemma.IQ3_M.gguf	IQ3_M	1.22GB
bge-reranker-v2-gemma.Q3_K.gguf	Q3_K	1.29GB
bge-reranker-v2-gemma.Q3_K_M.gguf	Q3_K_M	1.29GB
bge-reranker-v2-gemma.Q3_K_L.gguf	Q3_K_L	1.36GB
bge-reranker-v2-gemma.IQ4_XS.gguf	IQ4_XS	1.4GB
bge-reranker-v2-gemma.Q4_0.gguf	Q4_0	1.44GB
bge-reranker-v2-gemma.IQ4_NL.gguf	IQ4_NL	1.45GB
bge-reranker-v2-gemma.Q4_K_S.gguf	Q4_K_S	1.45GB
bge-reranker-v2-gemma.Q4_K.gguf	Q4_K	1.52GB
bge-reranker-v2-gemma.Q4_K_M.gguf	Q4_K_M	1.52GB
bge-reranker-v2-gemma.Q4_1.gguf	Q4_1	1.56GB
bge-reranker-v2-gemma.Q5_0.gguf	Q5_0	1.68GB
bge-reranker-v2-gemma.Q5_K_S.gguf	Q5_K_S	1.68GB
bge-reranker-v2-gemma.Q5_K.gguf	Q5_K	1.71GB
bge-reranker-v2-gemma.Q5_K_M.gguf	Q5_K_M	1.71GB
bge-reranker-v2-gemma.Q5_1.gguf	Q5_1	1.79GB
bge-reranker-v2-gemma.Q6_K.gguf	Q6_K	1.92GB
bge-reranker-v2-gemma.Q8_0.gguf	Q8_0	2.49GB
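One way to read the table is to pick the largest file that fits a given size budget, since larger quantizations generally keep more of the original weights' precision. The sketch below is only an illustration: `pick_quant` and `QUANT_SIZES_GB` are hypothetical names, the dictionary holds a subset of the rows above, and the file size is a lower bound on the memory needed at runtime rather than an exact requirement.

```python
from typing import Optional

# Sizes in GB, copied from a subset of the rows in the table above.
QUANT_SIZES_GB = {
    "bge-reranker-v2-gemma.Q2_K.gguf": 1.08,
    "bge-reranker-v2-gemma.Q3_K_M.gguf": 1.29,
    "bge-reranker-v2-gemma.Q4_K_M.gguf": 1.52,
    "bge-reranker-v2-gemma.Q5_K_M.gguf": 1.71,
    "bge-reranker-v2-gemma.Q6_K.gguf": 1.92,
    "bge-reranker-v2-gemma.Q8_0.gguf": 2.49,
}

def pick_quant(budget_gb: float) -> Optional[str]:
    """Return the largest listed file whose size fits in budget_gb, or None."""
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)

print(pick_quant(1.6))  # bge-reranker-v2-gemma.Q4_K_M.gguf
print(pick_quant(1.0))  # None
```

Leave headroom in the budget for the context and runtime buffers, which this helper does not account for.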

Original Model Description

license: apache-2.0
pipeline_tag: text-classification
tags: transformers, sentence-transformers
language: multilingual

Reranker

For more details, please refer to our GitHub repository: FlagEmbedding.

Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by passing a query and a passage to the reranker, and the score can be mapped to a float value in [0, 1] with the sigmoid function.
