Bloomz-3b Reranking
This reranking model measures the semantic correspondence between a question and a context in both French and English. It helps filter and reorder retrieved results in open-domain question answering (ODQA) pipelines, at the price of a high computational cost.
Features
- Built from the cmarkea/bloomz-3b-dpo-chat model.
- Language-agnostic, supporting both French and English.
- Can effectively score in a cross-language context (e.g. a French query against English contexts).
- Helps filter and reorder results in an ODQA context.
Installation
The usage example below only requires the Hugging Face transformers library and a deep-learning backend.
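The original card gives no installation command; a minimal setup, assuming a PyTorch backend for the pipeline (versions are not pinned on the card), would be:

```shell
# Assumed setup: transformers for the pipeline API, torch as its backend.
pip install transformers torch
```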
Usage Examples
Basic Usage
from typing import List

from transformers import pipeline

# Load the reranker as a text-classification pipeline.
# top_k=None returns the scores of both labels instead of only the best one.
reranker = pipeline(
    task='text-classification',
    model='cmarkea/bloomz-3b-reranking',
    top_k=None
)

query: str  # the user question
contexts: List[str]  # candidate contexts returned by the retriever

# Score each (context, query) pair.
similarities = reranker(
    [
        dict(
            text=context,
            text_pair=query
        )
        for context in contexts
    ]
)

# Keep the score of LABEL_1 (the "relevant" label) for each context.
score_label_1 = [
    next(item['score'] for item in entry if item['label'] == 'LABEL_1')
    for entry in similarities
]

# Rerank the contexts from most to least relevant.
contexts_reranked = sorted(
    zip(score_label_1, contexts),
    key=lambda x: x[0],
    reverse=True
)

# Optionally drop contexts scored below 0.8 to reduce noise.
score, contexts_cleaned = zip(
    *filter(
        lambda x: x[0] >= 0.8,
        contexts_reranked
    )
)
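The post-processing steps can be exercised without downloading the 3B model by mocking the pipeline output: with `top_k=None`, the pipeline returns one list of label/score dicts per input pair. The scores below are made up for illustration.

```python
# Mocked pipeline output for three candidate contexts (illustrative scores).
similarities = [
    [{'label': 'LABEL_1', 'score': 0.95}, {'label': 'LABEL_0', 'score': 0.05}],
    [{'label': 'LABEL_0', 'score': 0.70}, {'label': 'LABEL_1', 'score': 0.30}],
    [{'label': 'LABEL_1', 'score': 0.85}, {'label': 'LABEL_0', 'score': 0.15}],
]
contexts = ['ctx_a', 'ctx_b', 'ctx_c']

# Same post-processing as in the usage example above.
score_label_1 = [
    next(item['score'] for item in entry if item['label'] == 'LABEL_1')
    for entry in similarities
]
contexts_reranked = sorted(
    zip(score_label_1, contexts), key=lambda x: x[0], reverse=True
)
score, contexts_cleaned = zip(
    *filter(lambda x: x[0] >= 0.8, contexts_reranked)
)

print(contexts_cleaned)  # ('ctx_a', 'ctx_c')
```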
Documentation
Dataset
The training dataset combines the mMARCO dataset, which consists of query/positive/hard-negative triplets, with the "train" split of SQuAD, reshaped into the same query/positive/hard-negative form. To generate hard negatives for SQuAD, we took contexts from the same theme as the query but associated with a different set of queries: these negatives belong to the same themes as the queries but presumably do not contain the answer to the question.
Finally, the triplets are flattened into query/context pairs, labeled 1 for query/positive pairs and 0 for query/negative pairs. For each element of a pair (query and context), the language, French or English, is chosen uniformly at random.
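The flattening step could be sketched as follows; the triplet field names (`query`, `positive`, `hard_negative`, each with `fr`/`en` versions) are hypothetical, as the card does not give the dataset schema:

```python
import random

def flatten_triplets(triplets):
    """Flatten query/positive/hard-negative triplets into labeled pairs.

    Each triplet element is assumed to exist in a French ('fr') and an
    English ('en') version; the language of each side of a pair is
    drawn uniformly at random, as described on the card.
    """
    pairs = []
    for t in triplets:
        for key, label in (('positive', 1), ('hard_negative', 0)):
            pairs.append({
                'text': t[key][random.choice(('fr', 'en'))],
                'text_pair': t['query'][random.choice(('fr', 'en'))],
                'label': label,  # 1: query/positive, 0: query/negative
            })
    return pairs
```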
Evaluation
To assess the performance of the reranker, we use the "validation" split of the SQuAD dataset. We select the first question of each paragraph, with that paragraph as the context an Oracle model should rank Top-1. Since the number of themes is limited, each context from the same theme that does not match the query is treated as a hard negative (contexts outside the theme are simple negatives).
The evaluation corpus consists of 1204 pairs of query/context to be ranked.
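The construction of the evaluation pairs could be sketched as below; the paragraph structure (`theme`, `context`, `questions` fields) is a hypothetical simplification of the SQuAD layout:

```python
def build_eval_pairs(paragraphs):
    """Build (query, context, label) pairs as described above.

    paragraphs: list of {'theme': str, 'context': str, 'questions': [str, ...]}
    label 1: the paragraph an Oracle model should rank Top-1;
    label 0: hard negative (same theme, different context).
    Contexts outside the theme (simple negatives) are skipped here.
    """
    pairs = []
    for p in paragraphs:
        query = p['questions'][0]  # first question of each paragraph
        for other in paragraphs:
            if other['context'] == p['context']:
                pairs.append((query, other['context'], 1))
            elif other['theme'] == p['theme']:
                pairs.append((query, other['context'], 0))
    return pairs
```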
Evaluation in the same language (French/French)
Evaluation in cross-language context (French/English)
As observed, the cross-language setting does not significantly impact the behavior of our models. If the model is used to rerank and filter the Top-K results of a search, a threshold of 0.8 can be applied to filter the contexts returned by the retriever, reducing the noise present in the contexts for RAG-type applications.
License
The model is licensed under the bigscience-bloom-rail-1.0 license.
Citation
@online{DeBloomzReranking,
  AUTHOR = {Cyrile Delestre},
  ORGANIZATION = {Cr{\'e}dit Mutuel Ark{\'e}a},
  URL = {https://huggingface.co/cmarkea/bloomz-3b-reranking},
  YEAR = {2024},
  KEYWORDS = {NLP ; Transformers ; LLM ; Bloomz},
}