RankZephyr 7B V1 Full: Open-Source Language Model for Listwise Reranking

RankZephyr 7B V1 Full

Developed by castorini

RankZephyr is a language model fine-tuned from Zephyr-7B-β for listwise reranking. At the time of release it was the state-of-the-art open-source reranker on several benchmark datasets.

Large Language Model

Transformers

English

Open Source License: MIT

#Zero-shot Reranking #Open-source Ranking Model #Multi-dataset Optimization

Downloads 3,107

Release Time: 1/19/2024

Model Overview

This model is a 7B-parameter GPT-like model specifically optimized for listwise reranking tasks, serving as an efficient reranking assistant.

Model Features

Advanced Reranking Performance

Achieves leading performance among open-source reranking models on multiple datasets including DL19/20/21/22, TREC-COVID, and TREC-News.

Two-stage Fine-tuning

First fine-tuned on RankGPT-3.5 data, then further fine-tuned on RankGPT-4 reorderings of OpenAI's Ada2 orderings for 5K queries.

Open-source Availability

Released under the MIT license, freely usable and modifiable.

Model Capabilities

Document Reranking

Query Result Optimization

Listwise Ranking

Use Cases

Information Retrieval

Search Engine Result Optimization

Reranks search engine results to improve relevance.

On the MS MARCO v1 collection with SPLADE++ ED as the first stage, the full model scores 0.7803 on DL19 and 0.8211 on DL20.

🚀 Model Card for RankZephyr 7B V1 - Full

RankZephyr is a series of language models trained to act as helpful reranking assistants, built on the Zephyr-7B-β model. RankZephyr Base is the result of single-stage fine-tuning on RankGPT-3.5, while RankZephyr Full is further fine-tuned on RankGPT-4 reorderings of OpenAI's Ada2 orderings for 5K queries.

🚀 Quick Start

The model is intended to be used with the RankLLM repository. Although rank-llm is available as a PyPI package, we are still in the early stages of development and encourage users to install directly from source.

✨ Features

At the time of release, RankZephyr-7B-Full is the state-of-the-art open-source reranking model on various datasets such as DL19/20/21/22, TREC-COVID, and TREC-News.
It is fine-tuned to act as a listwise reranking agent: it takes a query and a set of documents and returns a reordered list of document identifiers.

📚 Documentation

Model description

Property	Details
Model Type	A 7B-parameter GPT-like model initially fine-tuned on a mix of publicly available, synthetic datasets, followed by task-specific listwise reranking data.
Language(s) (NLP)	Primarily English
License	MIT
Fine-tuned from model	[HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)

Model Sources

Repository: https://github.com/castorini/rank_llm
Paper: https://arxiv.org/abs/2312.02724

Effectiveness

At the time of release, RankZephyr-7B-Full is the state-of-the-art open-source reranking model on various datasets such as DL19/20/21/22, TREC-COVID, and TREC-News.

With the MS MARCO v1 collection:

Model	Size	First Stage	DL19	DL20
RankZephyr-7b-v1-full-rho 🪁	7B	SPLADE++ ED	0.7855	0.8255
RankZephyr-7b-v1-full 🪁	7B	SPLADE++ ED	0.7803	0.8211
RankGPT-4 (PSC)	-	SPLADE++ ED	0.7601	0.7514
RankGPT-4	-	SPLADE++ ED	0.7464	0.7076
RankZephyr-7b-v1-base 🪁	7B	SPLADE++ ED	0.7341	0.7213
RankGPT-3.5	-	SPLADE++ ED	0.7504	0.7120

More details can be found in the paper.
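
To make the gaps in the table above easier to read, here is a minimal Python sketch that copies the DL19 and DL20 scores into a dictionary and computes per-dataset differences between two models. The names `SCORES`, `gap`, and `ranked_by` are hypothetical helpers introduced only for this illustration; they are not part of RankLLM.

```python
from typing import Dict, List, Tuple

# Scores copied verbatim from the table above (first stage: SPLADE++ ED).
# Each value is (DL19, DL20).
SCORES: Dict[str, Tuple[float, float]] = {
    "RankZephyr-7b-v1-full-rho": (0.7855, 0.8255),
    "RankZephyr-7b-v1-full": (0.7803, 0.8211),
    "RankGPT-4 (PSC)": (0.7601, 0.7514),
    "RankGPT-4": (0.7464, 0.7076),
    "RankZephyr-7b-v1-base": (0.7341, 0.7213),
    "RankGPT-3.5": (0.7504, 0.7120),
}


def gap(model_a: str, model_b: str) -> Tuple[float, float]:
    """Return (DL19, DL20) differences, model_a minus model_b, to 4 decimals."""
    a, b = SCORES[model_a], SCORES[model_b]
    return (round(a[0] - b[0], 4), round(a[1] - b[1], 4))


def ranked_by(dataset_index: int) -> List[str]:
    """Model names sorted best-first on DL19 (index 0) or DL20 (index 1)."""
    return sorted(SCORES, key=lambda m: SCORES[m][dataset_index], reverse=True)


if __name__ == "__main__":
    print(gap("RankZephyr-7b-v1-full", "RankGPT-4"))
    print(gap("RankZephyr-7b-v1-full", "RankZephyr-7b-v1-base"))
    print(ranked_by(0))
```

Read this way, the second fine-tuning stage accounts for a larger gain on DL20 than on DL19 when the full model is compared with the base model.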

Intended uses & limitations

The original Zephyr model is trained for chat. In our case, RankZephyr is fine-tuned to act as a listwise reranking agent. You provide it with a query and documents and get back a reordered list of document identifiers.
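
The query-in, identifiers-out contract described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the prompt wording and the `[2] > [1] > [3]` response format are assumed here in the style of RankGPT-like listwise prompting and are not confirmed by this model card, and `build_listwise_prompt`, `parse_ranking`, and `rerank` are hypothetical helpers, not RankLLM functions. For real use, rely on the RankLLM repository, which handles prompting and parsing itself.

```python
import re
from typing import List


def build_listwise_prompt(query: str, passages: List[str]) -> str:
    """Number passages [1]..[n] and ask for an ordering (assumed prompt format)."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    return (
        f"Rank the {len(passages)} passages below by relevance to the query.\n"
        f"Query: {query}\n"
        f"{numbered}\n"
        "Answer with identifiers only, most relevant first, e.g. [2] > [1]."
    )


def parse_ranking(response: str, num_passages: int) -> List[int]:
    """Convert a response such as '[2] > [3] > [1]' into zero-based indices.

    Out-of-range and repeated identifiers are ignored, and passages the
    response did not mention are appended in their original order, so the
    result is always a full permutation of range(num_passages).
    """
    order: List[int] = []
    for token in re.findall(r"\[(\d+)\]", response):
        idx = int(token) - 1
        if 0 <= idx < num_passages and idx not in order:
            order.append(idx)
    missing = [i for i in range(num_passages) if i not in order]
    return order + missing


def rerank(doc_ids: List[str], response: str) -> List[str]:
    """Apply the parsed ordering to the caller's document identifiers."""
    return [doc_ids[i] for i in parse_ranking(response, len(doc_ids))]


if __name__ == "__main__":
    ids = ["d1", "d2", "d3"]
    # A model response is simulated here; no model is loaded in this sketch.
    print(rerank(ids, "[3] > [1]"))
```

Appending unmentioned passages rather than dropping them is a design choice in this sketch: it keeps the candidate list the same length even when the generated ordering is incomplete or malformed.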

Bias, Risks, and Limitations

⚠️ Important Note

Zephyr-7B-β has not been aligned to human preferences for safety in the RLHF phase, nor is it deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). The size and composition of the corpus used to train the base model (mistralai/Mistral-7B-v0.1) are also unknown, though it likely included a mix of Web data and technical sources such as books and code. See the [Falcon 180B model card](https://huggingface.co/tiiuae/falcon-180B#training-data) for an example of this.

💡 Usage Tip

Our model is trained specifically on monolingual English data, so effectiveness on multilingual sets is not guaranteed.

Citation

If you find RankZephyr useful in your work, please cite the following paper:

@ARTICLE{pradeep2023rankzephyr,
  title   = {{RankZephyr}: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!},
  author  = {Ronak Pradeep and Sahel Sharifymoghaddam and Jimmy Lin},
  year    = {2023},
  journal = {arXiv:2312.02724}
}
