Mxbai Rerank Base V1
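Developed by khoj-ai
This is a Transformer-based reranker model, used mainly for information retrieval and search result optimization.
Downloads: 81
Released: 1/21/2025
Model Overview
The model is a Transformer-based reranker: it reorders retrieved results to improve their relevance. It suits search engines, question-answering systems, and other scenarios where ranking needs to be optimized.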
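Model Features
Efficient reranking
Reorders search results quickly and effectively, improving their relevance
Transformer-based architecture
Uses the Transformer architecture for high-quality semantic understanding
Transformers.js support
Can run in the browser with Transformers.js
Model Capabilities
Search result reranking
Semantic relevance scoring
Information retrieval optimization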
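Use Cases
Search engines
Search result optimization
Reorders the results returned by a search engine so that the most relevant ones rank higher
Improves the user search experience and click-through rate
Question-answering systems
Answer ranking
Reorders candidate answers so that the most likely correct answer comes first
Improves the accuracy of the question-answering system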
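🚀 mxbai-rerank-base-v1
mxbai-rerank-base-v1 is the base model in our family of powerful reranker models. It can effectively boost search quality, works well in combination with keyword search, and in many cases can even outperform semantic search systems.
🚀 Quick Start
Currently, the best way to use our models is with the most recent version of sentence-transformers.
pip install -U sentence-transformers
Let's say you have a query and want to rerank a set of documents. You can do that with a single call to model.rank:
💻 Usage Examples
Basic Usage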
from sentence_transformers import CrossEncoder

# Load the model; here we use the base-sized model
model = CrossEncoder("mixedbread-ai/mxbai-rerank-base-v1")

# Example query and documents
query = "Who wrote 'To Kill a Mockingbird'?"
documents = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel 'Moby-Dick' was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel 'To Kill a Mockingbird', was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The 'Harry Potter' series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "'The Great Gatsby', a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
]

# Get the scores: the three best-matching documents, with their text
results = model.rank(query, documents, return_documents=True, top_k=3)
print(results)
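The description above says the reranker works well on top of keyword search. The following is a minimal sketch of such a two-stage setup, not the authors' pipeline: `keyword_candidates`, `rerank`, and `toy_scorer` are hypothetical helpers written for illustration, and the first stage is a naive term-overlap count standing in for a real lexical engine. The second stage accepts any function that scores (query, document) pairs.

```python
from typing import Callable, List, Sequence, Tuple


def keyword_candidates(query: str, documents: Sequence[str], limit: int) -> List[int]:
    """First stage (hypothetical): rank documents by how many query terms they contain."""
    terms = set(query.lower().split())
    overlap = [
        (sum(1 for term in terms if term in doc.lower()), idx)
        for idx, doc in enumerate(documents)
    ]
    # Highest overlap first; ties keep the original document order.
    overlap.sort(key=lambda pair: (-pair[0], pair[1]))
    # Drop documents that share no term with the query.
    return [idx for count, idx in overlap[:limit] if count > 0]


def rerank(
    query: str,
    documents: Sequence[str],
    candidate_ids: Sequence[int],
    score_pairs: Callable[[List[Tuple[str, str]]], Sequence[float]],
    top_k: int,
) -> List[dict]:
    """Second stage: score each (query, document) pair and keep the top_k best."""
    pairs = [(query, documents[idx]) for idx in candidate_ids]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidate_ids, scores), key=lambda item: item[1], reverse=True)
    return [{"corpus_id": idx, "score": float(score)} for idx, score in ranked[:top_k]]


def toy_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    """Stand-in scorer for illustration only: share of query words found in the document."""
    scores = []
    for query, doc in pairs:
        words = query.lower().split()
        scores.append(sum(word in doc.lower() for word in words) / len(words))
    return scores
```

With the model loaded as in the example above, passing `model.predict` as `score_pairs` should swap the toy scorer for the actual cross-encoder, since `CrossEncoder.predict` takes a list of text pairs and returns one score per pair.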
JavaScript Example
npm i @xenova/transformers
Let's say you have a query and want to rerank a set of documents. In JavaScript, you need to add a function:
import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers';

const model_id = 'mixedbread-ai/mxbai-rerank-base-v1';
const model = await AutoModelForSequenceClassification.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);

/**
 * Ranks the given documents against a query with the CrossEncoder. Returns a sorted list of document indices and scores.
 * @param {string} query A single query
 * @param {string[]} documents A list of documents
 * @param {Object} options Ranking options
 * @param {number} [options.top_k=undefined] Return the top k documents. If undefined, all documents are returned.
 * @param {boolean} [options.return_documents=false] If true, also return the documents. If false, return only the indices and scores.
 */
async function rank(query, documents, {
    top_k = undefined,
    return_documents = false,
} = {}) {
    // Pair the query with every document so each pair is scored jointly
    const inputs = tokenizer(
        new Array(documents.length).fill(query),
        {
            text_pair: documents,
            padding: true,
            truncation: true,
        }
    );
    const { logits } = await model(inputs);
    return logits
        .sigmoid()
        .tolist()
        .map(([score], i) => ({
            corpus_id: i,
            score,
            ...(return_documents ? { text: documents[i] } : {})
        }))
        .sort((a, b) => b.score - a.score)
        .slice(0, top_k);
}

// Example usage:
const query = "Who wrote 'To Kill a Mockingbird'?";
const documents = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel 'Moby-Dick' was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel 'To Kill a Mockingbird', was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The 'Harry Potter' series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "'The Great Gatsby', a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
];

const results = await rank(query, documents, { return_documents: true, top_k: 3 });
console.log(results);
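The post-processing inside the JavaScript rank function (sigmoid over the logits, sort by score, keep the top k) can be sketched in plain Python as well. This is an illustrative sketch under stated assumptions: `logits_to_ranking` is a hypothetical helper, and it expects one raw logit per document, which matches how the function above unpacks a single score from each row.

```python
import math
from typing import List, Optional, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    exp_x = math.exp(x)
    return exp_x / (1.0 + exp_x)


def logits_to_ranking(
    logits: Sequence[float],
    documents: Optional[Sequence[str]] = None,
    top_k: Optional[int] = None,
) -> List[dict]:
    """Turn one raw logit per document into a list of entries sorted by score."""
    results = []
    for idx, logit in enumerate(logits):
        entry = {"corpus_id": idx, "score": sigmoid(logit)}
        if documents is not None:
            # Mirrors return_documents=true in the JavaScript version.
            entry["text"] = documents[idx]
        results.append(entry)
    results.sort(key=lambda entry: entry["score"], reverse=True)
    return results[:top_k]  # top_k=None keeps every entry
```

Because the sigmoid is monotonic, it changes only the scale of the scores (mapping them into the range 0 to 1), not the order of the documents.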
📦 Using the API
You can use the large model through our API as follows:
from mixedbread_ai.client import MixedbreadAI

mxbai = MixedbreadAI(api_key="{MIXEDBREAD_API_KEY}")

res = mxbai.reranking(
    model="mixedbread-ai/mxbai-rerank-large-v1",
    query="Who is the author of To Kill a Mockingbird?",
    input=[
        "To Kill a Mockingbird is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
        "The novel Moby-Dick was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
        "Harper Lee, an American novelist widely known for her novel To Kill a Mockingbird, was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
        "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
        "The Harry Potter series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
        "The Great Gatsby, a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
    ],
    top_k=3,
    return_input=False  # Python booleans are capitalized
)

print(res.data)
The API also comes with additional features, such as a continuously trained reranker! Check out the docs for more information.
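🔧 Evaluation
Our reranking models are designed to improve your search. They work extremely well in combination with keyword search and in many cases can even outperform semantic search systems.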
Model | NDCG@10 | Accuracy@3 |
---|---|---|
Lexical search (Lucene) | 38.0 | 66.4 |
BAAI/bge-reranker-base | 41.6 | 66.9 |
BAAI/bge-reranker-large | 45.2 | 70.6 |
cohere-embed-v3 (semantic search) | 47.5 | 70.9 |
mxbai-rerank-xsmall-v1 | 43.9 | 70.0 |
mxbai-rerank-base-v1 | 46.9 | 72.3 |
mxbai-rerank-large-v1 | 48.8 | 74.9 |
The reported results are aggregated over 11 datasets from BEIR. We used Pyserini to evaluate the models. For more information, see our blog post and spreadsheet.
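For readers unfamiliar with the two metrics in the table, here is a minimal per-query sketch in Python. It is not the authors' evaluation code. It assumes the common NDCG formulation with linear gain and a log2 rank discount, and it assumes Accuracy@3 means "at least one relevant document appears in the top 3", which the source does not define. The function names and the `qrels` mapping (document id to relevance grade) are illustrative.

```python
import math
from typing import Dict, List


def dcg(relevances: List[float]) -> float:
    """Discounted cumulative gain: the gain at 1-based rank r is divided by log2(r + 1)."""
    return sum(rel / math.log2(position + 2) for position, rel in enumerate(relevances))


def ndcg_at_k(ranked_ids: List[str], qrels: Dict[str, float], k: int = 10) -> float:
    """NDCG@k for one query: DCG of the ranking divided by the DCG of an ideal ranking."""
    gains = [qrels.get(doc_id, 0.0) for doc_id in ranked_ids[:k]]
    ideal_gains = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal_gains)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def accuracy_at_k(ranked_ids: List[str], qrels: Dict[str, float], k: int = 3) -> float:
    """Assumed definition: 1.0 if any relevant document is in the top k, else 0.0."""
    return 1.0 if any(qrels.get(doc_id, 0.0) > 0 for doc_id in ranked_ids[:k]) else 0.0
```

Under these assumptions, a table entry would correspond to the per-query values averaged over all queries and datasets and multiplied by 100.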
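👥 Community
Please join our Discord community and share your feedback and thoughts! We are here to help and always happy to chat.
📄 License
This project is licensed under the Apache 2.0 license.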