Mxbai Rerank Base V1
Developed by khoj-ai
A Transformer-based reranker model, used mainly for information retrieval and search result optimization.
Downloads: 81
Released: 1/21/2025
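Model Overview
The model is Transformer-based and reorders retrieved results so that the most relevant ones rank higher. It suits search engines, question answering systems, and other settings where an initial ranking needs refinement.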
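Model Features
Efficient reranking
Reorders search results quickly and effectively to improve their relevance
Transformer-based
Uses the Transformer architecture for high-quality semantic understanding
Transformers.js support
Runs in the browser through Transformers.js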
Model Capabilities
Search result reranking
Semantic relevance scoring
Information retrieval optimization
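Use Cases
Search engines
Search result optimization
Reranks the results returned by a search engine so the most relevant ones appear first
Improves the search experience and click-through rate
Question answering systems
Answer ranking
Reranks candidate answers so the most likely correct answer appears first
Improves the accuracy of the question answering system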
🚀 mxbai-rerank-base-v1
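mxbai-rerank-base-v1 is the base model in a family of powerful reranker models. It can effectively boost search quality, works particularly well in combination with keyword search, and in many cases can even outperform semantic search systems.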
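The combination with keyword search can be pictured as a two-stage pipeline: a cheap first stage retrieves candidates, and a reranker rescores only those candidates. The following is a minimal sketch of that data flow, not the model's actual scoring. `keyword_retrieve`, `rerank`, and `toy_score` are hypothetical names introduced for illustration, and `toy_score` is a simple term-overlap stand-in for the cross-encoder.

```python
from typing import Callable, List, Tuple


def keyword_retrieve(query: str, corpus: List[str], k: int) -> List[int]:
    """First stage: pick up to k documents that share the most terms with the query."""
    terms = set(query.lower().split())
    overlap = [(len(terms & set(doc.lower().split())), i) for i, doc in enumerate(corpus)]
    # Highest overlap first; ties keep the original corpus order.
    overlap.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for count, i in overlap[:k] if count > 0]


def rerank(
    query: str,
    corpus: List[str],
    candidates: List[int],
    score_fn: Callable[[str, str], float],
    top_k: int,
) -> List[Tuple[int, float]]:
    """Second stage: rescore only the candidates and keep the top_k best."""
    scored = [(i, score_fn(query, corpus[i])) for i in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def toy_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for the reranker: fraction of query terms found in doc."""
    terms = set(query.lower().split())
    return len(terms & set(doc.lower().split())) / len(terms)


corpus = [
    "harper lee wrote to kill a mockingbird",
    "herman melville wrote moby dick",
    "a recipe for lemon cake",
]
query = "who wrote to kill a mockingbird"
candidates = keyword_retrieve(query, corpus, k=2)
best = rerank(query, corpus, candidates, toy_score, top_k=1)
print(candidates, best)
```

In a real setup the scoring function would call the reranker model on each (query, document) pair. Because the first stage limits the candidate set, the more expensive model only runs on a handful of documents.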
🚀 Quick Start
Currently, the best way to use our models is with the most recent version of sentence-transformers.
pip install -U sentence-transformers
Suppose you have a query and want to rerank a set of documents. This takes a single call:
💻 Usage Examples
Basic usage
from sentence_transformers import CrossEncoder

# Load the model; here we use the base-sized model
model = CrossEncoder("mixedbread-ai/mxbai-rerank-base-v1")

# Example query and documents
query = "Who wrote 'To Kill a Mockingbird'?"
documents = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel 'Moby-Dick' was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel 'To Kill a Mockingbird', was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The 'Harry Potter' series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "'The Great Gatsby', a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
]

# Score the documents and keep the three most relevant ones
results = model.rank(query, documents, return_documents=True, top_k=3)
print(results)
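With `return_documents=True`, `rank` returns a list of dictionaries sorted by descending score, each holding `corpus_id`, `score`, and `text`. The sketch below shows one way such a list might be turned into readable output. `format_results` is a hypothetical helper, and the entries in `sample_results` are invented placeholders with made-up scores and shortened text, not real model output.

```python
from typing import Dict, List

# Placeholder data shaped like the output of model.rank(..., return_documents=True).
# The scores are invented for illustration; real values come from the model.
sample_results: List[Dict] = [
    {"corpus_id": 0, "score": 0.99, "text": "'To Kill a Mockingbird' is a novel by Harper Lee"},
    {"corpus_id": 2, "score": 0.87, "text": "Harper Lee, an American novelist widely known"},
    {"corpus_id": 1, "score": 0.02, "text": "The novel 'Moby-Dick' was written by Herman Melville"},
]


def format_results(results: List[Dict], max_chars: int = 40) -> List[str]:
    """Build one line per hit: rank, document index, score, and a text snippet."""
    lines = []
    for rank, hit in enumerate(results, start=1):
        snippet = hit["text"][:max_chars]
        lines.append(f"{rank}. [doc {hit['corpus_id']}] {hit['score']:.2f} {snippet}")
    return lines


for line in format_results(sample_results):
    print(line)
```

The `corpus_id` field is the index into the original `documents` list, so it can be used to look up metadata stored alongside each document.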
JavaScript example
npm i @xenova/transformers
Suppose you have a query and want to rerank a set of documents. In JavaScript, you need to add a function:
import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers';

const model_id = 'mixedbread-ai/mxbai-rerank-base-v1';
const model = await AutoModelForSequenceClassification.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);

/**
 * Ranks the given documents against a query with a CrossEncoder. Returns a sorted list of document indices and scores.
 * @param {string} query A single query
 * @param {string[]} documents A list of documents
 * @param {Object} options Options for ranking
 * @param {number} [options.top_k=undefined] Return the top-k documents. If undefined, all documents are returned.
 * @param {boolean} [options.return_documents=false] If true, also returns the documents. If false, only returns the indices and scores.
 */
async function rank(query, documents, {
    top_k = undefined,
    return_documents = false,
} = {}) {
    // Pair the query with every document so each pair is scored jointly
    const inputs = tokenizer(
        new Array(documents.length).fill(query),
        {
            text_pair: documents,
            padding: true,
            truncation: true,
        }
    );
    const { logits } = await model(inputs);
    // Map logits to (0, 1) scores, then sort descending and keep the top_k
    return logits
        .sigmoid()
        .tolist()
        .map(([score], i) => ({
            corpus_id: i,
            score,
            ...(return_documents ? { text: documents[i] } : {})
        }))
        .sort((a, b) => b.score - a.score)
        .slice(0, top_k);
}

// Example usage:
const query = "Who wrote 'To Kill a Mockingbird'?";
const documents = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel 'Moby-Dick' was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel 'To Kill a Mockingbird', was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The 'Harry Potter' series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "'The Great Gatsby', a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
];

const results = await rank(query, documents, { return_documents: true, top_k: 3 });
console.log(results);
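The post-processing inside the JavaScript `rank` function (sigmoid over the logits, sort by score, keep the first `top_k`) can also be written in plain Python. This is a sketch of that step only. `rank_from_logits` is a hypothetical name, and the logits in the example are made-up numbers standing in for real model output.

```python
import math
from typing import Dict, List, Optional


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rank_from_logits(
    logits: List[float],
    documents: List[str],
    top_k: Optional[int] = None,
    return_documents: bool = False,
) -> List[Dict]:
    """Turn one logit per document into a list of hits sorted by descending score."""
    hits = []
    for i, logit in enumerate(logits):
        hit = {"corpus_id": i, "score": sigmoid(logit)}
        if return_documents:
            hit["text"] = documents[i]
        hits.append(hit)
    hits.sort(key=lambda h: h["score"], reverse=True)
    # Slicing with None keeps every hit, mirroring slice(0, undefined) in JavaScript.
    return hits[:top_k]


# Made-up logits for three documents, for illustration only.
example_hits = rank_from_logits([0.0, 2.0, -1.0], ["a", "b", "c"], top_k=2, return_documents=True)
print(example_hits)
```

Since the sigmoid is monotonic, sorting by raw logits would give the same order. Applying it first only puts the scores on a 0 to 1 scale, which is easier to threshold.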
📦 Using the API
You can use the large model via our API as follows:
from mixedbread_ai.client import MixedbreadAI

mxbai = MixedbreadAI(api_key="{MIXEDBREAD_API_KEY}")

res = mxbai.reranking(
    model="mixedbread-ai/mxbai-rerank-large-v1",
    query="Who is the author of To Kill a Mockingbird?",
    input=[
        "To Kill a Mockingbird is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
        "The novel Moby-Dick was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
        "Harper Lee, an American novelist widely known for her novel To Kill a Mockingbird, was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
        "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
        "The Harry Potter series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
        "The Great Gatsby, a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
    ],
    top_k=3,
    return_input=False  # Python booleans are capitalized
)

print(res.data)
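The API comes with additional features, such as a continuously trained reranker! Check out the documentation for more information.
🔧 Evaluation
Our reranker models are designed to improve your search. They work extremely well in combination with keyword search and in many cases can even outperform semantic search systems.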
Model | NDCG@10 | Accuracy@3 |
---|---|---|
Lexical search (Lucene) | 38.0 | 66.4 |
BAAI/bge-reranker-base | 41.6 | 66.9 |
BAAI/bge-reranker-large | 45.2 | 70.6 |
cohere-embed-v3 (semantic search) | 47.5 | 70.9 |
mxbai-rerank-xsmall-v1 | 43.9 | 70.0 |
mxbai-rerank-base-v1 | 46.9 | 72.3 |
mxbai-rerank-large-v1 | 48.8 | 74.9 |
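The reported results are aggregated from 11 BEIR datasets. We used Pyserini to evaluate the models. More information is available in our blog post and the accompanying spreadsheet.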
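For readers unfamiliar with the two metrics in the table, here is a small worked sketch of how they are commonly computed for a single query. It assumes linear gain for NDCG, treats Accuracy@k as "at least one relevant document in the top k", and assumes the ranked list contains every judged document for the query. The exact evaluation settings behind the reported numbers are not stated in this document, and the function names are illustrative.

```python
import math
from typing import List


def dcg_at_k(relevances: List[float], k: int) -> float:
    """Discounted cumulative gain: each relevance is discounted by log2(rank + 1)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: List[float], k: int) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


def accuracy_at_k(ranked_relevances: List[float], k: int) -> float:
    """1.0 if any of the top-k documents is relevant, otherwise 0.0."""
    return 1.0 if any(rel > 0 for rel in ranked_relevances[:k]) else 0.0


# Relevance labels of the documents in ranked order (1 = relevant, 0 = not).
ranking = [0, 1, 1]
print(round(ndcg_at_k(ranking, 10), 4))
print(accuracy_at_k(ranking, 3))
```

Averaging these per-query values over all queries and multiplying by 100 gives scores on the same scale as the table.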
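👥 Community
Please join our Discord community and share your feedback and thoughts! We are here to help and always happy to chat.
📄 License
This project is licensed under the Apache 2.0 license.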