🚀 Cross-Encoder for Text Ranking
This model is a port of the webis/monoelectra-base model from lightning-ir to Sentence Transformers and Transformers.
The original model was proposed in the paper A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking. See https://github.com/webis-de/rank-distillm for the code used to train the original model.
The model can be used as a reranker in a two-stage retrieve and re-rank pipeline, where it re-orders passages returned by a retriever model (e.g. an embedding model or BM25) for a given query. See Retrieve & Re-Rank on SBERT.net for more details.
🚀 Quick Start
The model is intended for text ranking: used as the re-ranking stage of a retrieve and re-rank pipeline, it re-scores retrieved passages to improve ranking accuracy. A minimal end-to-end sketch of such a pipeline is shown below.
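The following sketch illustrates the two stages; the bi-encoder retriever (sentence-transformers/all-MiniLM-L6-v2) and the toy corpus are illustrative assumptions, not part of this model card:

from sentence_transformers import CrossEncoder, SentenceTransformer, util

query = "How many people live in Berlin?"
corpus = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "Berlin is well known for its museums.",
    "The Eiffel Tower is located in Paris.",
]

# Stage 1: retrieve candidate passages with a fast bi-encoder.
retriever = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
corpus_embeddings = retriever.encode(corpus, convert_to_tensor=True)
query_embedding = retriever.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]

# Stage 2: re-score the candidates with the more accurate cross-encoder.
reranker = CrossEncoder("cross-encoder/monoelectra-base", trust_remote_code=True)
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
scores = reranker.predict(pairs)

# Print passages from most to least relevant.
for score, (_, passage) in sorted(zip(scores, pairs), key=lambda x: x[0], reverse=True):
    print(f"{score:.2f}\t{passage}")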
📦 Installation
Using this model is straightforward once you have Sentence Transformers installed:
pip install sentence-transformers
💻 Usage Examples
Basic usage (Sentence Transformers)
from sentence_transformers import CrossEncoder

# trust_remote_code=True allows loading the model's custom code from the Hub.
model = CrossEncoder("cross-encoder/monoelectra-base", trust_remote_code=True)

# Each input is a (query, passage) pair; predict returns one relevance score per pair.
scores = model.predict([
    ("How many people live in Berlin?", "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."),
    ("How many people live in Berlin?", "Berlin is well known for its museums."),
])
print(scores)
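Depending on your sentence-transformers version, CrossEncoder.rank can score and sort a set of documents for a query in a single call. The snippet below reuses the model loaded above; it is a sketch, not an interface mandated by this model card:

# Score and sort documents for the query in one call.
ranks = model.rank(
    "How many people live in Berlin?",
    [
        "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
        "Berlin is well known for its museums.",
    ],
    return_documents=True,
)
for rank in ranks:
    print(f"{rank['score']:.2f}\t{rank['text']}")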
Advanced usage (Transformers)
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# trust_remote_code=True allows loading the model's custom code from the Hub.
model = AutoModelForSequenceClassification.from_pretrained("cross-encoder/monoelectra-base", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("cross-encoder/monoelectra-base")

# Tokenize the (query, passage) pairs into a single padded batch.
features = tokenizer(
    [
        ("How many people live in Berlin?", "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."),
        ("How many people live in Berlin?", "Berlin is well known for its museums."),
    ],
    padding=True,
    truncation=True,
    return_tensors="pt",
)

model.eval()
with torch.no_grad():
    # The model outputs one logit per pair; flatten to a 1-D score tensor.
    scores = model(**features).logits.view(-1)
print(scores)
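The raw logits are unnormalized relevance scores. If bounded scores in (0, 1) are more convenient downstream, a sigmoid can be applied; this is a presentation choice on top of the snippet above, not something the model card prescribes:

# Map the logits to (0, 1) for easier thresholding or display.
probabilities = torch.sigmoid(scores)
print(probabilities)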
📄 License
This project is licensed under the Apache-2.0 license.