🚀 Cross-Encoder for Text Ranking
This model is a port of the webis/monoelectra-large model from lightning-ir to Sentence Transformers and Transformers.
The original model was proposed in the paper A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking. See https://github.com/webis-de/rank-distillm for the code used to train the original model.
The model can be used as a reranker in a two-stage "retrieve-rerank" pipeline, where it re-orders the passages returned by a retriever model (e.g., an embedding model or BM25) for a given query. See SBERT.net Retrieve & Re-rank for more details; a sketch of such a pipeline also appears under Usage Examples below.
📦 Installation
To use this model, install Sentence Transformers:
pip install sentence-transformers
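The Transformers example below additionally uses torch and transformers; both are installed automatically as dependencies of sentence-transformers.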
💻 Usage Examples
Basic Usage
Using Sentence Transformers
from sentence_transformers import CrossEncoder

# Load the cross-encoder; trust_remote_code is needed because the port ships custom modeling code
model = CrossEncoder("cross-encoder/monoelectra-large", trust_remote_code=True)

# Score (query, passage) pairs; higher scores indicate higher relevance
scores = model.predict([
    ("How many people live in Berlin?", "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."),
    ("How many people live in Berlin?", "Berlin is well known for its museums."),
])
print(scores)
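Since the model is meant to be the second stage of a retrieve-rerank pipeline, the following minimal sketch pairs it with a bi-encoder retriever. The retriever model (sentence-transformers/all-MiniLM-L6-v2) and the toy corpus are illustrative assumptions, not part of this card:

from sentence_transformers import SentenceTransformer, CrossEncoder, util

query = "How many people live in Berlin?"
corpus = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "Berlin is well known for its museums.",
    "The Eiffel Tower is located in Paris.",
]

# Stage 1: retrieve candidate passages with a bi-encoder (model choice is an assumption)
retriever = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
corpus_embeddings = retriever.encode(corpus, convert_to_tensor=True)
query_embedding = retriever.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]

# Stage 2: re-rank the retrieved passages with the cross-encoder
reranker = CrossEncoder("cross-encoder/monoelectra-large", trust_remote_code=True)
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
scores = reranker.predict(pairs)

# Print passages from most to least relevant according to the cross-encoder
for score, (_, passage) in sorted(zip(scores, pairs), key=lambda x: x[0], reverse=True):
    print(f"{score:.2f}\t{passage}")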
Using Transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the ported model and its tokenizer; trust_remote_code is needed for the custom modeling code
model = AutoModelForSequenceClassification.from_pretrained("cross-encoder/monoelectra-large", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("cross-encoder/monoelectra-large")

# Tokenize (query, passage) pairs as a single batch
features = tokenizer(
    [
        ("How many people live in Berlin?", "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."),
        ("How many people live in Berlin?", "Berlin is well known for its museums."),
    ],
    padding=True,
    truncation=True,
    return_tensors="pt",
)

# Score the pairs; each logit is a single relevance score
model.eval()
with torch.no_grad():
    scores = model(**features).logits.view(-1)
print(scores)
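Higher logits indicate higher relevance. As a minimal follow-up sketch (continuing the variables above), the passages can be ordered by score:

# Indices of the passages, from most to least relevant (illustrative follow-up)
ranking = scores.argsort(descending=True)
print(ranking)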
📄 License
This project is licensed under the Apache-2.0 license.