🚀 Cross-Encoder for Text Ranking
This model is a port of the webis/monoelectra-base model from lightning-ir to Sentence Transformers and Transformers.
The original model was proposed in the paper A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking. See https://github.com/webis-de/rank-distillm for the code used to train the original model.
The model can be used as a reranker in a two-stage retrieve and re-rank pipeline, where it re-orders passages returned by a retriever model (e.g. an embedding model or BM25) for a given query. See Retrieve & Re-Rank on SBERT.net for more details.
🚀 Quick Start
The model is intended for text ranking: used as the re-ranking stage of a retrieve and re-rank pipeline, it re-scores retrieved passages to improve ranking accuracy. A minimal end-to-end sketch of such a pipeline is shown below.
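The following sketch illustrates the two stages; the bi-encoder retriever (sentence-transformers/all-MiniLM-L6-v2) and the toy corpus are illustrative assumptions, not part of this model card:

from sentence_transformers import CrossEncoder, SentenceTransformer, util

query = "How many people live in Berlin?"
corpus = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "Berlin is well known for its museums.",
    "The Eiffel Tower is located in Paris.",
]

# Stage 1: retrieve candidate passages with a fast bi-encoder.
retriever = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
corpus_embeddings = retriever.encode(corpus, convert_to_tensor=True)
query_embedding = retriever.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]

# Stage 2: re-score the candidates with the more accurate cross-encoder.
reranker = CrossEncoder("cross-encoder/monoelectra-base", trust_remote_code=True)
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
scores = reranker.predict(pairs)

# Print passages from most to least relevant.
for score, (_, passage) in sorted(zip(scores, pairs), key=lambda x: x[0], reverse=True):
    print(f"{score:.2f}\t{passage}")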
📦 Installation
Using this model is straightforward once you have Sentence Transformers installed:
pip install sentence-transformers
💻 Usage Examples
Basic usage (Sentence Transformers)
from sentence_transformers import CrossEncoder

# trust_remote_code=True allows loading the model's custom code from the Hub.
model = CrossEncoder("cross-encoder/monoelectra-base", trust_remote_code=True)

# Each input is a (query, passage) pair; predict returns one relevance score per pair.
scores = model.predict([
    ("How many people live in Berlin?", "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."),
    ("How many people live in Berlin?", "Berlin is well known for its museums."),
])
print(scores)
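Depending on your sentence-transformers version, CrossEncoder.rank can score and sort a set of documents for a query in a single call. The snippet below reuses the model loaded above; it is a sketch, not an interface mandated by this model card:

# Score and sort documents for the query in one call.
ranks = model.rank(
    "How many people live in Berlin?",
    [
        "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
        "Berlin is well known for its museums.",
    ],
    return_documents=True,
)
for rank in ranks:
    print(f"{rank['score']:.2f}\t{rank['text']}")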
Advanced usage (Transformers)
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# trust_remote_code=True allows loading the model's custom code from the Hub.
model = AutoModelForSequenceClassification.from_pretrained("cross-encoder/monoelectra-base", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("cross-encoder/monoelectra-base")

# Tokenize the (query, passage) pairs into a single padded batch.
features = tokenizer(
    [
        ("How many people live in Berlin?", "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."),
        ("How many people live in Berlin?", "Berlin is well known for its museums."),
    ],
    padding=True,
    truncation=True,
    return_tensors="pt",
)

model.eval()
with torch.no_grad():
    # The model outputs one logit per pair; flatten to a 1-D score tensor.
    scores = model(**features).logits.view(-1)
print(scores)
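The raw logits are unnormalized relevance scores. If bounded scores in (0, 1) are more convenient downstream, a sigmoid can be applied; this is a presentation choice on top of the snippet above, not something the model card prescribes:

# Map the logits to (0, 1) for easier thresholding or display.
probabilities = torch.sigmoid(scores)
print(probabilities)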
📄 License
This project is licensed under the Apache-2.0 license.