reranker - MiniLM open-source cross-encoder model - free to use for text reranking and semantic search

Reranker MiniLM L6 H384 Uncased Gooaq 5 Epoch 1995000

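Developed by ayushexel

A cross-encoder model fine-tuned from nreimers/MiniLM-L6-H384-uncased. It computes a score for each text pair and is suited to text reranking and semantic search tasks.

Text embedding

Safetensors

English. Open-source license: Apache-2.0. Tags: #text reranking #question-answer matching #high-precision semantic scoring

Downloads: 24

Release date: 3/31/2025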

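Model overview

This model is a cross-encoder designed to compute similarity scores for text pairs. It can be applied to text reranking in scenarios such as information retrieval and question answering systems.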

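Model features

Text reranking

Computes a relevance score for each text pair, which can be used to reorder the candidates returned by a search system.

Based on the MiniLM architecture

Uses the lightweight MiniLM architecture (6 layers, hidden size 384, per the base model name), which keeps inference cost low.

Evaluated on multiple datasets

Evaluated on gooaq-dev and on NanoMSMARCO, NanoNFCorpus, and NanoNQ. Per the reported deltas, reranking improves the metrics on gooaq-dev and NanoNFCorpus but lowers them on NanoMSMARCO and NanoNQ.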

Model capabilities

Text similarity scoring

Semantic search

Reranking for question answering systems

Information retrieval optimization

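Use cases

Information retrieval

Reranking search engine results

Reranks the results returned by a search engine so that the most relevant results move up.

Achieved an NDCG@10 of 0.5149 on the gooaq dev set (reported delta: +0.2052).

Question answering systems

Ranking candidate answers

Ranks multiple candidate answers produced by a question answering system by relevance.

Achieved an NDCG@10 of 0.4065 on the NanoNQ dataset (reported delta: -0.0942).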

🚀 CrossEncoder based on nreimers/MiniLM-L6-H384-uncased

This is a Cross Encoder model fine-tuned from nreimers/MiniLM-L6-H384-uncased using the sentence-transformers library. It computes scores for text pairs and can be used for text reranking and semantic search.

🚀 Quick start

First, install the Sentence Transformers library.

pip install -U sentence-transformers

Then load the model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("ayushexel/reranker-MiniLM-L6-H384-uncased-gooaq-5-epoch-1995000")
# Get scores for pairs of texts
pairs = [
    ['when is the 2020 democratic presidential debate?', 'Major candidates The nomination will be made official at the 2020 Democratic National Convention, tentatively scheduled for August 17–20, 2020 in Milwaukee, Wisconsin.'],
    ['when is the 2020 democratic presidential debate?', 'Major candidates As of June 8, 2020, former Vice President Joe Biden became the presumptive presidential nominee by amassing enough delegates to secure the nomination.'],
    ['when is the 2020 democratic presidential debate?', 'On March 5, 2019, Bloomberg announced that he would not run for president in 2020; instead he encouraged the Democratic Party to "nominate a Democrat who will be in the strongest position to defeat Donald Trump".'],
    ['when is the 2020 democratic presidential debate?', 'The electoral map for the 2020 election, based on populations from the 2010 Census. The 2020 United States presidential election is scheduled for Tuesday, November 3, 2020. It will be the 59th quadrennial presidential election.'],
    ['when is the 2020 democratic presidential debate?', 'There were a total of 29 major Democratic candidates. Of these, 23 candidates participated in at least one debate. Only Joe Biden and Bernie Sanders participated in all the debates; Pete Buttigieg, Amy Klobuchar, and Elizabeth Warren participated in all but one debate.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on their similarity to a single text
ranks = model.rank(
    'when is the 2020 democratic presidential debate?',
    [
        'Major candidates The nomination will be made official at the 2020 Democratic National Convention, tentatively scheduled for August 17–20, 2020 in Milwaukee, Wisconsin.',
        'Major candidates As of June 8, 2020, former Vice President Joe Biden became the presumptive presidential nominee by amassing enough delegates to secure the nomination.',
        'On March 5, 2019, Bloomberg announced that he would not run for president in 2020; instead he encouraged the Democratic Party to "nominate a Democrat who will be in the strongest position to defeat Donald Trump".',
        'The electoral map for the 2020 election, based on populations from the 2010 Census. The 2020 United States presidential election is scheduled for Tuesday, November 3, 2020. It will be the 59th quadrennial presidential election.',
        'There were a total of 29 major Democratic candidates. Of these, 23 candidates participated in at least one debate. Only Joe Biden and Bernie Sanders participated in all the debates; Pete Buttigieg, Amy Klobuchar, and Elizabeth Warren participated in all but one debate.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
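
The list returned by `model.rank` refers to each candidate by its position in the input list (`corpus_id`). The following is a minimal sketch of mapping such a ranking back to the original texts. `attach_documents` is a hypothetical helper, not part of Sentence Transformers, and the scores in the example are placeholders rather than real model outputs.

```python
from typing import Dict, List, Tuple


def attach_documents(
    ranks: List[Dict], documents: List[str], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Map entries of the form {'corpus_id': int, 'score': float} back to their texts.

    Hypothetical helper: entries are re-sorted by score (highest first), so the
    result does not depend on the order in which they were passed in.
    """
    ordered = sorted(ranks, key=lambda r: r["score"], reverse=True)
    return [(documents[r["corpus_id"]], float(r["score"])) for r in ordered[:top_k]]


# Placeholder values for illustration only (not real model outputs).
docs = ["passage A", "passage B", "passage C"]
fake_ranks = [
    {"corpus_id": 2, "score": 0.91},
    {"corpus_id": 0, "score": 0.40},
    {"corpus_id": 1, "score": 0.05},
]
top = attach_documents(fake_ranks, docs, top_k=2)
print(top)
```

With a loaded model, the same helper could be called as `attach_documents(ranks, documents)`, where `documents` is the list passed to `model.rank`.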

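✨ Key features

Computes scores for text pairs, usable for text reranking and semantic search.
Maximum sequence length: 512 tokens.
Number of output labels: 1 (one score per text pair).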

📚 Documentation

Model details

Attribute	Details
Model type	Cross Encoder
Base model	nreimers/MiniLM-L6-H384-uncased
Maximum sequence length	512 tokens
Number of output labels	1 label
Language	en
License	apache-2.0

Model sources

Documentation: Sentence Transformers Documentation
Documentation: Cross Encoder Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Cross Encoders on Hugging Face

🔧 Technical details

Training dataset

Unnamed dataset

Size: 11,456,701 training samples
Columns: question, answer, and label
Approximate statistics based on the first 1000 samples:

	question	answer	label
Type	string	string	int
Details	min: 18 characters, mean: 43.15 characters, max: 83 characters	min: 59 characters, mean: 257.34 characters, max: 388 characters	0: ~82.40%, 1: ~17.60%

Samples:

question	answer	label
`when is the 2020 democratic presidential debate?`	`Major candidates The nomination will be made official at the 2020 Democratic National Convention, tentatively scheduled for August 17–20, 2020 in Milwaukee, Wisconsin.`	`1`
`when is the 2020 democratic presidential debate?`	`Major candidates As of June 8, 2020, former Vice President Joe Biden became the presumptive presidential nominee by amassing enough delegates to secure the nomination.`	`0`
`when is the 2020 democratic presidential debate?`	`On March 5, 2019, Bloomberg announced that he would not run for president in 2020; instead he encouraged the Democratic Party to "nominate a Democrat who will be in the strongest position to defeat Donald Trump".`	`0`

Loss function: BinaryCrossEntropyLoss, with the following parameters:

{
    "activation_fn": "torch.nn.modules.linear.Identity",
    "pos_weight": 5
}
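
As a sketch of what these parameters imply: with an identity activation the model emits a raw logit, and a binary cross-entropy loss with `pos_weight` scales the positive-label term. The snippet below writes that weighted loss out by hand in plain Python. It assumes the loss behaves like PyTorch's `BCEWithLogitsLoss` with `pos_weight`, and `weighted_bce_with_logits` is a hypothetical name used only for illustration. The negative-to-positive label ratio in the training statistics (about 82.40% to 17.60%) is close to the configured weight of 5, which may be why that value was chosen; the source does not say.

```python
import math


def weighted_bce_with_logits(logit: float, label: int, pos_weight: float = 5.0) -> float:
    """Binary cross-entropy on a raw logit, with the positive term scaled by pos_weight.

    Not numerically hardened: math.exp overflows for very large |logit|.
    """
    # log(sigmoid(x)) = -log(1 + exp(-x)); log(1 - sigmoid(x)) = -log(1 + exp(x))
    log_p = -math.log1p(math.exp(-logit))
    log_not_p = -math.log1p(math.exp(logit))
    return -(pos_weight * label * log_p + (1 - label) * log_not_p)


# At logit 0 the predicted probability is 0.5 for either class.
loss_pos = weighted_bce_with_logits(0.0, 1)  # positive pair, weighted by pos_weight
loss_neg = weighted_bce_with_logits(0.0, 0)  # negative pair, unweighted
print(loss_pos / loss_neg)

# Negative-to-positive ratio from the approximate label statistics.
neg_pos_ratio = 0.8240 / 0.1760
print(round(neg_pos_ratio, 2))
```

In effect, a misclassified positive pair costs five times as much as an equally misclassified negative pair, which counteracts the label imbalance.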

Training hyperparameters

Non-default hyperparameters

eval_strategy: steps
per_device_train_batch_size: 256
per_device_eval_batch_size: 256
learning_rate: 2e-05
num_train_epochs: 5
warmup_ratio: 0.1
seed: 12
bf16: True
dataloader_num_workers: 12
load_best_model_at_end: True
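
To give a feel for the scale of this schedule, the following rough arithmetic estimates the number of optimizer steps and warmup steps. It is only a sketch under assumptions the source does not state: a single device, no gradient accumulation, the final partial batch of each epoch kept, and all 11,456,701 samples used in every epoch. With more devices the step counts would shrink proportionally.

```python
import math

train_samples = 11_456_701    # from the training dataset description
per_device_batch_size = 256   # per_device_train_batch_size
num_epochs = 5                # num_train_epochs
warmup_ratio = 0.1            # warmup_ratio

# Assumptions (not stated in the source): one device, no gradient accumulation,
# and the final partial batch of each epoch is kept.
steps_per_epoch = math.ceil(train_samples / per_device_batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(steps_per_epoch, total_steps, warmup_steps)
```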

Evaluation

Metrics

Cross Encoder Reranking

Dataset: gooaq-dev
Evaluated with CrossEncoderRerankingEvaluator using the following parameters:
```
{
    "at_k": 10,
    "always_rerank_positives": false
}
```

Metric	Value
map	0.4719 (+0.2021)
mrr@10	0.4714 (+0.2125)
ndcg@10	0.5149 (+0.2052)
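
For readers unfamiliar with the metrics in this table, here is a minimal sketch of their standard definitions for a single query with binary relevance labels. It is not the evaluator's own implementation, and the function names are hypothetical. Dataset-level scores are typically obtained by averaging these per-query values over all queries.

```python
import math
from typing import List


def mrr_at_k(relevance: List[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top k (0 if none)."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def average_precision(relevance: List[int]) -> float:
    """Mean of precision@rank, taken at each relevant position."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def ndcg_at_k(relevance: List[int], k: int = 10) -> float:
    """DCG of the top k divided by the DCG of an ideal ordering."""
    def dcg(rels: List[int]) -> float:
        return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels, start=1))

    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal > 0 else 0.0


# One query, five reranked candidates; 1 marks a relevant answer.
ranked_relevance = [0, 1, 0, 0, 1]
print(mrr_at_k(ranked_relevance))
print(average_precision(ranked_relevance))
print(ndcg_at_k(ranked_relevance))
```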

Cross Encoder Reranking

Datasets: NanoMSMARCO_R100, NanoNFCorpus_R100, and NanoNQ_R100
Evaluated with CrossEncoderRerankingEvaluator using the following parameters:
```
{
    "at_k": 10,
    "always_rerank_positives": true
}
```

Metric	NanoMSMARCO_R100	NanoNFCorpus_R100	NanoNQ_R100
map	0.3405 (-0.1491)	0.3375 (+0.0765)	0.3251 (-0.0945)
mrr@10	0.3251 (-0.1524)	0.5157 (+0.0159)	0.3406 (-0.0861)
ndcg@10	0.4090 (-0.1314)	0.3596 (+0.0346)	0.4065 (-0.0942)

Cross Encoder Nano BEIR

Dataset: NanoBEIR_R100_mean

Evaluated with CrossEncoderNanoBEIREvaluator using the following parameters:

{
    "dataset_names": [
        "msmarco",
        "nfcorpus",
        "nq"
    ],
    "rerank_k": 100,
    "at_k": 10,
    "always_rerank_positives": true
}

Metric	Value
map	0.3344 (-0.0557)
mrr@10	0.3938 (-0.0742)
ndcg@10	0.3917 (-0.0637)
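
The NanoBEIR_R100_mean values appear to be the unweighted mean of the three per-dataset rows reported for NanoMSMARCO_R100, NanoNFCorpus_R100, and NanoNQ_R100. The short check below recomputes the means from the published four-decimal figures; agreement is expected only up to rounding.

```python
# Per-dataset values copied from the NanoMSMARCO_R100 / NanoNFCorpus_R100 / NanoNQ_R100 table.
per_dataset = {
    "map": [0.3405, 0.3375, 0.3251],
    "mrr@10": [0.3251, 0.5157, 0.3406],
    "ndcg@10": [0.4090, 0.3596, 0.4065],
}
# Values copied from the NanoBEIR_R100_mean table.
reported_mean = {"map": 0.3344, "mrr@10": 0.3938, "ndcg@10": 0.3917}

for metric, values in per_dataset.items():
    mean = sum(values) / len(values)
    # Inputs are already rounded to four decimals, so compare at that precision.
    print(f"{metric}: computed {mean:.4f}, reported {reported_mean[metric]:.4f}")
```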

Framework versions

Python: 3.11.0
Sentence Transformers: 4.0.1
Transformers: 4.50.3
PyTorch: 2.6.0+cu124
Accelerate: 1.5.2
Datasets: 3.5.0
Tokenizers: 0.21.1

📄 License

This model is provided under the apache-2.0 license.

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}