bge-base-financial-matryoshka_test_1 open-source model: free English financial information retrieval and sentence similarity

Bge Base Financial Matryoshka Test 1

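Developed by NickyNicky

A model optimized for the financial domain, based on BAAI/bge-base-en-v1.5 and specialized for English sentence similarity and information retrieval tasks.

Text embedding

Safetensors

Language: English | Open-source license: Apache-2.0 | #financial-document-retrieval #multi-dimensional-feature-extraction #high-precision-similarity-matching

Downloads: 15

Release date: 6/30/2024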

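Model overview

This is a sentence embedding model used mainly to compute similarity between sentences, with optimization for financial text. It supports feature extraction and information retrieval tasks.

Model features

Financial domain optimization

Fine-tuned specifically on financial-domain text

Multi-dimensional embeddings

Supports several embedding dimensions (768/512/256/128/64) to suit different application scenarios

Efficient retrieval

Performs well on information retrieval tasks, particularly financial document retrieval

Dual loss function training

Trained jointly with MatryoshkaLoss combined with MultipleNegativesRankingLoss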
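
The joint objective named above can be illustrated with a small pure-Python sketch. It is not the author's training code: the toy vectors, the `scale=20.0` value, and the equal per-dimension weights are assumptions made for illustration, and the function names `mnr_loss` and `matryoshka_loss` are hypothetical. The idea is that an in-batch ranking loss is evaluated on truncated prefixes of each embedding and the results are summed, so the leading dimensions stay useful on their own.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch negatives ranking loss: each anchor should score its own
    positive higher than every other positive in the batch."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Cross-entropy with the matching positive (index i) as the label,
        # computed with the usual max-shift for numerical stability.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights=None):
    """Sum the base loss over truncated prefixes of the embeddings."""
    weights = weights or [1.0] * len(dims)
    return sum(
        w * mnr_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )

# Toy 4-dimensional embeddings standing in for real model outputs (assumption).
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]
loss = matryoshka_loss(anchors, positives, dims=[4, 2])
print(f"joint loss: {loss:.4f}")
```

For this model the prefix sizes would correspond to the listed dimensions (768, 512, 256, 128, 64) rather than the toy values 4 and 2.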

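Model capabilities

Sentence similarity

Financial document retrieval

Feature extraction

Semantic search

Use cases

Financial document processing

Financial report retrieval

Quickly find relevant content across large volumes of financial reports

Accuracy@1 of 71.57%

Legal clause matching

Identify document passages related to a specific legal procedure

Accuracy@3 of 85.71%

Enterprise knowledge management

Policy document retrieval

Find relevant policy information in internal company documents

Accuracy@5 of 88.71%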

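🚀 BGE base Financial Matryoshka

This is a sentence-transformers model fine-tuned from BAAI/bge-base-en-v1.5. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

🚀 Quick start

The sections below cover installing the library and computing embeddings and similarity scores with the model.

✨ Main features

Maps sentences and paragraphs to a 768-dimensional dense vector space.
Supports semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar tasks.

📦 Installation

First, install the Sentence Transformers library.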

pip install -U sentence-transformers

💻 Usage examples

Basic usage

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("NickyNicky/bge-base-financial-matryoshka")
# Run inference
sentences = [
    'Non-GAAP earnings from operations and non-GAAP operating profit margin consist of earnings from operations or earnings from operations as a percentage of net revenue excluding the items mentioned above and charges relating to the amortization of intangible assets, goodwill impairment, transformation costs and acquisition, disposition and other related charges. Hewlett Packard Enterprise excludes these items because they are non-cash expenses, are significantly impacted by the timing and magnitude of acquisitions, and are inconsistent in amount and frequency.',
    "What specific charges are excluded from Hewlett Packard Enterprise's non-GAAP operating profit margin and why?",
    'How many shares were outstanding at the beginning of 2023 and what was their aggregate intrinsic value?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get similarity scores between the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
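
Because the model was trained with a Matryoshka objective, the 768-dimensional vectors can be shortened to one of the smaller sizes before searching. The sketch below shows the mechanics on toy 6-dimensional vectors (an assumption made so it runs without downloading the model): keep the leading components, rescale to unit length, then rank passages by cosine similarity. The helper names `truncate_and_normalize` and `rank_by_cosine` are hypothetical.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def rank_by_cosine(query, docs, dim):
    """Return document indices sorted from most to least similar."""
    q = truncate_and_normalize(query, dim)
    scored = []
    for idx, doc in enumerate(docs):
        d = truncate_and_normalize(doc, dim)
        # For unit vectors the dot product equals cosine similarity.
        scored.append((sum(a * b for a, b in zip(q, d)), idx))
    return [idx for _, idx in sorted(scored, reverse=True)]

# Toy vectors standing in for model.encode(...) outputs (assumption).
query = [0.9, 0.1, 0.0, 0.2, 0.1, 0.0]
docs = [
    [0.1, 0.9, 0.2, 0.0, 0.1, 0.3],  # unrelated passage
    [0.8, 0.2, 0.1, 0.3, 0.0, 0.1],  # relevant passage
]
print(rank_by_cosine(query, docs, dim=6))  # full-size vectors
print(rank_by_cosine(query, docs, dim=3))  # truncated vectors
```

With real embeddings, `query` and `docs` would be rows of the array returned by `model.encode`. Recent sentence-transformers releases also accept a `truncate_dim` argument on the `SentenceTransformer` constructor, which applies the truncation when encoding; check the installed version before relying on it.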

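📚 Documentation

Model details

Model description

Attribute	Details
Model type	Sentence Transformer
Base model	BAAI/bge-base-en-v1.5
Maximum sequence length	512 tokens
Output dimensionality	768 dimensions
Similarity function	Cosine similarity
Language	en
License	apache-2.0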

Model sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full model architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
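
The architecture printout shows CLS-token pooling (`pooling_mode_cls_token: True`) followed by a `Normalize()` layer. A minimal sketch of those two steps on toy token vectors, written in plain Python rather than the library's own modules, is given here; the 2-dimensional values are made up for illustration, and `mean_pool` is included only to contrast with the mode this model does not use.

```python
import math

def cls_pool(token_embeddings):
    """CLS pooling: use the first token's vector as the sentence vector."""
    return list(token_embeddings[0])

def mean_pool(token_embeddings):
    """Mean pooling, shown only for contrast; it is disabled in this model."""
    n = len(token_embeddings)
    return [sum(col) / n for col in zip(*token_embeddings)]

def l2_normalize(vec):
    """Rescale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# One toy sentence: the [CLS] vector followed by two word-piece vectors.
tokens = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
sentence_vec = l2_normalize(cls_pool(tokens))
print(sentence_vec)  # [0.6, 0.8]
```

Since the output is unit length, the cosine similarity of two sentence vectors reduces to their dot product.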

Evaluation

Metrics

Information retrieval

Datasets: dim_768 through dim_64. The table below reports information retrieval results at each embedding dimension.

Dataset	Metric	Value
dim 768	Cosine Accuracy@1	0.7157142857142857
dim 768	Cosine Accuracy@3	0.8571428571428571
dim 768	Cosine Accuracy@5	0.8871428571428571
dim 768	Cosine Accuracy@10	0.9314285714285714
dim 768	Cosine Precision@1	0.7157142857142857
dim 768	Cosine Precision@3	0.2857142857142857
dim 768	Cosine Precision@5	0.1774285714285714
dim 768	Cosine Precision@10	0.09314285714285712
dim 768	Cosine Recall@1	0.7157142857142857
dim 768	Cosine Recall@3	0.8571428571428571
dim 768	Cosine Recall@5	0.8871428571428571
dim 768	Cosine Recall@10	0.9314285714285714
dim 768	Cosine Ndcg@10	0.8274896625809096
dim 768	Cosine Mrr@10	0.7939818594104311
dim 768	Cosine Map@100	0.7969204030602811
dim 512	Cosine Accuracy@1	0.7142857142857143
dim 512	Cosine Accuracy@3	0.8571428571428571
dim 512	Cosine Accuracy@5	0.8871428571428571
dim 512	Cosine Accuracy@10	0.9314285714285714
dim 512	Cosine Precision@1	0.7142857142857143
dim 512	Cosine Precision@3	0.2857142857142857
dim 512	Cosine Precision@5	0.1774285714285714
dim 512	Cosine Precision@10	0.09314285714285712
dim 512	Cosine Recall@1	0.7142857142857143
dim 512	Cosine Recall@3	0.8571428571428571
dim 512	Cosine Recall@5	0.8871428571428571
dim 512	Cosine Recall@10	0.9314285714285714
dim 512	Cosine Ndcg@10	0.8267670378473014
dim 512	Cosine Mrr@10	0.7930204081632654
dim 512	Cosine Map@100	0.7958033409607879
dim 256	Cosine Accuracy@1	0.7157142857142857
dim 256	Cosine Accuracy@3	0.8514285714285714
dim 256	Cosine Accuracy@5	0.8828571428571429
dim 256	Cosine Accuracy@10	0.93
dim 256	Cosine Precision@1	0.7157142857142857
dim 256	Cosine Precision@3	0.2838095238095238
dim 256	Cosine Precision@5	0.17657142857142857
dim 256	Cosine Precision@10	0.09299999999999999
dim 256	Cosine Recall@1	0.7157142857142857
dim 256	Cosine Recall@3	0.8514285714285714
dim 256	Cosine Recall@5	0.8828571428571429
dim 256	Cosine Recall@10	0.93
dim 256	Cosine Ndcg@10	0.825504930245723
dim 256	Cosine Mrr@10	0.7918724489795919
dim 256	Cosine Map@100	0.7945830508495424
dim 128	Cosine Accuracy@1	0.7142857142857143
dim 128	Cosine Accuracy@3	0.8428571428571429
dim 128	Cosine Accuracy@5	0.8742857142857143
dim 128	Cosine Accuracy@10	0.9214285714285714
dim 128	Cosine Precision@1	0.7142857142857143
dim 128	Cosine Precision@3	0.28095238095238095
dim 128	Cosine Precision@5	0.17485714285714282
dim 128	Cosine Precision@10	0.09214285714285712
dim 128	Cosine Recall@1	0.7142857142857143
dim 128	Cosine Recall@3	0.8428571428571429
dim 128	Cosine Recall@5	0.8742857142857143
dim 128	Cosine Recall@10	0.9214285714285714
dim 128	Cosine Ndcg@10	0.8203162516614704
dim 128	Cosine Mrr@10	0.7878543083900227
dim 128	Cosine Map@100	0.7909435994513387
dim 64	Cosine Accuracy@1	0.6828571428571428
dim 64	Cosine Accuracy@3	0.81
dim 64	Cosine Accuracy@5	0.85
dim 64	Cosine Accuracy@10	0.9042857142857142
dim 64	Cosine Precision@1	0.6828571428571428
dim 64	Cosine Precision@3	0.27
dim 64	Cosine Precision@5	0.16999999999999998
dim 64	Cosine Precision@10	0.09042857142857143
dim 64	Cosine Recall@1	0.6828571428571428
dim 64	Cosine Recall@3	0.81
dim 64	Cosine Recall@5	0.85
dim 64	Cosine Recall@10	0.9042857142857142
dim 64	Cosine Ndcg@10	0.7926026006937184
dim 64	Cosine Mrr@10	0.7570844671201811
dim 64	Cosine Map@100	0.7606949750229449
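
In the table, Recall@k equals Accuracy@k and Precision@k equals Accuracy@k divided by k (for example 0.8571 / 3 ≈ 0.2857), which is what one would expect if every query has exactly one relevant passage. Under that assumption, the metrics can be sketched from the 1-based rank of the relevant passage for each query. The ranks below are toy values, not the evaluation data, and the function names are hypothetical.

```python
import math

def accuracy_at_k(ranks, k):
    """Fraction of queries whose relevant passage appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def precision_at_k(ranks, k):
    """With one relevant passage per query, a hit fills 1 of the k slots."""
    return accuracy_at_k(ranks, k) / k

def mrr_at_k(ranks, k):
    """Mean reciprocal rank, counting only hits within the top k."""
    return sum(1.0 / r for r in ranks if r <= k) / len(ranks)

def ndcg_at_k(ranks, k):
    """NDCG with a single relevant passage, so the ideal DCG is 1."""
    return sum(1.0 / math.log2(r + 1) for r in ranks if r <= k) / len(ranks)

# Rank of the relevant passage for five toy queries (assumption).
ranks = [1, 1, 2, 4, 12]
print(accuracy_at_k(ranks, 1))
print(precision_at_k(ranks, 3))
print(mrr_at_k(ranks, 10))
print(ndcg_at_k(ranks, 10))
```

Recall@k needs no separate function in this setting because it coincides with `accuracy_at_k`.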