bge-m3-financial-matryoshka: Open-Source Financial Sentence Embedding Model


Bge M3 Financial Matryoshka

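Developed by haophancs

A financial-domain sentence embedding model fine-tuned from BAAI/bge-m3. It produces 1024-dimensional vector representations and is suited to semantic similarity and information retrieval tasks.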

Text Embedding

Safetensors

English · Open-source license: Apache-2.0 · #FinancialSemanticSearch #MultiDimensionalEmbeddings #LongTextEncoding

Downloads: 34

Released: 6/22/2024

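Model Overview

This is a finance-specific model built on the sentence-transformers framework. It maps text into a high-dimensional vector space and is mainly used for tasks such as semantic textual similarity and information retrieval. It is optimized for financial text and is particularly suited to specialized content such as financial reports and business documents.

Model Features

Financial Domain Optimization

Fine-tuned specifically on financial text and aimed at specialized content such as financial reports and business documents.

High-Dimensional Vector Representation

Produces 1024-dimensional dense vectors that can capture rich semantic information.

Multi-Dimension Evaluation

The model has been evaluated at several embedding dimensions (1024/768/512/384), with similar retrieval scores at each size.

Long-Text Support

Supports sequences of up to 8192 tokens, making it suitable for long financial documents.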
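Sentence Transformers truncates any input longer than the maximum sequence length before encoding, so a document longer than 8192 tokens loses its tail. One common workaround is to split the document into overlapping windows and embed each window separately. The sketch below is a minimal illustration under stated assumptions: it counts whitespace-separated words as a rough stand-in for tokens (the real limit is measured in XLM-RoBERTa subword tokens, which usually outnumber words), and `chunk_words`, `max_words`, and `overlap` are hypothetical names, not part of any library.

```python
def chunk_words(text: str, max_words: int = 6000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows of at most max_words words.

    Assumption: words approximate subword tokens; keep max_words well below
    the model's 8192-token limit to leave headroom.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):  # this window reached the end
            break
    return chunks


# Example: a 10-word "document" split into 4-word windows with a 1-word overlap.
doc = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9"
print(chunk_words(doc, max_words=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Each chunk can then be passed to `model.encode` and indexed on its own; a small overlap lowers the chance that a relevant sentence is cut across a window boundary.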

Model Capabilities

Semantic textual similarity

Semantic search

Paraphrase mining

Text classification

Text clustering

Financial document analysis

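Use Cases

Financial Document Processing

Financial Report Retrieval

Quickly finds the passages most relevant to a query in a large collection of financial reports

Achieves 71.7% cosine accuracy@1 on the retrieval test set (1024 dimensions)

Regulatory Document Analysis

Analyzes content related to specific clauses in regulatory documents

Achieves 83.1% cosine accuracy@3 on the retrieval test set (1024 dimensions)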
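Retrieving the passages most relevant to a query comes down to ranking corpus embeddings by their cosine similarity to the query embedding. The pure-Python sketch below shows that ranking; the toy 3-dimensional vectors stand in for the model's real 1024-dimensional outputs, and `cosine` and `top_k` are hypothetical helpers written for illustration.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def top_k(query: list[float], corpus: list[list[float]], k: int = 3) -> list[tuple[int, float]]:
    """Return (corpus index, score) pairs for the k most similar corpus vectors."""
    scored = [(i, cosine(query, doc)) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for model.encode(...) output.
corpus = [
    [0.9, 0.1, 0.0],  # e.g. an inventory-valuation paragraph
    [0.0, 1.0, 0.0],  # e.g. an ESG paragraph
    [0.7, 0.7, 0.0],  # e.g. a paragraph touching both topics
]
query = [1.0, 0.0, 0.0]
for idx, score in top_k(query, corpus, k=2):
    print(idx, round(score, 4))
```

With the real model, `corpus` would be the rows of `model.encode(paragraphs)` and `query` the encoded question; for large collections, `sentence_transformers.util.semantic_search` performs the same kind of ranking on tensors.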

Business Intelligence

Business Report Similarity Analysis

Compares similar content across different business reports
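Finding similar content across reports amounts to scoring every pair of passages and keeping the pairs above a similarity threshold. The sketch below assumes toy 3-dimensional vectors in place of real embeddings; `similar_pairs` and the 0.9 threshold are illustrative choices, not values from the model card. Sentence Transformers also offers `util.paraphrase_mining` for this task.

```python
import math
from itertools import combinations


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def similar_pairs(embeddings: list[list[float]], threshold: float = 0.8) -> list[tuple[float, int, int]]:
    """Return (score, i, j) for every pair with cosine similarity >= threshold, best first."""
    unit = [normalize(e) for e in embeddings]
    pairs = []
    for i, j in combinations(range(len(unit)), 2):
        score = sum(a * b for a, b in zip(unit[i], unit[j]))
        if score >= threshold:
            pairs.append((score, i, j))
    pairs.sort(reverse=True)
    return pairs


# Toy embeddings for four report sections (stand-ins for model.encode output).
sections = [
    [1.0, 0.0, 0.1],
    [0.95, 0.05, 0.1],
    [0.0, 1.0, 0.0],
    [0.1, 0.9, 0.0],
]
for score, i, j in similar_pairs(sections, threshold=0.9):
    print(f"sections {i} and {j}: {score:.3f}")
```

Checking all pairs grows quadratically with the number of passages, which is fine for a handful of reports but calls for chunked or approximate search at scale.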

🚀 BGE-M3 Financial Matryoshka

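This is a sentence-transformers model fine-tuned from BAAI/bge-m3. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

🚀 Quick Start

To use this model, first install the Sentence Transformers library; you can then load the model and run inference.

✨ Key Features

Maps sentences and paragraphs to a 1024-dimensional dense vector space.
Can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar tasks.

📦 Installation

First, install the Sentence Transformers library:

pip install -U sentence-transformers

💻 Usage Examples

Basic Usage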

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("haophancs/bge-m3-financial-matryoshka")
# Run inference
sentences = [
    'As of January 28, 2024 the net carrying value of our inventories was $1.3 billion, which included provisions for obsolete and damaged inventory of $139.7 million.',
    "What is the status of the company's inventory as of January 28, 2024, in terms of its valuation and provisions for obsolescence?",
    'What is the relationship between the ESG goals and the long-term growth strategy?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])

📚 Documentation

Model Details

Model Description

Attribute	Details
Model type	Sentence Transformer
Base model	BAAI/bge-m3
Maximum sequence length	8192 tokens
Output dimensionality	1024
Similarity function	Cosine similarity
Language	en
License	apache-2.0
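For reference, the cosine similarity used by this model between two embeddings u and v is defined below. Because the model ends with a `Normalize()` module, its full 1024-dimensional outputs have unit length, and for such vectors the score reduces to a plain dot product.

```latex
\cos(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert},
\qquad
\lVert \mathbf{u} \rVert = \lVert \mathbf{v} \rVert = 1 \;\Rightarrow\; \cos(\mathbf{u}, \mathbf{v}) = \mathbf{u} \cdot \mathbf{v}
```

The dot-product shortcut no longer holds once vectors are truncated to fewer dimensions, unless the truncated vectors are re-normalized.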

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
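The Pooling configuration above sets `pooling_mode_cls_token: True`, so the sentence vector is the hidden state of the first token (`<s>` in XLM-RoBERTa), which `Normalize()` then scales to unit L2 norm. The pure-Python sketch below mimics those two steps on a toy 3-token by 4-dimension hidden-state matrix; it illustrates the idea and is not the library's implementation, and the numbers are made up.

```python
import math


def cls_pool(token_embeddings: list[list[float]]) -> list[float]:
    """CLS pooling: take the hidden state of the first token."""
    return list(token_embeddings[0])


def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit Euclidean length, as the Normalize() module does."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Toy hidden states: 3 tokens x 4 dimensions (the real model uses 1024 dimensions).
hidden_states = [
    [3.0, 0.0, 4.0, 0.0],  # first token (<s>, the CLS position)
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.5, 0.5, 0.5],
]
sentence_embedding = l2_normalize(cls_pool(hidden_states))
print(sentence_embedding)  # [0.6, 0.0, 0.8, 0.0]
```

Only the first row contributes to the result; the other token states are ignored, which is what distinguishes CLS pooling from mean pooling (disabled here via `pooling_mode_mean_tokens: False`).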

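Evaluation

Information Retrieval

Results for each evaluated embedding dimension are shown below; each dataset name (e.g. `dim_1024`) gives the number of dimensions the embeddings were truncated to.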
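As the model name suggests, these sizes come from Matryoshka-style training, in which the leading components of the 1024-dimensional vector are meant to work as a smaller embedding on their own. A smaller embedding is obtained by keeping the first d components; re-normalizing afterwards keeps dot-product scores equal to cosine similarity. The sketch below shows this on a toy vector, and `truncate_embedding` is a hypothetical helper. In Sentence Transformers (assumed v2.7 or later) the truncation can also be requested when loading, e.g. `SentenceTransformer("haophancs/bge-m3-financial-matryoshka", truncate_dim=384)`.

```python
import math


def truncate_embedding(v: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components and rescale them to unit length."""
    if not 0 < dim <= len(v):
        raise ValueError("dim must be in 1..len(v)")
    head = v[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]


# Toy 8-dimensional vector (not unit length); the real model outputs 1024
# dimensions and was evaluated at 1024, 768, 512 and 384.
full = [3.0, 0.0, 0.0, 4.0, 1.0, 2.0, 2.0, 0.0]
print(truncate_embedding(full, 4))  # [0.6, 0.0, 0.0, 0.8]
print(truncate_embedding(full, 1))  # [1.0]
```

Keeping a prefix (rather than, say, a random subset of components) is what makes Matryoshka embeddings cheap to shrink: no re-encoding is needed, only slicing.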

Dataset: `dim_1024`

Evaluated with InformationRetrievalEvaluator.

Metric	Value
cosine_accuracy@1	0.7171
cosine_accuracy@3	0.8314
cosine_accuracy@5	0.87
cosine_accuracy@10	0.9143
cosine_precision@1	0.7171
cosine_precision@3	0.2771
cosine_precision@5	0.174
cosine_precision@10	0.0914
cosine_recall@1	0.7171
cosine_recall@3	0.8314
cosine_recall@5	0.87
cosine_recall@10	0.9143
cosine_ndcg@10	0.8152
cosine_mrr@10	0.7836
cosine_map@100	0.7867
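The table is consistent with each query having exactly one relevant passage: accuracy@k equals recall@k, and precision@k equals accuracy@k divided by k (for example, 0.8314 / 3 ≈ 0.2771 and 0.9143 / 10 ≈ 0.0914). That is an inference from the numbers, not something the card states. Under that assumption, the sketch below computes the same metrics from the 1-based rank of each query's relevant passage; `rank_metrics` is a hypothetical helper, and `None` marks a query whose relevant passage was not retrieved.

```python
import math
from typing import Optional


def rank_metrics(ranks: list[Optional[int]], k: int = 10) -> dict[str, float]:
    """IR metrics when every query has exactly one relevant passage.

    ranks[i] is the 1-based position of query i's relevant passage in the
    retrieved list, or None if it was not retrieved.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        f"accuracy@{k}": accuracy,
        f"precision@{k}": accuracy / k,  # at most one relevant item among k retrieved
        f"recall@{k}": accuracy,  # the single relevant item is either found or not
        f"mrr@{k}": sum(1 / r for r, h in zip(ranks, hits) if h) / n,
        # With one relevant item the ideal DCG is 1, so NDCG is just the DCG.
        f"ndcg@{k}": sum(1 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n,
    }


# Hypothetical ranks for four queries.
print(rank_metrics([1, 2, None, 4], k=3))
```

In Sentence Transformers these metrics are computed by `InformationRetrievalEvaluator`; this helper only illustrates their definitions for the single-relevant-passage case.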

Dataset: `dim_768`

Evaluated with InformationRetrievalEvaluator.

Metric	Value
cosine_accuracy@1	0.7129
cosine_accuracy@3	0.8343
cosine_accuracy@5	0.8657
cosine_accuracy@10	0.91
cosine_precision@1	0.7129
cosine_precision@3	0.2781
cosine_precision@5	0.1731
cosine_precision@10	0.091
cosine_recall@1	0.7129
cosine_recall@3	0.8343
cosine_recall@5	0.8657
cosine_recall@10	0.91
cosine_ndcg@10	0.8122
cosine_mrr@10	0.7809
cosine_map@100	0.7843

Dataset: `dim_512`

Evaluated with InformationRetrievalEvaluator.

Metric	Value
cosine_accuracy@1	0.7114
cosine_accuracy@3	0.8357
cosine_accuracy@5	0.8643
cosine_accuracy@10	0.91
cosine_precision@1	0.7114
cosine_precision@3	0.2786
cosine_precision@5	0.1729
cosine_precision@10	0.091
cosine_recall@1	0.7114
cosine_recall@3	0.8357
cosine_recall@5	0.8643
cosine_recall@10	0.91
cosine_ndcg@10	0.811
cosine_mrr@10	0.7793
cosine_map@100	0.7827

Dataset: `dim_384`

Evaluated with InformationRetrievalEvaluator.

Metric	Value
cosine_accuracy@1	0.7143
cosine_accuracy@3	0.8329
cosine_accuracy@5	0.8629
cosine_accuracy@10	0.9129
cosine_precision@1	0.7143
cosine_precision@3	0.2776
cosine_precision@5	0.1726
cosine_precision@10	0.0913
cosine_recall@1	0.7143
cosine_recall@3	0.8329
cosine_recall@5	0.8629
cosine_recall@10	0.9129
cosine_ndcg@10	0.8126
cosine_mrr@10	0.7806
cosine_map@100	0.7838
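Across the four tables the scores barely move: cosine_ndcg@10 stays between 0.811 and 0.8152. The short script below puts the size/quality trade-off in numbers using only the values reported above, assuming storage per vector scales linearly with the number of dimensions (true for fixed-width float vectors).

```python
# cosine_ndcg@10 values copied from the four evaluation tables.
ndcg_at_10 = {1024: 0.8152, 768: 0.8122, 512: 0.811, 384: 0.8126}

baseline = ndcg_at_10[1024]
for dim, score in ndcg_at_10.items():
    storage = dim / 1024  # fraction of the full vector's size
    retained = score / baseline  # fraction of the 1024-dimension NDCG@10
    print(f"dim={dim:4d}  storage={storage:.1%}  ndcg@10 retained={retained:.1%}")
```

On this test set, 384 dimensions keep about 99.7% of the 1024-dimension NDCG@10 at 37.5% of the storage. The differences among 384, 512 and 768 are small enough that they should not be read as a real ranking of those sizes.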