🚀 SentenceTransformer based on redis/langcache-embed-v1
This is a sentence-transformers model fine-tuned from redis/langcache-embed-v1 on a triplet dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
✨ Key Features
- Maps sentences and paragraphs to a 768-dimensional dense vector space.
- Suitable for tasks such as semantic textual similarity, semantic search, paraphrase mining, text classification, and clustering.
📦 Installation
First, install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```
💻 Usage Examples
Basic Usage

```python
from sentence_transformers import SentenceTransformer

# Download the model from the Hugging Face Hub
model = SentenceTransformer("redis/langcache-embed-v2")

# Run inference
sentences = [
    'What are some examples of crimes understood as a moral turpitude?',
    'What are some examples of crimes of moral turpitude?',
    'What are some examples of crimes understood as a legal aptitude?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # 3 sentences x 768 dimensions

# Pairwise similarity scores between the three sentences
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # 3 x 3 similarity matrix
```
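The scores returned by `model.similarity` are, per this model's configuration, cosine similarities. As a minimal illustration of what that computation does (using random vectors in place of real model output, so the numbers themselves are meaningless):

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of an embedding matrix."""
    # L2-normalize each row; the dot product of unit vectors is the cosine.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms
    return normalized @ normalized.T

# Toy 768-dimensional "embeddings" standing in for model.encode(...) output.
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 768))

sims = cosine_similarity_matrix(emb)
print(sims.shape)                        # (3, 3)
print(np.allclose(np.diag(sims), 1.0))  # True: every vector matches itself
```

Each entry lies in [-1, 1], with the diagonal equal to 1 because a vector is maximally similar to itself.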
📚 Documentation
Model Details
Model Description
| Attribute | Details |
| ---- | ---- |
| Model Type | Sentence Transformer |
| Base Model | redis/langcache-embed-v1 |
| Maximum Sequence Length | 8192 tokens |
| Output Dimensions | 768 dimensions |
| Similarity Function | Cosine Similarity |
| Training Dataset | triplet |
Model Sources
Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
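The Pooling module above has `pooling_mode_cls_token: True`, meaning the sentence embedding is the hidden state of the first ([CLS]) token rather than a mean over all tokens. A minimal sketch of that operation on a dummy tensor (shapes chosen only for illustration):

```python
import numpy as np

# Dummy transformer output: (batch, seq_len, hidden) = (2, 8, 768)
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 8, 768))

# CLS pooling: keep only the first token's hidden state per sequence.
sentence_embeddings = token_embeddings[:, 0, :]
print(sentence_embeddings.shape)  # (2, 768)
```

In practice the Sentence Transformers library performs this pooling internally; `model.encode` already returns the pooled 768-dimensional vectors.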
Training Details
- Dataset: triplet
- Size: 36,864 training samples
- Columns: anchor, positive, negative_1, negative_2, negative_3
- Samples:
| anchor | positive | negative_1 | negative_2 | negative_3 |
| ---- | ---- | ---- | ---- | ---- |
| Is life really what I make of it? | Life is what you make it? | Is life hardly what I take of it? | Life is not entirely what I make of it. | Is life not what I make of it? |
| When you visit a website, can a person running the website see your IP address? | Does every website I visit knows my public ip address? | When you avoid a website, can a person hiding the website see your MAC address? | When you send an email, can the recipient see your physical location? | When you visit a website, a person running the website cannot see your IP address. |
| What are some cool features about iOS 10? | What are the best new features of iOS 10? | iOS 10 received criticism for its initial bugs and performance issues, and some users found the redesigned apps less intuitive compared to previous versions. | What are the drawbacks of using Android 14? | iOS 10 was widely criticized for its bugs, removal of beloved features, and generally being a downgrade from previous versions. |
- Loss function: MatryoshkaLoss with these parameters:

```json
{
    "loss": "CachedMultipleNegativesRankingLoss",
    "matryoshka_dims": [768, 512, 256, 128, 64],
    "matryoshka_weights": [1, 1, 1, 1, 1],
    "n_dims_per_step": -1
}
```
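Because the model was trained with MatryoshkaLoss over the nested dimensions [768, 512, 256, 128, 64], an embedding can be truncated to any of those prefix lengths and re-normalized, trading some accuracy for smaller storage and faster search. A sketch of that truncation with NumPy (random vectors stand in for real embeddings):

```python
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row and L2-normalize the result."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / norms

rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768))           # stand-in for full 768-dim embeddings
small = truncate_and_normalize(full, 256)  # one of the trained Matryoshka dims

print(small.shape)                                       # (3, 256)
print(np.allclose(np.linalg.norm(small, axis=1), 1.0))   # rows are unit length
```

Recent versions of Sentence Transformers can also do this for you via a `truncate_dim` argument when loading the model, though the manual approach above makes the mechanics explicit.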
Evaluation

Citation
Redis Langcache-embed Models
If you use our models or build on our research, we encourage you to cite them as follows:
```bibtex
@inproceedings{langcache-embed-v1,
    title = "Advancing Semantic Caching for LLMs with Domain-Specific Embeddings and Synthetic Data",
    author = "Gill, Cechmanek, Hutcherson, Rajamohan, Agarwal, Gulzar, Singh, Dion",
    month = "04",
    year = "2025",
    url = "https://arxiv.org/abs/2504.02268",
}
```
Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
```bibtex
@misc{kusupati2024matryoshka,
    title = {Matryoshka Representation Learning},
    author = {Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year = {2024},
    eprint = {2205.13147},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}
```
```bibtex
@misc{gao2021scaling,
    title = {Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author = {Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year = {2021},
    eprint = {2101.06983},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}
```