sentence_similarity_semantic_search open-source model - for news semantic search and sentence similarity computation


Sentence Similarity Semantic Search

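Developed by Sakil

This model is a sentence transformer fine-tuned on a news dataset, built specifically for semantic search and sentence similarity computation.

Text Embedding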

PyTorch

Language: English

License: Apache-2.0

Tags: #news-semantic-matching #cosine-similarity #title-content-alignment

Downloads: 801

Released: 2/22/2023

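Model Overview

The model is suited to semantic search, sentence similarity computation, recommendation systems, and similar scenarios. It can be used directly for inference or fine-tuned further.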

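Model Features

Fine-tuned on a news dataset

The model was fine-tuned on a news dataset from Kaggle to improve its semantic search and sentence similarity performance.

Versatile

Applicable to semantic search, sentence similarity computation, recommendation systems, and other scenarios.

Easy to use

Exposes a simple API and can be used directly for inference or fine-tuned further.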
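
As a rough illustration of what further fine-tuning could look like, the sketch below builds labeled title/body pairs and passes them to the sentence-transformers `fit` API. The helper names `build_pairs` and `fine_tune`, the negative-sampling scheme, and the choice of `CosineSimilarityLoss` are all assumptions made for illustration; this page does not document how the author actually trained the model.

```python
import random


def build_pairs(articles, seed=0):
    """Turn (title, body) rows into labeled pairs for similarity training.

    Each matching title/body pair gets label 1.0. Each title is also paired
    with the body of a different article as a negative with label 0.0.
    """
    rng = random.Random(seed)
    pairs = []
    for i, (title, body) in enumerate(articles):
        pairs.append((title, body, 1.0))
        if len(articles) > 1:
            # Pick any other article's body as a mismatched example.
            j = rng.choice([k for k in range(len(articles)) if k != i])
            pairs.append((title, articles[j][1], 0.0))
    return pairs


def fine_tune(articles, model_name="Sakil/sentence_similarity_semantic_search",
              epochs=1, batch_size=16):
    """Hypothetical fine-tuning loop; needs sentence-transformers and torch."""
    # Imported lazily so build_pairs works without the heavy dependencies.
    from sentence_transformers import SentenceTransformer, InputExample, losses
    from torch.utils.data import DataLoader

    model = SentenceTransformer(model_name)
    examples = [InputExample(texts=[a, b], label=y)
                for a, b, y in build_pairs(articles)]
    loader = DataLoader(examples, shuffle=True, batch_size=batch_size)
    # Assumption: CosineSimilarityLoss; the author's actual loss is not stated.
    loss = losses.CosineSimilarityLoss(model)
    model.fit(train_objectives=[(loader, loss)], epochs=epochs)
    return model
```

Calling `fine_tune(rows)` with a list of `(title, body)` tuples would download the model and train for one epoch. Random negatives are the simplest option, and a real setup would likely need a held-out evaluation set as well.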

Model Capabilities

Semantic search

Sentence similarity computation

Recommendation systems

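Use Cases

Information retrieval

Matching news titles to content

Compute the similarity between a news title and its body text for content matching and recommendation.

A high similarity score indicates that the title and the content are closely related.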
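
The title-to-content check described above can be sketched as a per-article cosine similarity between the title embedding and the body embedding. The function names and the 0.5 threshold below are hypothetical, and the vectors are assumed to come from the model's `encode` method.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def alignment_scores(title_vecs, body_vecs):
    """Score each title against the body at the same index."""
    return [cosine(t, b) for t, b in zip(title_vecs, body_vecs)]


def flag_mismatches(title_vecs, body_vecs, threshold=0.5):
    """Indices whose title/body similarity falls below the (assumed) threshold."""
    return [i for i, s in enumerate(alignment_scores(title_vecs, body_vecs))
            if s < threshold]
```

With a loaded model, `model.encode(titles)` and `model.encode(bodies)` would supply the two vector lists. The cutoff is an arbitrary placeholder and should be tuned on your own data.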

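Recommendation systems

Related content recommendation

Recommend related content based on sentence similarity to improve the user experience.

Aims to increase user click-through rate and dwell time.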
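
One way to turn sentence similarity into related-content recommendations is to rank every other item by its cosine similarity to the item a user is currently reading. This is a minimal sketch under that assumption; `recommend`, `k`, and `min_score` are illustrative names, not part of the model's API.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(seed_index, item_vecs, k=3, min_score=0.0):
    """Return up to k (index, score) pairs most similar to the seed item."""
    seed = item_vecs[seed_index]
    # Score every item except the seed itself.
    scored = [(i, cosine(seed, vec))
              for i, vec in enumerate(item_vecs) if i != seed_index]
    # Drop weak matches so unrelated items are never recommended.
    scored = [(i, s) for i, s in scored if s >= min_score]
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

Here `item_vecs` would be the embeddings of all candidate articles, computed once and cached. For large catalogs an approximate nearest-neighbor index would probably replace this linear scan.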

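🚀 Sentence Similarity Semantic Search Model

This project uses the sentence-transformers library to fine-tune a model for semantic search and sentence similarity tasks. Trained on a collected news dataset, the model can be applied to semantic search, sentence similarity computation, and recommendation systems.

🚀 Quick Start

Install the required library with the following command: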

pip install -U sentence-transformers

The following example uses the model to compute sentence similarity:

# SentenceTransformer loads the model; util provides cos_sim
from sentence_transformers import SentenceTransformer, util

model_name="Sakil/sentence_similarity_semantic_search"
model = SentenceTransformer(model_name)
sentences = ['A man is eating food.',
          'A man is eating a piece of bread.',
          'The girl is carrying a baby.',
          'A man is riding a horse.',
          'A woman is playing violin.',
          'Two men pushed carts through the woods.',
          'A man is riding a white horse on an enclosed ground.',
          'A monkey is playing drums.',
          'Someone in a gorilla costume is playing a set of drums.'
          ]

# Encode all sentences
embeddings = model.encode(sentences)

# Compute cosine similarity between all pairs
cos_sim = util.cos_sim(embeddings, embeddings)

# Add all pairs to a list with their cosine similarity score
all_sentence_combinations = []
for i in range(len(cos_sim) - 1):
    for j in range(i + 1, len(cos_sim)):
        all_sentence_combinations.append([cos_sim[i][j], i, j])

# Sort list by the highest cosine similarity score
all_sentence_combinations = sorted(all_sentence_combinations, key=lambda x: x[0], reverse=True)

print("Top-5 most similar pairs:")
for score, i, j in all_sentence_combinations[0:5]:
    print("{} \t {} \t {:.4f}".format(sentences[i], sentences[j], cos_sim[i][j]))
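
The example above compares sentences pairwise. Semantic search, the other advertised use, can be sketched as ranking a corpus against a single free-text query. In the sketch below `search` and `toy_encode` are hypothetical helpers: `encode` can be any callable that maps a list of strings to a list of vectors, and the bag-of-words stand-in exists only so the code runs without downloading the model.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query, corpus, encode, top_k=3):
    """Rank corpus texts by cosine similarity to the query."""
    corpus_vecs = encode(corpus)
    query_vec = encode([query])[0]
    ranked = sorted(((cosine(query_vec, vec), text)
                     for vec, text in zip(corpus_vecs, corpus)),
                    key=lambda pair: pair[0], reverse=True)
    return ranked[:top_k]


def toy_encode(texts, vocab=("man", "horse", "drums", "violin")):
    """Stand-in bag-of-words encoder; replace with a real model's encode."""
    rows = []
    for text in texts:
        counts = Counter(text.lower().replace(".", "").split())
        rows.append([float(counts[w]) for w in vocab])
    return rows


corpus = ["A man is riding a horse.",
          "A woman is playing violin.",
          "A monkey is playing drums."]
for score, text in search("man on a horse", corpus, toy_encode, top_k=2):
    print(f"{score:.4f}\t{text}")
```

To use the real model, pass `model.encode` in place of `toy_encode`. The sentence-transformers library also provides `util.semantic_search`, which may be the more convenient choice for larger corpora.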