🚀 MFAQ
We release a multilingual FAQ retrieval model trained on the MFAQ dataset. Given a question, it ranks candidate answers by relevance.
🚀 Quick Start
MFAQ is a multilingual FAQ retrieval model that ranks candidate answers for a given question, providing accurate question-answer matching.
✨ Key Features
- Multilingual: supports Czech (cs), Danish (da), German (de), English (en), and many other languages.
- Versatile: usable for sentence-similarity and feature-extraction tasks, among others.
📦 Installation
```bash
pip install sentence-transformers transformers
```
💻 Usage Examples
Basic Usage
```python
from sentence_transformers import SentenceTransformer

# Questions are prefixed with <Q>, candidate answers with <A>.
question = "<Q>How many models can I host on HuggingFace?"
answer_1 = "<A>All plans come with unlimited private models and datasets."
answer_2 = "<A>AutoNLP is an automatic way to train and deploy state-of-the-art NLP models, seamlessly integrated with the Hugging Face ecosystem."
answer_3 = "<A>Based on how much training data and model variants are created, we send you a compute cost and payment link - as low as $10 per job."

model = SentenceTransformer('clips/mfaq')
embeddings = model.encode([question, answer_1, answer_2, answer_3])
print(embeddings)
```
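The model is meant for ranking, so in practice you compare the question embedding against each answer embedding, e.g. with cosine similarity, and pick the highest-scoring answer. A minimal sketch on toy vectors (stand-ins for the array that `model.encode` returns; the values here are made up for illustration):

```python
import numpy as np

# Toy stand-ins for model.encode([question, answer_1, answer_2, answer_3]).
embeddings = np.array([
    [1.0, 0.0, 0.0],   # question
    [0.9, 0.1, 0.0],   # answer 1
    [0.0, 1.0, 0.0],   # answer 2
    [0.0, 0.0, 1.0],   # answer 3
])

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

q, answers = embeddings[0], embeddings[1:]
scores = [cosine(q, a) for a in answers]
best = int(np.argmax(scores))  # index of the top-ranked answer
print(best)  # → 0
```

With real MFAQ embeddings the same ranking step applies, just with the vectors produced by the model.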
Advanced Usage
```python
from transformers import AutoTokenizer, AutoModel
import torch

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, excluding padding positions.
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

question = "<Q>How many models can I host on HuggingFace?"
answer_1 = "<A>All plans come with unlimited private models and datasets."
answer_2 = "<A>AutoNLP is an automatic way to train and deploy state-of-the-art NLP models, seamlessly integrated with the Hugging Face ecosystem."
answer_3 = "<A>Based on how much training data and model variants are created, we send you a compute cost and payment link - as low as $10 per job."

tokenizer = AutoTokenizer.from_pretrained('clips/mfaq')
model = AutoModel.from_pretrained('clips/mfaq')

encoded_input = tokenizer([question, answer_1, answer_2, answer_3], padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    model_output = model(**encoded_input)
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
```
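To see what `mean_pooling` does, here is a toy run on hand-built tensors (the shapes mimic a real `model_output`, but the values are invented for illustration): positions masked out by the attention mask contribute nothing to the sentence embedding.

```python
import torch

def mean_pooling(model_output, attention_mask):
    # Same formula as above: masked positions are excluded from the mean.
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

# Toy "model output": 2 sequences, 3 tokens each, hidden size 4.
emb = torch.ones(2, 3, 4)
emb[1, 2] = 99.0                      # value sitting at a padded position
mask = torch.tensor([[1, 1, 1],       # no padding
                     [1, 1, 0]])      # last token is padding

pooled = mean_pooling((emb,), mask)
print(pooled)  # both rows are all ones: the 99.0 at the padded slot is ignored
```

The `torch.clamp(..., min=1e-9)` guards against division by zero for an all-padding row.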
📚 Documentation
- Training: the training script for this model can be found here.
- Developers: this model was developed by Maxime De Bruyn, Ehsan Lotfi, Jeska Buhmann, and Walter Daelemans.
📄 License
This model is released under the Apache-2.0 license.
📚 Citation
```bibtex
@misc{debruyn2021mfaq,
  title={MFAQ: a Multilingual FAQ Dataset},
  author={Maxime De Bruyn and Ehsan Lotfi and Jeska Buhmann and Walter Daelemans},
  year={2021},
  eprint={2109.12870},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```