mfaq: an open-source multilingual FAQ retrieval model - free to deploy, ranks candidate answers based on the question


Mfaq

Developed by clips

A multilingual FAQ retrieval model trained on the MFAQ dataset that ranks candidate answers for a given question.

Text embedding · Multilingual · Open-source license: Apache-2.0 · #multilingual-FAQ-retrieval #question-answer-ranking #sentence-similarity

Downloads: 208

Released: 3/2/2022

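Model Overview

This model is a multilingual sentence transformer specialized for FAQ retrieval. It computes the similarity between a question and each candidate answer and ranks the candidates accordingly.

Model Features

Multilingual support

Supports FAQ retrieval in 21 languages.

Question and answer markers

The <Q> and <A> markers distinguish questions from answers, which improves retrieval accuracy.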
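Because every input must carry its marker, a tiny helper can prepend them consistently. This is a minimal sketch; `add_markers` is a hypothetical name and is not part of sentence-transformers or the MFAQ release. It only reproduces the prefix format shown in the usage examples (marker directly followed by the text, no space).

```python
# Hypothetical helper (not part of sentence-transformers or the MFAQ release):
# prepends the <Q>/<A> markers that the model card requires.
def add_markers(question: str, answers: list[str]) -> list[str]:
    """Return [<Q>question, <A>answer_1, ...] ready to pass to the encoder."""
    # Strip surrounding whitespace so the marker sits directly before the text,
    # matching the card's examples such as "<Q>How many models ...".
    return ["<Q>" + question.strip()] + ["<A>" + a.strip() for a in answers]


texts = add_markers("How many models can I host on HuggingFace?",
                    ["All plans come with unlimited private models and datasets."])
print(texts)
```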

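Efficient retrieval

Questions and answers are encoded independently, so candidate answer embeddings can be computed once in advance and a new question can be scored against all of them quickly.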
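The card does not name the scoring function. Assuming cosine similarity, a common choice for sentence-transformers models, the score of candidate answer $a_i$ for question $q$ (with embeddings $\mathbf{e}_q$ and $\mathbf{e}_{a_i}$) and the top-ranked answer would be:

```latex
s_i = \cos(\mathbf{e}_q, \mathbf{e}_{a_i})
    = \frac{\mathbf{e}_q \cdot \mathbf{e}_{a_i}}{\lVert \mathbf{e}_q \rVert \, \lVert \mathbf{e}_{a_i} \rVert},
\qquad
\hat{a} = \arg\max_i \, s_i
```

Ranking all candidates simply means sorting them by $s_i$ in descending order.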

Model Capabilities

Sentence similarity computation

FAQ retrieval

Multilingual text processing

Feature extraction

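Use Cases

Customer service

Automated FAQ answering system

Used to build systems that automatically answer customers' frequently asked questions.

Improves customer-service efficiency and reduces the workload of human agents.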
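One way such a system might decide between answering automatically and escalating is a score threshold. This is a minimal sketch under stated assumptions: `auto_answer` and the 0.5 threshold are hypothetical, and the scores are assumed to be similarities between the customer's question and each FAQ answer computed with the MFAQ encoder.

```python
# Sketch of an FAQ auto-responder with a fallback to a human agent.
# ASSUMPTIONS: scores[i] is the similarity (e.g. cosine) between the customer's
# question and answers[i] as computed with the MFAQ encoder; the 0.5 threshold is
# hypothetical and would need tuning on real data.
from typing import Optional

THRESHOLD = 0.5  # hypothetical cut-off


def auto_answer(answers: list[str], scores: list[float],
                threshold: float = THRESHOLD) -> Optional[str]:
    """Return the best-scoring FAQ answer, or None to escalate to a human agent."""
    if not answers:
        return None  # empty FAQ: always escalate
    best = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best] if scores[best] >= threshold else None


faq = ["All plans come with unlimited private models and datasets.",
       "AutoNLP is an automatic way to train and deploy NLP models."]
print(auto_answer(faq, [0.82, 0.31]))  # confident: returns the first answer
print(auto_answer(faq, [0.20, 0.10]))  # low scores: None, hand over to a human
```

Returning `None` rather than a low-confidence answer keeps the escalation decision explicit for the calling code.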

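Knowledge management

Internal corporate knowledge-base search

Helps employees quickly find relevant information in the company's internal knowledge base.

Makes information retrieval more efficient and encourages knowledge sharing.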
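In a large knowledge base only the best few entries are usually shown, so a partial selection can replace a full sort. This is a minimal sketch; `top_k_entries`, the document ids and the scores are hypothetical, with the scores assumed to come from the MFAQ encoder.

```python
# Sketch: keep only the k best knowledge-base entries for a query.
# ASSUMPTION: `scores` maps a hypothetical document id to the similarity score
# the MFAQ encoder gave that entry's answer for the current question.
import heapq


def top_k_entries(scores: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Return up to k (doc_id, score) pairs, highest score first."""
    # heapq.nlargest does a partial selection, O(n log k), instead of a full sort.
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


scores = {"vpn-setup": 0.71, "expense-policy": 0.18,
          "password-reset": 0.64, "holidays": 0.05}
print(top_k_entries(scores, k=2))
# [('vpn-setup', 0.71), ('password-reset', 0.64)]
```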

🚀 MFAQ

We provide a multilingual FAQ retrieval model trained on the MFAQ dataset. Given a question, the model ranks candidate answers.

🚀 Quick Start

Installation steps and usage examples are shown below.

✨ Key Features

Multilingual: supports cs, da, de, en, es, fi, fr, he, hr, hu, id, it, nl, no, pl, pt, ro, ru, sv, tr, and vi.
Answer ranking: ranks candidate answers for a given question.

📦 Installation

pip install sentence-transformers transformers

💻 Usage Examples

Basic usage

MFAQ can be used either through sentence-transformers or directly as a HuggingFace Transformers model. In both cases, prefix each question with <Q> and each answer with <A>.

Sentence Transformers

from sentence_transformers import SentenceTransformer

question = "<Q>How many models can I host on HuggingFace?"
answer_1 = "<A>All plans come with unlimited private models and datasets."
answer_2 = "<A>AutoNLP is an automatic way to train and deploy state-of-the-art NLP models, seamlessly integrated with the Hugging Face ecosystem."
answer_3 = "<A>Based on how much training data and model variants are created, we send you a compute cost and payment link - as low as $10 per job."

model = SentenceTransformer('clips/mfaq')
# Encode the question followed by all three candidate answers
embeddings = model.encode([question, answer_1, answer_2, answer_3])
print(embeddings)
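The example stops at printing the embeddings. To turn them into a ranking, one option is to score each answer row against the question row. This is a minimal sketch, assuming cosine similarity (the card does not state which similarity MFAQ was trained with) and assuming row 0 is the question and the remaining rows are the answers in order; `rank_answers` is a hypothetical name, and sentence-transformers' own `util.cos_sim` could replace the NumPy arithmetic.

```python
# Sketch: rank candidate answers from the output of model.encode().
# ASSUMPTIONS: row 0 of `embeddings` is the question and rows 1.. are the
# answers in the same order as `answers`; cosine similarity is used as the score.
import numpy as np


def rank_answers(embeddings: np.ndarray, answers: list[str]) -> list[tuple[str, float]]:
    """Return (answer, score) pairs sorted from most to least relevant."""
    # L2-normalise every row so a dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = normed[1:] @ normed[0]   # similarity of each answer to the question
    order = np.argsort(-scores)       # indices in descending score order
    return [(answers[i], float(scores[i])) for i in order]


# Toy 2-d embeddings standing in for real model output.
toy = np.array([[1.0, 0.0],   # question
                [0.9, 0.1],   # answer 1
                [0.0, 1.0],   # answer 2
                [0.5, 0.5]])  # answer 3
print(rank_answers(toy, ["answer_1", "answer_2", "answer_3"]))
```

With the real model, `rank_answers(embeddings, [answer_1, answer_2, answer_3])` works as long as the list passed to `model.encode()` was `[question, answer_1, answer_2, answer_3]`.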

HuggingFace Transformers

from transformers import AutoTokenizer, AutoModel
import torch

def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0] #First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

question = "<Q>How many models can I host on HuggingFace?"
answer_1 = "<A>All plans come with unlimited private models and datasets."
answer_2 = "<A>AutoNLP is an automatic way to train and deploy state-of-the-art NLP models, seamlessly integrated with the Hugging Face ecosystem."
answer_3 = "<A>Based on how much training data and model variants are created, we send you a compute cost and payment link - as low as $10 per job."

tokenizer = AutoTokenizer.from_pretrained('clips/mfaq')
model = AutoModel.from_pretrained('clips/mfaq')

# Tokenize sentences
encoded_input = tokenizer([question, answer_1, answer_2, answer_3], padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling over non-padding tokens.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
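The `mean_pooling` function above averages token embeddings while ignoring padding. The NumPy sketch below mirrors that arithmetic on toy numbers so the effect of the attention mask is visible; `mean_pool_np` is a hypothetical name used only for illustration.

```python
# NumPy illustration of what mean_pooling() computes, on toy numbers.
# Only real (mask = 1) tokens contribute; padding positions are zeroed out
# before summing, and the sum is divided by the number of real tokens.
import numpy as np


def mean_pool_np(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """token_embeddings: (batch, seq, dim); attention_mask: (batch, seq) of 0/1."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid divide-by-zero
    return summed / counts


tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])  # last token is padding
mask = np.array([[1, 1, 0]])
print(mean_pool_np(tokens, mask))  # [[2. 3.]]
```

The padding token's large values do not leak into the result, which is why the mask is applied before summing rather than after.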

📚 Documentation

Training

The training script for the model can be found here.

Developers

This model was developed by Maxime De Bruyn, Ehsan Lotfi, Jeska Buhmann, and Walter Daelemans.

Citation

@misc{debruyn2021mfaq,
      title={MFAQ: a Multilingual FAQ Dataset}, 
      author={Maxime De Bruyn and Ehsan Lotfi and Jeska Buhmann and Walter Daelemans},
      year={2021},
      eprint={2109.12870},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

📄 License

This project is released under the Apache-2.0 license.

Attribute	Details
Pipeline tag	sentence-similarity
Tags	sentence-transformers, feature-extraction, sentence-similarity, transformers
Training data	clips/mfaq
Languages	cs, da, de, en, es, fi, fr, he, hr, hu, id, it, nl, no, pl, pt, ro, ru, sv, tr, vi