🚀 MFAQ
We release a multilingual FAQ retrieval model trained on the MFAQ dataset. Given a question, it ranks candidate answers by relevance.
🚀 Quick Start
MFAQ is a multilingual FAQ retrieval model that ranks candidate answers for a given question, providing accurate question-answer matching.
✨ Key Features
- Multilingual: supports Czech (cs), Danish (da), German (de), English (en), and many other languages.
- Versatile: usable for sentence-similarity and feature-extraction tasks, among others.
📦 Installation
```bash
pip install sentence-transformers transformers
```
💻 Usage Examples
Basic Usage
```python
from sentence_transformers import SentenceTransformer

# Questions are prefixed with <Q>, candidate answers with <A>.
question = "<Q>How many models can I host on HuggingFace?"
answer_1 = "<A>All plans come with unlimited private models and datasets."
answer_2 = "<A>AutoNLP is an automatic way to train and deploy state-of-the-art NLP models, seamlessly integrated with the Hugging Face ecosystem."
answer_3 = "<A>Based on how much training data and model variants are created, we send you a compute cost and payment link - as low as $10 per job."

model = SentenceTransformer('clips/mfaq')
embeddings = model.encode([question, answer_1, answer_2, answer_3])
print(embeddings)
```
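The model is meant for ranking, so in practice you compare the question embedding against each answer embedding, e.g. with cosine similarity, and pick the highest-scoring answer. A minimal sketch on toy vectors (stand-ins for the array that `model.encode` returns; the values here are made up for illustration):

```python
import numpy as np

# Toy stand-ins for model.encode([question, answer_1, answer_2, answer_3]).
embeddings = np.array([
    [1.0, 0.0, 0.0],   # question
    [0.9, 0.1, 0.0],   # answer 1
    [0.0, 1.0, 0.0],   # answer 2
    [0.0, 0.0, 1.0],   # answer 3
])

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

q, answers = embeddings[0], embeddings[1:]
scores = [cosine(q, a) for a in answers]
best = int(np.argmax(scores))  # index of the top-ranked answer
print(best)  # → 0
```

With real MFAQ embeddings the same ranking step applies, just with the vectors produced by the model.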
Advanced Usage
```python
from transformers import AutoTokenizer, AutoModel
import torch

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, excluding padding positions.
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

question = "<Q>How many models can I host on HuggingFace?"
answer_1 = "<A>All plans come with unlimited private models and datasets."
answer_2 = "<A>AutoNLP is an automatic way to train and deploy state-of-the-art NLP models, seamlessly integrated with the Hugging Face ecosystem."
answer_3 = "<A>Based on how much training data and model variants are created, we send you a compute cost and payment link - as low as $10 per job."

tokenizer = AutoTokenizer.from_pretrained('clips/mfaq')
model = AutoModel.from_pretrained('clips/mfaq')

encoded_input = tokenizer([question, answer_1, answer_2, answer_3], padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    model_output = model(**encoded_input)
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
```
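To see what `mean_pooling` does, here is a toy run on hand-built tensors (the shapes mimic a real `model_output`, but the values are invented for illustration): positions masked out by the attention mask contribute nothing to the sentence embedding.

```python
import torch

def mean_pooling(model_output, attention_mask):
    # Same formula as above: masked positions are excluded from the mean.
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

# Toy "model output": 2 sequences, 3 tokens each, hidden size 4.
emb = torch.ones(2, 3, 4)
emb[1, 2] = 99.0                      # value sitting at a padded position
mask = torch.tensor([[1, 1, 1],       # no padding
                     [1, 1, 0]])      # last token is padding

pooled = mean_pooling((emb,), mask)
print(pooled)  # both rows are all ones: the 99.0 at the padded slot is ignored
```

The `torch.clamp(..., min=1e-9)` guards against division by zero for an all-padding row.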
📚 Documentation
- Training: the training script for this model can be found here.
- Developers: this model was developed by Maxime De Bruyn, Ehsan Lotfi, Jeska Buhmann, and Walter Daelemans.
📄 License
This model is released under the Apache-2.0 license.
📚 Citation
```bibtex
@misc{debruyn2021mfaq,
  title={MFAQ: a Multilingual FAQ Dataset},
  author={Maxime De Bruyn and Ehsan Lotfi and Jeska Buhmann and Walter Daelemans},
  year={2021},
  eprint={2109.12870},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```