🚀 MFAQ
We present a multilingual FAQ retrieval model trained on the MFAQ dataset. Given a question, it ranks candidate answers.
🚀 Quick Start
MFAQ is a multilingual FAQ retrieval model: given a question, it ranks candidate answers to find the best match.
✨ Key Features
- Multilingual: supports Czech (cs), Danish (da), German (de), English (en), and many other languages.
- Versatile: usable for sentence-similarity, feature-extraction, and related tasks.
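As the usage examples below show, the model expects questions prefixed with `<Q>` and answers prefixed with `<A>`. A tiny helper for applying these prefixes (the `tag_inputs` name is illustrative, not part of the library):

```python
def tag_inputs(question, answers):
    # MFAQ convention: <Q> marks the question, <A> marks each candidate answer
    return ["<Q>" + question] + ["<A>" + a for a in answers]

print(tag_inputs("How do I install it?", ["Run pip install."]))
# → ['<Q>How do I install it?', '<A>Run pip install.']
```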
📦 Installation
```bash
pip install sentence-transformers transformers
```
💻 Usage Examples
Basic Usage
```python
from sentence_transformers import SentenceTransformer

# Questions are prefixed with <Q>, answers with <A>
question = "<Q>How many models can I host on HuggingFace?"
answer_1 = "<A>All plans come with unlimited private models and datasets."
answer_2 = "<A>AutoNLP is an automatic way to train and deploy state-of-the-art NLP models, seamlessly integrated with the Hugging Face ecosystem."
answer_3 = "<A>Based on how much training data and model variants are created, we send you a compute cost and payment link - as low as $10 per job."

model = SentenceTransformer('clips/mfaq')
embeddings = model.encode([question, answer_1, answer_2, answer_3])
print(embeddings)
```
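The embeddings themselves are not a ranking; to rank answers, compare each answer embedding to the question embedding, e.g. with cosine similarity. A minimal sketch (the `rank_answers` helper and the dummy 4-dimensional vectors are illustrative; real embeddings come from `model.encode`):

```python
import numpy as np

def rank_answers(question_emb, answer_embs):
    # Cosine similarity between the question and each answer embedding
    q = question_emb / np.linalg.norm(question_emb)
    a = answer_embs / np.linalg.norm(answer_embs, axis=1, keepdims=True)
    scores = a @ q
    # Indices of answers sorted from most to least similar
    return np.argsort(-scores)

# Dummy embeddings standing in for model.encode output
q = np.array([1.0, 0.0, 0.0, 0.0])
answers = np.array([
    [0.9, 0.1, 0.0, 0.0],  # most similar to the question
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],  # least similar
])
print(rank_answers(q, answers))  # → [0 1 2]
```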
Advanced Usage
```python
from transformers import AutoTokenizer, AutoModel
import torch

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, ignoring padding positions
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

question = "<Q>How many models can I host on HuggingFace?"
answer_1 = "<A>All plans come with unlimited private models and datasets."
answer_2 = "<A>AutoNLP is an automatic way to train and deploy state-of-the-art NLP models, seamlessly integrated with the Hugging Face ecosystem."
answer_3 = "<A>Based on how much training data and model variants are created, we send you a compute cost and payment link - as low as $10 per job."

tokenizer = AutoTokenizer.from_pretrained('clips/mfaq')
model = AutoModel.from_pretrained('clips/mfaq')
encoded_input = tokenizer([question, answer_1, answer_2, answer_3], padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    model_output = model(**encoded_input)
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
```
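With the pooled embeddings in hand, the best answer can be picked by normalizing and taking the highest dot product with the question embedding. A sketch (the `best_answer_index` helper and the dummy 2-dimensional tensor are illustrative; in practice pass the `sentence_embeddings` computed above):

```python
import torch
import torch.nn.functional as F

def best_answer_index(sentence_embeddings):
    # First row is the question, remaining rows are answers
    emb = F.normalize(sentence_embeddings, p=2, dim=1)
    scores = emb[1:] @ emb[0]  # cosine similarity after normalization
    return int(torch.argmax(scores))

# Dummy embeddings standing in for the mean-pooled model output
dummy = torch.tensor([
    [1.0, 0.0],  # question
    [0.8, 0.6],  # answer 1 (closest)
    [0.0, 1.0],  # answer 2
])
print(best_answer_index(dummy))  # → 0
```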
📚 Documentation
- Training: the training script for this model is available here.
- Developers: this model was developed by Maxime De Bruyn, Ehsan Lotfi, Jeska Buhmann, and Walter Daelemans.
📄 License
This model is released under the Apache-2.0 license.
📚 Citation
```bibtex
@misc{debruyn2021mfaq,
  title={MFAQ: a Multilingual FAQ Dataset},
  author={Maxime De Bruyn and Ehsan Lotfi and Jeska Buhmann and Walter Daelemans},
  year={2021},
  eprint={2109.12870},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```