🚀 MFAQ
We present a multilingual FAQ retrieval model trained on the MFAQ dataset. Given a question, it ranks candidate answers.
🚀 Quick Start
MFAQ is a multilingual FAQ retrieval model: given a question, it ranks candidate answers to find the best match.
✨ Key Features
- Multilingual: supports Czech (cs), Danish (da), German (de), English (en), and many other languages.
- Versatile: usable for sentence-similarity, feature-extraction, and related tasks.
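As the usage examples below show, the model expects questions prefixed with `<Q>` and answers prefixed with `<A>`. A tiny helper for applying these prefixes (the `tag_inputs` name is illustrative, not part of the library):

```python
def tag_inputs(question, answers):
    # MFAQ convention: <Q> marks the question, <A> marks each candidate answer
    return ["<Q>" + question] + ["<A>" + a for a in answers]

print(tag_inputs("How do I install it?", ["Run pip install."]))
# → ['<Q>How do I install it?', '<A>Run pip install.']
```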
📦 Installation
```bash
pip install sentence-transformers transformers
```
💻 Usage Examples
Basic Usage
```python
from sentence_transformers import SentenceTransformer

# Questions are prefixed with <Q>, answers with <A>
question = "<Q>How many models can I host on HuggingFace?"
answer_1 = "<A>All plans come with unlimited private models and datasets."
answer_2 = "<A>AutoNLP is an automatic way to train and deploy state-of-the-art NLP models, seamlessly integrated with the Hugging Face ecosystem."
answer_3 = "<A>Based on how much training data and model variants are created, we send you a compute cost and payment link - as low as $10 per job."

model = SentenceTransformer('clips/mfaq')
embeddings = model.encode([question, answer_1, answer_2, answer_3])
print(embeddings)
```
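The embeddings themselves are not a ranking; to rank answers, compare each answer embedding to the question embedding, e.g. with cosine similarity. A minimal sketch (the `rank_answers` helper and the dummy 4-dimensional vectors are illustrative; real embeddings come from `model.encode`):

```python
import numpy as np

def rank_answers(question_emb, answer_embs):
    # Cosine similarity between the question and each answer embedding
    q = question_emb / np.linalg.norm(question_emb)
    a = answer_embs / np.linalg.norm(answer_embs, axis=1, keepdims=True)
    scores = a @ q
    # Indices of answers sorted from most to least similar
    return np.argsort(-scores)

# Dummy embeddings standing in for model.encode output
q = np.array([1.0, 0.0, 0.0, 0.0])
answers = np.array([
    [0.9, 0.1, 0.0, 0.0],  # most similar to the question
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],  # least similar
])
print(rank_answers(q, answers))  # → [0 1 2]
```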
Advanced Usage
```python
from transformers import AutoTokenizer, AutoModel
import torch

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, ignoring padding positions
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

question = "<Q>How many models can I host on HuggingFace?"
answer_1 = "<A>All plans come with unlimited private models and datasets."
answer_2 = "<A>AutoNLP is an automatic way to train and deploy state-of-the-art NLP models, seamlessly integrated with the Hugging Face ecosystem."
answer_3 = "<A>Based on how much training data and model variants are created, we send you a compute cost and payment link - as low as $10 per job."

tokenizer = AutoTokenizer.from_pretrained('clips/mfaq')
model = AutoModel.from_pretrained('clips/mfaq')
encoded_input = tokenizer([question, answer_1, answer_2, answer_3], padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    model_output = model(**encoded_input)
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
```
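With the pooled embeddings in hand, the best answer can be picked by normalizing and taking the highest dot product with the question embedding. A sketch (the `best_answer_index` helper and the dummy 2-dimensional tensor are illustrative; in practice pass the `sentence_embeddings` computed above):

```python
import torch
import torch.nn.functional as F

def best_answer_index(sentence_embeddings):
    # First row is the question, remaining rows are answers
    emb = F.normalize(sentence_embeddings, p=2, dim=1)
    scores = emb[1:] @ emb[0]  # cosine similarity after normalization
    return int(torch.argmax(scores))

# Dummy embeddings standing in for the mean-pooled model output
dummy = torch.tensor([
    [1.0, 0.0],  # question
    [0.8, 0.6],  # answer 1 (closest)
    [0.0, 1.0],  # answer 2
])
print(best_answer_index(dummy))  # → 0
```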
📚 Documentation
- Training: the training script for this model is available here.
- Developers: this model was developed by Maxime De Bruyn, Ehsan Lotfi, Jeska Buhmann, and Walter Daelemans.
📄 License
This model is released under the Apache-2.0 license.
📚 Citation
```bibtex
@misc{debruyn2021mfaq,
  title={MFAQ: a Multilingual FAQ Dataset},
  author={Maxime De Bruyn and Ehsan Lotfi and Jeska Buhmann and Walter Daelemans},
  year={2021},
  eprint={2109.12870},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```