MFAQ: Open-Source Multilingual FAQ Retrieval Model That Ranks Candidate Answers for a Given Question

Mfaq

Developed by clips

A multilingual FAQ retrieval model trained on the MFAQ dataset, capable of ranking candidate answers based on given questions.

Text Embedding
Supports Multiple Languages
Open Source License: Apache-2.0
#Multilingual FAQ Retrieval #Question Answering Ranking #Sentence Similarity

Downloads 208

Release Time: 3/2/2022

Model Overview

This model is a multilingual sentence transformer specifically designed for FAQ retrieval tasks. It can calculate the similarity between questions and answers to rank candidate answers.

Model Features

Multilingual Support

Supports FAQ retrieval tasks in 21 languages.

Question-Answer Tagging

Uses <Q> and <A> tags to distinguish questions and answers, improving retrieval accuracy.

Efficient Retrieval

Capable of quickly calculating the similarity between questions and candidate answers.

Model Capabilities

Sentence Similarity Calculation

FAQ Retrieval

Multilingual Text Processing

Feature Extraction

Use Cases

Customer Service

Automatic FAQ Response System

Used to build systems that automatically answer common customer questions.

Improves customer service efficiency and reduces manual support workload.

Knowledge Management

Internal Corporate Knowledge Base Retrieval

Helps employees quickly find relevant information in the company's internal knowledge base.

Enhances information retrieval efficiency and promotes knowledge sharing.

🚀 MFAQ

MFAQ is a multilingual FAQ retrieval model. Trained on the MFAQ dataset, it ranks candidate answers according to a given question, offering a practical solution for multilingual FAQ retrieval scenarios.

🚀 Quick Start

Install the dependencies listed under Installation, then encode a <Q>-tagged question together with <A>-tagged candidate answers as shown under Usage Examples.

✨ Features

Multilingual Support: Supports multiple languages including Czech, Danish, German, English, etc.
Based on Popular Frameworks: Can be used with sentence-transformers and HuggingFace Transformers.

📦 Installation

pip install sentence-transformers transformers

💻 Usage Examples

Basic Usage

You can use MFAQ with sentence-transformers or directly with a HuggingFace model. In both cases, prefix each question with <Q> and each answer with <A>.
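The tagging convention is easy to get wrong when questions and answers come from user input or a database. A minimal sketch of helpers that add the tags only when they are missing; the names `tag_question`, `tag_answer`, and `build_inputs` are hypothetical and not part of the MFAQ release:

```python
# Hypothetical helpers (not shipped with MFAQ): add the <Q> / <A>
# prefixes described above, without doubling a tag that is already present.
Q_TAG = "<Q>"
A_TAG = "<A>"


def tag_question(text: str) -> str:
    """Return the question with a leading <Q> tag."""
    text = text.strip()
    return text if text.startswith(Q_TAG) else Q_TAG + text


def tag_answer(text: str) -> str:
    """Return the answer with a leading <A> tag."""
    text = text.strip()
    return text if text.startswith(A_TAG) else A_TAG + text


def build_inputs(question: str, answers: list[str]) -> list[str]:
    """Return [tagged question, tagged answer 1, ...] ready for encoding."""
    return [tag_question(question)] + [tag_answer(a) for a in answers]


print(build_inputs("How many models can I host?", ["Unlimited.", "<A>Ten."]))
```

The resulting list has the same shape as the inputs passed to the encoders in the examples that follow: the question first, then the candidate answers.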

Sentence Transformers

from sentence_transformers import SentenceTransformer

question = "<Q>How many models can I host on HuggingFace?"
answer_1 = "<A>All plans come with unlimited private models and datasets."
answer_2 = "<A>AutoNLP is an automatic way to train and deploy state-of-the-art NLP models, seamlessly integrated with the Hugging Face ecosystem."
answer_3 = "<A>Based on how much training data and model variants are created, we send you a compute cost and payment link - as low as $10 per job."

model = SentenceTransformer('clips/mfaq')
embeddings = model.encode([question, answer_1, answer_2, answer_3])
print(embeddings)
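Printing the embeddings does not yet rank anything. One way to turn them into a ranking is to score each answer against the question and sort. The sketch below assumes cosine similarity as the scoring function (the source does not state which measure to use) and a hypothetical helper name, `rank_answers`. It runs on toy vectors so that it does not need to download the model:

```python
import numpy as np


def rank_answers(question_emb, answer_embs):
    """Return (order, scores): answer indices sorted best-first by
    cosine similarity to the question, plus the raw scores."""
    q = np.asarray(question_emb, dtype=float)
    a = np.asarray(answer_embs, dtype=float)
    # Normalize to unit length so the dot product equals cosine similarity.
    q = q / max(np.linalg.norm(q), 1e-12)
    a = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    scores = a @ q
    order = np.argsort(-scores)  # descending similarity
    return order.tolist(), scores


# Toy 3-dimensional vectors standing in for real embeddings.
toy_question = [1.0, 0.0, 0.0]
toy_answers = [[0.0, 1.0, 0.0], [2.0, 0.1, 0.0], [1.0, 1.0, 0.0]]
order, scores = rank_answers(toy_question, toy_answers)
print(order)  # [1, 2, 0]
```

With the real model, the same function could be applied as `rank_answers(embeddings[0], embeddings[1:])`, where `embeddings` is the array returned by `model.encode` with the question in the first position.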

HuggingFace Transformers

from transformers import AutoTokenizer, AutoModel
import torch

def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0] #First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

question = "<Q>How many models can I host on HuggingFace?"
answer_1 = "<A>All plans come with unlimited private models and datasets."
answer_2 = "<A>AutoNLP is an automatic way to train and deploy state-of-the-art NLP models, seamlessly integrated with the Hugging Face ecosystem."
answer_3 = "<A>Based on how much training data and model variants are created, we send you a compute cost and payment link - as low as $10 per job."

tokenizer = AutoTokenizer.from_pretrained('clips/mfaq')
model = AutoModel.from_pretrained('clips/mfaq')

# Tokenize sentences
encoded_input = tokenizer([question, answer_1, answer_2, answer_3], padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
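The pooling step averages token vectors per sentence while skipping padded positions. A toy NumPy restatement of that masked average may make the effect easier to see. It is an illustration only, not the model's code, and `masked_mean` is a hypothetical name:

```python
import numpy as np


def masked_mean(token_embeddings, attention_mask):
    """Average token vectors per sentence, ignoring positions where mask == 0."""
    tokens = np.asarray(token_embeddings, dtype=float)          # (batch, seq, dim)
    mask = np.asarray(attention_mask, dtype=float)[..., None]   # (batch, seq, 1)
    summed = (tokens * mask).sum(axis=1)                        # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)              # avoid division by zero
    return summed / counts


# One sentence with two real tokens and one padding token.
# The padding vector is deliberately large to show it is ignored.
tokens = [[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]]
mask = [[1, 1, 0]]
print(masked_mean(tokens, mask))  # [[2. 3.]]
```

Because the mask zeroes out the third position, the result is the average of the first two token vectors only, which is why `padding=True` in the tokenizer call does not distort the sentence embeddings.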

📚 Documentation

Training

You can find the training script for the model here.

People

This model was developed by Maxime De Bruyn, Ehsan Lotfi, Jeska Buhmann and Walter Daelemans.

Citation information

@misc{debruyn2021mfaq,
      title={MFAQ: a Multilingual FAQ Dataset}, 
      author={Maxime De Bruyn and Ehsan Lotfi and Jeska Buhmann and Walter Daelemans},
      year={2021},
      eprint={2109.12870},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

📄 License

This project is licensed under the Apache 2.0 license.
