🚀 DistilCamemBERT-QA
We present DistilCamemBERT-QA, a model fine-tuned from DistilCamemBERT for the French language question-answering task. This model is trained on two datasets, FQuAD v1.0 and Piaf, which consist of contexts and questions with answers within the contexts.
🚀 Quick Start
Prerequisites
- Python environment
- Required libraries: `transformers` (plus `optimum` if using ONNX)
Basic Usage
```python
from transformers import pipeline

qa_engine = pipeline(
    "question-answering",
    model="cmarkea/distilcamembert-base-qa",
    tokenizer="cmarkea/distilcamembert-base-qa"
)
result = qa_engine(
    context="David Fincher, né le 28 août 1962 à Denver (Colorado), "
            "est un réalisateur et producteur américain. Il est principalement "
            "connu pour avoir réalisé les films Seven, Fight Club, L'Étrange "
            "Histoire de Benjamin Button, The Social Network et Gone Girl qui "
            "lui ont valu diverses récompenses et nominations aux Oscars du "
            "cinéma ou aux Golden Globes. Réputé pour son perfectionnisme, il "
            "peut tourner un très grand nombre de prises de ses plans et "
            "séquences afin d'obtenir le rendu visuel qu'il désire. Il a "
            "également développé et produit les séries télévisées House of "
            "Cards (pour laquelle il remporte l'Emmy Award de la meilleure "
            "réalisation pour une série dramatique en 2013) et Mindhunter, "
            "diffusées sur Netflix.",
    question="Quel est le métier de David Fincher ?"
)
result
```

```python
{'score': 0.7981914281845093,
 'start': 61,
 'end': 98,
 'answer': ' réalisateur et producteur américain.'}
```
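The `start` and `end` fields are character offsets into the original context, so slicing the context string reproduces the answer span. A minimal standalone check (re-using the first sentence of the context and the output values above; note the raw span includes a leading space and trailing period):

```python
# Output fields from the QA pipeline, as shown above
result = {'score': 0.7981914281845093, 'start': 61, 'end': 98,
          'answer': ' réalisateur et producteur américain.'}

# First sentence of the context passed to the pipeline
context = (
    "David Fincher, né le 28 août 1962 à Denver (Colorado), "
    "est un réalisateur et producteur américain."
)

# 'start' and 'end' index directly into the context string
span = context[result['start']:result['end']]
assert span == result['answer']
print(repr(span))  # ' réalisateur et producteur américain.'
```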
Advanced Usage (Optimum + ONNX)
```python
from optimum.onnxruntime import ORTModelForQuestionAnswering
from transformers import AutoTokenizer, pipeline

HUB_MODEL = "cmarkea/distilcamembert-base-qa"

tokenizer = AutoTokenizer.from_pretrained(HUB_MODEL)
# Load the ONNX export of the model for faster CPU inference
model = ORTModelForQuestionAnswering.from_pretrained(HUB_MODEL)
onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)

# A quantized ONNX variant is also available on the Hub; it can be used
# in the same pipeline in place of the model above
quantized_model = ORTModelForQuestionAnswering.from_pretrained(
    HUB_MODEL, file_name="model_quantized.onnx"
)
```
✨ Features
- Efficient Inference: Thanks to DistilCamemBERT, inference time is halved at the same power consumption compared to CamemBERT-based models.
- Trained on Quality Datasets: Fine-tuned on the FQuAD v1.0 and Piaf datasets, ensuring high-quality question-answering performance.
📦 Installation
Install the libraries listed under Prerequisites from PyPI: `pip install transformers`, and `pip install optimum[onnxruntime]` for ONNX inference.
📚 Documentation
Dataset
The training data combines FQuAD v1.0 and Piaf, totalling 24,566 question-answer pairs in the training set and 3,188 in the evaluation set.
Evaluation results and benchmark
We compare DistilCamemBERT-QA with two other French language models: etalab-ia/camembert-base-squadFR-fquad-piaf based on CamemBERT and fmikaelian/flaubert-base-uncased-squad based on FlauBERT.
For benchmarks, we use three metrics: exact word-to-word comparison, F1-score (measuring the token-level overlap between the predicted answer and the ground truth), and inclusion score (measuring whether the ground-truth answer is contained in the predicted answer). Mean inference time is measured on an AMD Ryzen 5 4500U @ 2.3 GHz with 6 cores.
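The three metrics can be sketched as follows. This is a minimal illustration of the metric definitions, not the exact benchmark code; tokenization here is plain whitespace splitting:

```python
from collections import Counter

def exact_match(pred: str, truth: str) -> bool:
    # Word-to-word comparison: identical token sequences after whitespace split
    return pred.split() == truth.split()

def f1_score(pred: str, truth: str) -> float:
    # Token-level F1: harmonic mean of precision and recall over shared tokens
    pred_tokens, truth_tokens = pred.split(), truth.split()
    common = Counter(pred_tokens) & Counter(truth_tokens)
    n_common = sum(common.values())
    if n_common == 0:
        return 0.0
    precision = n_common / len(pred_tokens)
    recall = n_common / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

def inclusion(pred: str, truth: str) -> bool:
    # Inclusion score: the ground-truth answer appears inside the prediction
    return truth.strip() in pred

pred = "réalisateur et producteur américain"
truth = "réalisateur"
print(exact_match(pred, truth))              # False
print(round(f1_score(pred, truth), 2))       # 0.4
print(inclusion(pred, truth))                # True
```

Here the prediction contains the ground truth (inclusion passes) and shares one of its four tokens with it (precision 0.25, recall 1.0, F1 0.4), but fails the exact-match test.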
⚠️ Important Note
Do not rely on the results of the FlauBERT model: its scores are anomalously low, which suggests a problem with that modeling.
📄 License
The model is licensed under cc-by-nc-sa-3.0.
🔧 Technical Details
This model is fine-tuned from DistilCamemBERT for the French question-answering task. It addresses the scaling issue of CamemBERT-based models by reducing inference time while maintaining the same power consumption.
📖 Citation
```bibtex
@inproceedings{delestre:hal-03674695,
  TITLE = {{DistilCamemBERT : une distillation du mod{\`e}le fran{\c c}ais CamemBERT}},
  AUTHOR = {Delestre, Cyrile and Amar, Abibatou},
  URL = {https://hal.archives-ouvertes.fr/hal-03674695},
  BOOKTITLE = {{CAp (Conf{\'e}rence sur l'Apprentissage automatique)}},
  ADDRESS = {Vannes, France},
  YEAR = {2022},
  MONTH = Jul,
  KEYWORDS = {NLP ; Transformers ; CamemBERT ; Distillation},
  PDF = {https://hal.archives-ouvertes.fr/hal-03674695/file/cap2022.pdf},
  HAL_ID = {hal-03674695},
  HAL_VERSION = {v1},
}
```