DistilBERT Open-Source Question Answering Model: Free to Deploy, Fewer Parameters, Faster Inference, Strong Performance


Distilbert Base Uncased Distilled Squad

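Developed by distilbert

DistilBERT is a lightweight, distilled version of BERT with 40% fewer parameters and 60% faster inference, while retaining more than 95% of BERT's performance on the GLUE benchmark. This model is fine-tuned for question answering.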

Question Answering

Transformers

Language: English | License: Apache-2.0 | #QuestionAnswering #LightweightBERT #KnowledgeDistillation


Released: 3/2/2022

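Model Overview

A fine-tuned checkpoint of DistilBERT-base-uncased, trained with knowledge distillation on the SQuAD v1.1 dataset and intended for English question answering.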

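Model Features

Efficient and lightweight

40% fewer parameters and 60% faster inference than the original BERT model

High performance

Retains more than 95% of BERT's performance on the GLUE benchmark

Optimized for question answering

Fine-tuned specifically for the SQuAD question answering task, reaching an F1 score of 86.9 on SQuAD v1.1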

Model Capabilities

Extractive question answering

Text comprehension

Answer span localization

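Use Cases

Question answering

Document-based question answering

Extracts the answer to a question from a given text

Reaches an F1 score of 86.9 on the SQuAD v1.1 dataset

Knowledge retrieval

Finds relevant information in a knowledge base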

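🚀 DistilBERT Base Uncased Distilled SQuAD Model

The DistilBERT base uncased distilled SQuAD model is a DistilBERT checkpoint fine-tuned on the SQuAD v1.1 dataset using knowledge distillation. It keeps most of BERT's accuracy while having fewer parameters and faster inference, and it is intended for English question answering.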

🚀 Quick Start

Use the code below to get started with the model:

Basic usage

>>> from transformers import pipeline
>>> question_answerer = pipeline("question-answering", model='distilbert-base-uncased-distilled-squad')

>>> context = r"""
... Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
... question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
... a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
... """

>>> result = question_answerer(question="What is a good example of a question answering dataset?", context=context)
>>> print(
... f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
... )

Answer: 'SQuAD dataset', score: 0.4704, start: 147, end: 160
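The `start` and `end` values in this output are character offsets into `context`, so slicing the context with them recovers the answer text. A minimal sketch of that relationship, assuming the context is written with single spaces between words; the `result` dictionary here is copied by hand from the output above rather than produced by running the model:

```python
# Context string from the basic-usage example (the leading newline counts
# as character 0).
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
"""

# Hand-copied from the documented output; not computed by a model here.
result = {"answer": "SQuAD dataset", "score": 0.4704, "start": 147, "end": 160}

# `start` is inclusive and `end` is exclusive, like a Python slice.
span = context[result["start"]:result["end"]]
print(span)
```

This is useful when the surrounding sentence needs to be highlighted or when the answer has to be mapped back to a position in a longer document.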

Advanced usage

PyTorch

from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering
import torch

# Load the tokenizer and the fine-tuned question answering model
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased-distilled-squad')
model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased-distilled-squad')

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

# Encode the question and the context together as one sequence pair
inputs = tokenizer(question, text, return_tensors="pt")
with torch.no_grad():  # inference only, so no gradients are needed
    outputs = model(**inputs)

# Most likely start and end token positions, chosen independently
answer_start_index = torch.argmax(outputs.start_logits)
answer_end_index = torch.argmax(outputs.end_logits)

# Slice out the answer tokens (end index inclusive) and decode them to text
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
print(tokenizer.decode(predict_answer_tokens))

TensorFlow

from transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering
import tensorflow as tf

# Load the tokenizer and the TensorFlow version of the fine-tuned model
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased-distilled-squad")
model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased-distilled-squad")

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

# Encode the question and the context together as one sequence pair
inputs = tokenizer(question, text, return_tensors="tf")
outputs = model(**inputs)

# Most likely start and end token positions, chosen independently
answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])

# Slice out the answer tokens (end index inclusive) and decode them to text
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
print(tokenizer.decode(predict_answer_tokens))
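Taking the argmax of the start and end logits independently, as both snippets do, can produce an end index that comes before the start index. One common remedy is to score every valid (start, end) pair jointly. The following is a minimal sketch of that idea in plain Python; the function name `best_span` and the `max_answer_len` cap are illustrative choices, not something the model card specifies:

```python
from typing import Sequence, Tuple


def best_span(start_logits: Sequence[float],
              end_logits: Sequence[float],
              max_answer_len: int = 15) -> Tuple[int, int]:
    """Return (start, end) token indices, end inclusive, that maximize
    start_logits[start] + end_logits[end] subject to
    start <= end < start + max_answer_len."""
    if len(start_logits) != len(end_logits) or len(start_logits) == 0:
        raise ValueError("logit sequences must be non-empty and equally long")

    n = len(start_logits)
    best = (0, 0)
    best_score = float("-inf")
    for i in range(n):
        # Only consider end positions at or after the start position.
        for j in range(i, min(i + max_answer_len, n)):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best


# Toy logits where independent argmax would pick start=1 and end=0,
# which is not a valid span.
start_logits = [0.1, 2.0, 0.3, 0.2]
end_logits = [3.0, 0.1, 1.5, 0.2]
print(best_span(start_logits, end_logits))  # → (1, 2)
```

With the PyTorch snippet, `outputs.start_logits[0].tolist()` and `outputs.end_logits[0].tolist()` could be passed in and the answer decoded from `inputs.input_ids[0, start : end + 1]`. The sketch does not mask out question tokens or special tokens, which a production implementation would normally do.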

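✨ Key Features

Lightweight and efficient: DistilBERT has 40% fewer parameters than bert-base-uncased and runs 60% faster, while preserving more than 95% of BERT's performance on the GLUE language understanding benchmark.
Fine-tuned for question answering: this model is a fine-tuned checkpoint of DistilBERT-base-uncased, fine-tuned with knowledge distillation on SQuAD v1.1.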

📚 Documentation

Model Details

DistilBERT was introduced in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT and in the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. It is a small, fast, cheap, and light Transformer model trained by distilling BERT base.
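To make the idea of distillation concrete, the sketch below computes a temperature-softened KL divergence between a teacher's and a student's output distributions, which is the general shape of a soft-target distillation loss. It is an illustration only: the function names, the temperature value, and the toy logits are assumptions, and the full DistilBERT training objective described in the paper combines further terms that are not shown here.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_kl(teacher_logits: Sequence[float],
                    student_logits: Sequence[float],
                    temperature: float = 2.0) -> float:
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Toy logits (hypothetical values): a student that roughly agrees with the
# teacher, and one that ranks the classes in the opposite order.
teacher = [4.0, 1.0, 0.5]
student_close = [3.8, 1.1, 0.6]
student_far = [0.5, 1.0, 4.0]

print(distillation_kl(teacher, student_close))
print(distillation_kl(teacher, student_far))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge; a higher temperature flattens both distributions so that the teacher's relative preferences among unlikely classes carry more weight.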

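Property	Details
Developed by	Hugging Face
Model type	Transformer-based language model
Language	English
License	Apache 2.0
Related models	DistilBERT-base-uncased
Resources for more information	- See this repository for more about Distil* (a class of compressed models that includes this model) - See Sanh et al. (2019) for more on knowledge distillation and the training procedure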

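Uses

This model can be used for question answering.

Misuse and Out-of-scope Use

The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to produce factual or true representations of people or events, so using it to generate such content is outside its intended scope.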

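Risks, Limitations and Biases

⚠️ Important note

Readers should be aware that language generated by this model may be disturbing or offensive to some and may propagate historical and current stereotypes.

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model can include disturbing and harmful stereotypes about protected classes, identity characteristics, and sensitive, social, and occupational groups.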

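Training

Training Data

The training data for the distilbert-base-uncased model is described as follows:

DistilBERT was pretrained on the same data as BERT: BookCorpus, a dataset consisting of 11,038 unpublished books, and English Wikipedia (excluding lists, tables, and headers).

For more information about the SQuAD v1.1 dataset, see the SQuAD v1.1 data card.

Training Procedure

Preprocessing

See the distilbert-base-uncased model card for further details.

Pretraining

See the distilbert-base-uncased model card for further details.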

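Evaluation

As stated in the model repository:

This model reaches an F1 score of 86.9 on the SQuAD v1.1 dev set (for comparison, the BERT bert-base-uncased version reaches an F1 score of 88.5).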
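The F1 score reported for SQuAD is a token-overlap measure between the predicted and the reference answer strings. The sketch below follows the commonly used SQuAD v1.1 recipe (lowercase, strip punctuation and English articles, then compare token multisets). It is a simplified single-reference version written for illustration, with hypothetical function names, and is not the script that produced the numbers above:

```python
import re
import string
from collections import Counter
from typing import List


def normalize_answer(text: str) -> List[str]:
    """Lowercase, drop punctuation and articles, and split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def squad_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between one prediction and one reference answer."""
    pred_tokens = normalize_answer(prediction)
    ref_tokens = normalize_answer(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(squad_f1("the SQuAD dataset", "SQuAD dataset"))  # → 1.0
print(squad_f1("SQuAD", "SQuAD dataset"))
```

A dataset-level score is then usually obtained by taking the best F1 over all reference answers for each question and averaging across questions, expressed as a percentage.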

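Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware type and hours used below are taken from the associated paper. Note that these details cover only the training of DistilBERT and do not include the fine-tuning on SQuAD.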

Property	Details
Hardware type	8 × 16GB V100 GPUs
Hours used	90 hours
Cloud provider	Unknown
Compute region	Unknown
Carbon emitted	Unknown
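The hardware figures above translate directly into GPU-hours, and estimates of this kind typically multiply energy use by the carbon intensity of the local grid. The sketch below shows that arithmetic only. The function names are hypothetical, and the per-GPU power draw and carbon intensity are inputs the caller must supply: the provider, region, and emissions are listed as unknown, so no emissions figure is implied.

```python
def gpu_hours(num_gpus: int, hours: float) -> float:
    """Total accelerator time: number of GPUs times wall-clock hours."""
    return num_gpus * hours


def energy_kwh(total_gpu_hours: float, watts_per_gpu: float) -> float:
    """Energy in kWh, assuming every GPU draws `watts_per_gpu` throughout."""
    return total_gpu_hours * watts_per_gpu / 1000.0


def emissions_kg(kwh: float, kg_co2_per_kwh: float) -> float:
    """CO2-equivalent in kg for a caller-supplied grid carbon intensity."""
    return kwh * kg_co2_per_kwh


# From the table: 8 V100 GPUs for 90 hours.
total = gpu_hours(8, 90)
print(total)  # → 720

# ASSUMPTION: a placeholder average draw per GPU, not a measured value.
ASSUMED_WATTS_PER_GPU = 300.0
print(energy_kwh(total, ASSUMED_WATTS_PER_GPU))
```

Turning the energy figure into emissions requires the carbon intensity of the region where training ran, which is not available here, so `emissions_kg` is left uncalled.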

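Technical Specifications

See the associated paper for details on the modeling architecture, objective, compute infrastructure, and training.

Citation Information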

@inproceedings{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  booktitle={NeurIPS EMC^2 Workshop},
  year={2019}
}

APA:

Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.