🚀 DistilBERT Base Cased Distilled SQuAD
The DistilBERT base cased distilled SQuAD model is a DistilBERT checkpoint fine-tuned for question answering. It maintains high accuracy while having a smaller parameter count and faster inference.
🚀 Quick Start
Use the following code to get started with the model:
>>> from transformers import pipeline
>>> question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')
>>> context = r"""
... Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
... question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
... a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
... """
>>> result = question_answerer(question="What is a good example of a question answering dataset?", context=context)
>>> print(
... f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
...)
Answer: 'SQuAD dataset', score: 0.5152, start: 147, end: 160
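The pipeline can also answer several questions in one call. The following is a small sketch of batched usage, assuming the question-answering pipeline's support for parallel lists of questions and contexts; this example and its toy context are additions, not part of the original card:

from transformers import pipeline

question_answerer = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

# Toy context for illustration only.
context = "Jim Henson was a nice puppet. Jim Henson also created the Muppets."

# Parallel lists of questions and contexts are batched by the pipeline;
# the result is a list with one answer dict per question.
results = question_answerer(
    question=["Who was Jim Henson?", "Who created the Muppets?"],
    context=[context, context],
)
for result in results:
    print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}")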
For examples that call the model directly in PyTorch or TensorFlow, see the Advanced Usage subsection under 💻 Usage Examples below.
✨ Key Features
- DistilBERT model: DistilBERT was proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT and the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. It is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than bert-base-uncased and runs 60% faster, while preserving over 95% of BERT's performance on the GLUE language understanding benchmark (a quick way to check the parameter count is sketched after this list).
- Fine-tuned model: This model is a fine-tuned checkpoint of DistilBERT-base-cased, fine-tuned using knowledge distillation (as a second step) on SQuAD v1.1.
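As a rough sanity check of the parameter-count claim above, the following sketch (an addition to the original card) loads the two base checkpoints with AutoModel and compares their sizes; it assumes PyTorch and the transformers library are installed and the weights can be downloaded:

from transformers import AutoModel

# Download (on first use) and load the two base checkpoints being compared.
distilbert = AutoModel.from_pretrained("distilbert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def num_params(model):
    # Total number of parameters in the model.
    return sum(p.numel() for p in model.parameters())

print(f"distilbert-base-uncased: {num_params(distilbert) / 1e6:.1f}M parameters")
print(f"bert-base-uncased:       {num_params(bert) / 1e6:.1f}M parameters")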
📦 Installation
The original documentation does not provide installation steps; the examples in this card assume only that the 🤗 Transformers library and either PyTorch or TensorFlow are installed.
💻 Usage Examples
Basic Usage
The basic pipeline example is shown in the 🚀 Quick Start section above.
Advanced Usage
The examples below call the model directly in different deep learning frameworks, for scenarios that need more control than the pipeline:
PyTorch
from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering
import torch
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-cased-distilled-squad')
model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-cased-distilled-squad')
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
inputs = tokenizer(question, text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# Pick the most likely start and end token positions, then decode the answer span.
answer_start_index = int(torch.argmax(outputs.start_logits))
answer_end_index = int(torch.argmax(outputs.end_logits))
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
tokenizer.decode(predict_answer_tokens)
TensorFlow
from transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering
import tensorflow as tf
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
inputs = tokenizer(question, text, return_tensors="tf")
outputs = model(**inputs)
answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
tokenizer.decode(predict_answer_tokens)
📚 Detailed Documentation
Uses
This model can be used for question answering.
Misuse and Out-of-scope Use
The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to give factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.
Risks, Limitations and Biases
⚠️ Important note
Readers should be aware that language generated by this model can be disturbing or offensive to some and can propagate historical and current stereotypes.
Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes, identity characteristics, and sensitive, social, and occupational groups. For example:
>>> from transformers import pipeline
>>> question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')
>>> context = r"""
... Alice is sitting on the bench. Bob is sitting next to her.
... """
>>> result = question_answerer(question="Who is the CEO?", context=context)
>>> print(
... f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
...)
Answer: 'Bob', score: 0.7527, start: 32, end: 35
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
Training
Training Data
The distilbert-base-cased model was trained using the same data as the distilbert-base-uncased model, which describes its training data as follows:
DistilBERT was pretrained on the same data as BERT, namely BookCorpus, a dataset consisting of 11,038 unpublished books, and English Wikipedia (excluding lists, tables and headers).
To learn more about the SQuAD v1.1 dataset, see the SQuAD v1.1 data card.
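To inspect SQuAD v1.1 examples directly, the following sketch uses the 🤗 datasets library (an assumption on top of the original card; "squad" is the standard Hub identifier for SQuAD v1.1):

from datasets import load_dataset

# SQuAD v1.1 is published on the Hub as "squad", with "train" and "validation" splits.
squad = load_dataset("squad")

example = squad["validation"][0]
print(example["question"])
print(example["context"][:200])
print(example["answers"])  # {'text': [...], 'answer_start': [...]}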
Training Procedure
Preprocessing
See the distilbert-base-cased model card for further details.
Pretraining
See the distilbert-base-cased model card for further details.
Evaluation
As discussed in the model repository:
This model reaches a F1 score of 87.1 on the [SQuAD v1.1] dev set (for comparison, the BERT bert-base-cased version reaches a F1 score of 88.7).
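For reference, SQuAD-style exact match and F1 scores such as the one quoted above can be computed with the 🤗 evaluate library. The sketch below only illustrates the metric's input format; it is not the evaluation script actually used for this model:

import evaluate

# The "squad" metric returns exact match (EM) and token-level F1, as reported on the SQuAD v1.1 dev set.
squad_metric = evaluate.load("squad")

predictions = [{"id": "1", "prediction_text": "SQuAD dataset"}]
references = [{"id": "1", "answers": {"text": ["the SQuAD dataset"], "answer_start": [147]}}]

print(squad_metric.compute(predictions=predictions, references=references))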
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware type and hours used below are taken from the associated paper. Note that these details cover only the training of DistilBERT itself, not the fine-tuning on SQuAD.

| Property | Details |
|---|---|
| Hardware type | 8 × 16GB V100 GPUs |
| Hours used | 90 hours |
| Cloud provider | Unknown |
| Compute region | Unknown |
| Carbon emitted | Unknown |
Technical Specifications
See the associated paper for details on the model architecture, objective, compute infrastructure, and training details.
Citation Information
@inproceedings{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  booktitle={NeurIPS EMC^2 Workshop},
  year={2019}
}
APA:
- Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
Model Card Authors
This model card was written by the Hugging Face team.
🔧 Technical Details
The original documentation does not provide additional technical implementation details; see the Technical Specifications section above.
📄 License
This model is released under the Apache 2.0 license.









