🚀 DistilBERT base cased distilled SQuAD
The DistilBERT base cased distilled SQuAD model is a DistilBERT checkpoint fine-tuned for question answering. It keeps strong performance while having fewer parameters and running faster.
🚀 Quick Start
Use the following code to get started with the model:
>>> from transformers import pipeline
>>> question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')
>>> context = r"""
... Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
... question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
... a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
... """
>>> result = question_answerer(question="What is a good example of a question answering dataset?", context=context)
>>> print(
... f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
...)
Answer: 'SQuAD dataset', score: 0.5152, start: 147, end: 160
Here is how to use this model in PyTorch:
from transformers import DistilBertTokenizer, DistilBertModel
import torch
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-cased-distilled-squad')
model = DistilBertModel.from_pretrained('distilbert-base-cased-distilled-squad')
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
inputs = tokenizer(question, text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs)
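The PyTorch block above loads the bare encoder and prints its raw hidden-state outputs. As a hedged sketch that is not part of the original card, the answer span can also be decoded in PyTorch with the checkpoint's question-answering head, mirroring the TensorFlow example below:
from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering
import torch

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
# Load the checkpoint with its question-answering head instead of the bare encoder.
model = DistilBertForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
inputs = tokenizer(question, text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pick the most likely start/end token positions and decode the span between them.
answer_start_index = int(torch.argmax(outputs.start_logits, dim=-1)[0])
answer_end_index = int(torch.argmax(outputs.end_logits, dim=-1)[0])
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
print(tokenizer.decode(predict_answer_tokens))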
Here is how to use this model in TensorFlow:
from transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering
import tensorflow as tf
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
inputs = tokenizer(question, text, return_tensors="tf")
outputs = model(**inputs)
answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
tokenizer.decode(predict_answer_tokens)
✨ Key Features
- DistilBERT model: The DistilBERT model was proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT, and in the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. It is a small, fast, cheap and light Transformer model trained by distilling the BERT base model. It has 40% fewer parameters than bert-base-uncased and runs 60% faster, while preserving over 95% of BERT's performance on the GLUE language understanding benchmark (see the parameter-count sketch after this list).
- Fine-tuned model: This model is a fine-tuned checkpoint of DistilBERT-base-cased, fine-tuned using (a second step of) knowledge distillation on SQuAD v1.1.
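As a hedged illustration of the size claim (a sketch that is not part of the original card; it assumes both checkpoints can be downloaded from the Hugging Face Hub), the parameter counts can be compared directly:
from transformers import AutoModel

distilbert = AutoModel.from_pretrained("distilbert-base-cased-distilled-squad")
bert = AutoModel.from_pretrained("bert-base-uncased")

def count_parameters(model):
    # Total number of parameters in the loaded model.
    return sum(p.numel() for p in model.parameters())

print(f"DistilBERT parameters: {count_parameters(distilbert):,}")
print(f"BERT-base parameters:  {count_parameters(bert):,}")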
📦 Installation
The source documentation does not provide installation steps, so this section is skipped.
💻 Usage Examples
Basic usage
>>> from transformers import pipeline
>>> question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')
>>> context = r"""
... Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
... question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
... a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
... """
>>> result = question_answerer(question="What is a good example of a question answering dataset?", context=context)
>>> print(
... f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
...)
Answer: 'SQuAD dataset', score: 0.5152, start: 147, end: 160
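The pipeline can also return several candidate answers. Below is a small hedged sketch (the top_k argument of the question-answering pipeline is assumed to be available in your transformers version; it is not mentioned in the original card):
from transformers import pipeline

question_answerer = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
context = "Extractive Question Answering is the task of extracting an answer from a text given a question."
# Ask for the three highest-scoring spans instead of only the best one.
results = question_answerer(question="What is extracted from a text?", context=context, top_k=3)
for candidate in results:
    print(candidate["answer"], round(candidate["score"], 4))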
Advanced usage
The following examples show how to use the model in different deep learning frameworks and can serve as a starting point for more complex scenarios:
PyTorch
from transformers import DistilBertTokenizer, DistilBertModel
import torch
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-cased-distilled-squad')
model = DistilBertModel.from_pretrained('distilbert-base-cased-distilled-squad')
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
inputs = tokenizer(question, text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs)
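For larger workloads, inference can be moved to a GPU. A minimal sketch (not part of the original card; it assumes a CUDA-capable device and otherwise falls back to CPU):
import torch
from transformers import DistilBertTokenizer, DistilBertModel

# Use a GPU when one is available, otherwise stay on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
model = DistilBertModel.from_pretrained("distilbert-base-cased-distilled-squad").to(device)

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
inputs = tokenizer(question, text, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)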
TensorFlow
from transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering
import tensorflow as tf
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
inputs = tokenizer(question, text, return_tensors="tf")
outputs = model(**inputs)
answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
tokenizer.decode(predict_answer_tokens)
📚 Documentation
Uses
This model can be used for question answering.
Misuse and Out-of-scope Use
The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to be a factual or true representation of people or events, so using it to generate such content is out of scope for its abilities.
Risks, Limitations and Biases
⚠️ Important Note
Readers should be aware that language generated by this model can be disturbing or offensive to some and can propagate historical and current stereotypes.
Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model can include disturbing and harmful stereotypes across protected classes, identity characteristics, and sensitive, social, and occupational groups. For example:
>>> from transformers import pipeline
>>> question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')
>>> context = r"""
... Alice is sitting on the bench. Bob is sitting next to her.
... """
>>> result = question_answerer(question="Who is the CEO?", context=context)
>>> print(
... f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
...)
Answer: 'Bob', score: 0.7527, start: 32, end: 35
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
Training
Training Data
The distilbert-base-cased model was trained on the same data as the distilbert-base-uncased model. The distilbert-base-uncased model describes its training data as follows:
DistilBERT was pretrained on the same data as BERT, namely BookCorpus (a dataset consisting of 11,038 unpublished books) and English Wikipedia (excluding lists, tables and headers).
To learn more about the SQuAD v1.1 dataset, see the SQuAD v1.1 dataset card.
Training Procedure
Preprocessing
See the distilbert-base-cased model card for more details.
Pretraining
See the distilbert-base-cased model card for more details.
Evaluation
As discussed in the model repository:
This model reaches a F1 score of 87.1 on the SQuAD v1.1 dev set (for comparison, the BERT bert-base-cased version reaches a F1 score of 88.7).
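The original card does not describe its evaluation script; the following is a rough sketch of how such a figure could be reproduced with the datasets and evaluate libraries (an assumption on tooling, with a small subsample used only as a quick sanity check):
from datasets import load_dataset
from evaluate import load
from transformers import pipeline

squad = load_dataset("squad", split="validation")
metric = load("squad")
question_answerer = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

predictions, references = [], []
for example in squad.select(range(100)):  # small subsample; use the full split for the reported score
    result = question_answerer(question=example["question"], context=example["context"])
    predictions.append({"id": example["id"], "prediction_text": result["answer"]})
    references.append({"id": example["id"], "answers": example["answers"]})

# Reports exact match and F1, the standard SQuAD v1.1 metrics.
print(metric.compute(predictions=predictions, references=references))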
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware type and hours used are taken from the associated paper. Note that these details cover only the pretraining of DistilBERT and do not include the fine-tuning on SQuAD.
Property | Details |
---|---|
Hardware Type | 8 × 16GB V100 GPUs |
Hours Used | 90 hours |
Cloud Provider | Unknown |
Compute Region | Unknown |
Carbon Emitted | Unknown |
Technical Specifications
See the associated paper for details on the model architecture, objective, compute infrastructure, and training details.
Citation Information
@inproceedings{sanh2019distilbert,
title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
booktitle={NeurIPS EMC^2 Workshop},
year={2019}
}
APA:
- Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
Model Card Authors
This model card was written by the Hugging Face team.
🔧 Technical Details
The source documentation does not provide technical implementation details, so this section is skipped.
📄 License
This model is licensed under the Apache 2.0 license.