# 🚀 Splinter base model
Splinter-base is a pretrained model for few-shot question answering. It was trained in a self-supervised fashion, which allows it to make use of large amounts of publicly available data.
## 🚀 Quick Start
Splinter-base is the pretrained model presented in the paper [Few-Shot Question Answering by Pretraining Span Selection](https://aclanthology.org/2021.acl-long.239/) (ACL 2021). You can find its original repository here. Note that this model is case-sensitive.
## ⚠️ Important Note
This model does not contain the pretrained weights of the QASS layer (see the paper for details), so the QASS layer is randomly initialized when the model is loaded. For a model that includes those weights, see [tau/splinter-base-qass](https://huggingface.co/tau/splinter-base-qass).
## ✨ Features
- Trained in a self-supervised fashion for few-shot question answering.
- Pretrained with the Recurring Span Selection (RSS) objective.
- Defines the Question-Aware Span Selection (QASS) layer for making multiple, question-conditioned predictions.
## 📚 Documentation
### Model description
Splinter is a model pretrained in a self-supervised way for few-shot question answering. It was pretrained on raw text only, without any human labeling; an automatic process generates inputs and labels from the text, allowing the model to use large amounts of publicly available data.
More precisely, it was pretrained with the Recurring Span Selection (RSS) objective, which mimics the span selection process in extractive question answering. Given a text, clusters of recurring spans (n-grams that appear more than once in the text) are first identified. For each such cluster, all of its instances but one are replaced with a special `[QUESTION]` token, and the model must select the correct (i.e., unmasked) span for each masked one. The model also defines the Question-Aware Span Selection (QASS) layer, which selects spans conditioned on a specific question, enabling multiple predictions.
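To make the RSS input format concrete, here is a deliberately simplified sketch of the masking step (the function name and the whitespace tokenization are illustrative assumptions; the actual pipeline operates on wordpiece tokens and applies additional filtering described in the paper):

```python
from collections import defaultdict

def mask_recurring_spans(text, n=2, question_token="[QUESTION]"):
    """Replace all but one occurrence of each recurring n-gram with a
    [QUESTION] token, mimicking the RSS pretraining input format.
    Simplified sketch: whitespace tokens, no overlap handling."""
    tokens = text.split()
    # Record the start positions of every n-gram in the text.
    positions = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        positions[tuple(tokens[i:i + n])].append(i)
    masked = list(tokens)
    for span, starts in positions.items():
        if len(starts) < 2:
            continue  # not a recurring span
        # Keep the first occurrence unmasked; mask the rest.
        for start in starts[1:]:
            masked[start] = question_token
            for j in range(start + 1, start + n):
                masked[j] = ""
    return " ".join(t for t in masked if t)

print(mask_recurring_spans("new york is big and new york is fun", n=3))
# → new york is big and [QUESTION] fun
```

During pretraining, the model is then trained to point each `[QUESTION]` token back at the retained (unmasked) occurrence of its span.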
### Intended uses & limitations
The main use of this model is few-shot extractive question answering.
### Pretraining
The model was pretrained on a v3-8 TPU for 2.4M steps. The training data is based on Wikipedia and BookCorpus; see the paper for more details.
| Property | Details |
|----------|---------|
| Model Type | Pretrained model for few-shot question answering |
| Training Data | Based on Wikipedia and BookCorpus |
### BibTeX entry and citation info

```bibtex
@inproceedings{ram-etal-2021-shot,
    title = "Few-Shot Question Answering by Pretraining Span Selection",
    author = "Ram, Ori and
      Kirstain, Yuval and
      Berant, Jonathan and
      Globerson, Amir and
      Levy, Omer",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.239",
    doi = "10.18653/v1/2021.acl-long.239",
    pages = "3066--3079",
}
```
## 📄 License
This model is licensed under the Apache 2.0 license.