# GENRE
The GENRE (Generative ENtity REtrieval) system, introduced in Autoregressive Entity Retrieval, is implemented in PyTorch. GENRE uses a sequence-to-sequence approach to entity retrieval (e.g. entity linking), built on a fine-tuned BART architecture. It performs retrieval by generating the unique entity name conditioned on the input text, using constrained beam search so that only valid identifiers are generated. The model was first released in the facebookresearch/GENRE repository using fairseq; the transformers models were obtained through a conversion script similar to this.
This model was trained on the full training set of BLINK, which consists of 9M datapoints for entity disambiguation grounded on Wikipedia.
## Documentation

### BibTeX entry and citation info
Please consider citing our works if you use code from this repository.
```bibtex
@inproceedings{decao2020autoregressive,
  title={Autoregressive Entity Retrieval},
  author={Nicola {De Cao} and Gautier Izacard and Sebastian Riedel and Fabio Petroni},
  booktitle={International Conference on Learning Representations},
  url={https://openreview.net/forum?id=5k8F6UU39V},
  year={2021}
}
```
## Usage Examples

### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/genre-linking-blink")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/genre-linking-blink").eval()

sentences = ["Einstein was a [START_ENT] German [END_ENT] physicist."]

outputs = model.generate(
    **tokenizer(sentences, return_tensors="pt"),
    num_beams=5,
    num_return_sequences=5,
)

tokenizer.batch_decode(outputs, skip_special_tokens=True)
```
This code outputs the following top-5 predictions (using constrained beam search):
```python
['Germans',
 'Germany',
 'German Empire',
 'Weimar Republic',
 'Greeks']
```
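Constrained beam search restricts each decoding step to token continuations that can still complete a valid entity name, typically via a trie passed to `generate()` through its `prefix_allowed_tokens_fn` argument. The sketch below illustrates the idea with a minimal trie over made-up token-ID sequences; the real token IDs would come from applying the model's tokenizer to the set of valid entity names, and the function names here (`Trie`, `prefix_allowed_tokens_fn`) are illustrative, not part of the released API.

```python
class Trie:
    """Minimal trie over token-ID sequences of valid entity names."""

    def __init__(self, sequences):
        self.root = {}
        for seq in sequences:
            node = self.root
            for tok in seq:
                node = node.setdefault(tok, {})

    def allowed_tokens(self, prefix):
        """Return the token IDs that may follow `prefix`; [] if prefix is invalid."""
        node = self.root
        for tok in prefix:
            node = node.get(tok)
            if node is None:
                return []
        return list(node.keys())


# Hypothetical token-ID sequences for two entity names (IDs are made up).
trie = Trie([[0, 11, 22, 2], [0, 11, 33, 2]])


# A callback with the signature transformers' generate() expects for
# prefix_allowed_tokens_fn(batch_id, input_ids); it would be passed as
# model.generate(..., prefix_allowed_tokens_fn=prefix_allowed_tokens_fn).
def prefix_allowed_tokens_fn(batch_id, input_ids):
    return trie.allowed_tokens(list(input_ids))
```

With this constraint in place, beams that wander off every valid entity name are pruned, which is how GENRE guarantees that generated strings are valid identifiers.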
## Technical Details

The GENRE system uses a sequence-to-sequence approach for entity retrieval, based on a fine-tuned BART architecture. The model was trained on the full training set of BLINK, which contains 9M datapoints for entity disambiguation grounded on Wikipedia. It uses constrained beam search to generate only valid entity identifiers conditioned on the input text.
### Information Table

| Property | Details |
|----------|---------|
| Model Type | Generative ENtity REtrieval (GENRE) based on a fine-tuned BART architecture |
| Training Data | Full training set of BLINK: 9M datapoints for entity disambiguation grounded on Wikipedia |