Transformers
A library for using pre-trained models for masked language modeling.
Quick Start
To use the pre-trained model for masked language modeling, use the following snippet:
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# MDLM uses the GPT-2 tokenizer; trust_remote_code is needed to load the custom MDLM architecture.
tokenizer = AutoTokenizer.from_pretrained('gpt2')
model_name = 'kuleshov-group/mdlm-owt'
model = AutoModelForMaskedLM.from_pretrained(model_name, trust_remote_code=True)
```
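As a quick sanity check (a minimal sketch; the example sentence is made up for illustration), the GPT-2 tokenizer loaded above can be used to prepare inputs, keeping in mind the model's 1024-token context length:

```python
# Tokenize a hypothetical example sentence with the GPT-2 tokenizer loaded above.
text = "Masked diffusion language models reconstruct text from partially masked inputs."
input_ids = tokenizer(text, return_tensors='pt').input_ids

# Inputs must fit within the model's 1024-token context length.
print(input_ids.shape)  # (1, number_of_tokens)
```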
For more details, please see our GitHub repository: MDLM.
Features
- Model Structure: The model has a context length of 1024 and is similar in size to GPT2-medium, with approximately 130 million non-embedding parameters.
- Training Process: It was trained using a forward diffusion process that generates inputs ranging from fully masked to fully unmasked; the objective is to reconstruct the original input from these varying levels of masking, outputting logits in the process (a sketch of this masking step follows this list).
- Training Data: The training regimen comprised one million steps on the OpenWebText corpus, processing a total of 33 billion tokens.
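The forward diffusion process described in the Training Process bullet can be made concrete with a short sketch. This is not the authors' implementation: the mask-token id, the function name, and the uniform draw of the masking level below are assumptions chosen purely for illustration.

```python
import torch

MASK_ID = 50257  # hypothetical mask-token id (one slot past the GPT-2 vocabulary); check the model config

def forward_mask(input_ids: torch.Tensor, mask_prob: float) -> torch.Tensor:
    """Illustrative forward diffusion step: independently replace each token with
    the mask token with probability mask_prob. At mask_prob=0 the sequence is
    fully unmasked; at mask_prob=1 it is fully masked."""
    noise = torch.rand_like(input_ids, dtype=torch.float)
    corrupted = input_ids.clone()
    corrupted[noise < mask_prob] = MASK_ID
    return corrupted

# One masking level per sequence, so a batch spans inputs ranging from nearly
# fully unmasked to nearly fully masked; the model is trained to reconstruct
# the original tokens from these corrupted inputs.
batch = torch.randint(0, 50257, (4, 1024))   # stand-in token ids at the 1024-token context length
levels = torch.rand(4)
noisy = torch.stack([forward_mask(seq, p.item()) for seq, p in zip(batch, levels)])
```

In the actual training setup the masking probability is governed by the noise schedule described in the paper; the uniform draw above is only a stand-in for illustration.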
For more details, please see our paper: Simple and Effective Masked Diffusion Language Models.
Documentation
Model Details
The model, which has a context length of 1024 and is similar in size to GPT2-medium with approximately 130 million non-embedding parameters, was trained using a forward diffusion process that generates inputs varying from fully masked to fully unmasked. Its objective is to reconstruct the original input from these varying levels of masking, outputting logits in the process. The training regimen comprised one million steps on the OpenWebText corpus, processing a total of 33 billion tokens.
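To make the reconstruction objective concrete, the following is a minimal sketch of a cross-entropy loss computed only at masked positions, given the model's per-position logits. The function name and the unweighted averaging are assumptions for illustration; the paper's full objective adds a noise-level-dependent weighting.

```python
import torch
import torch.nn.functional as F

def masked_reconstruction_loss(logits: torch.Tensor,
                               original_ids: torch.Tensor,
                               corrupted_ids: torch.Tensor,
                               mask_id: int) -> torch.Tensor:
    """Cross-entropy between the model's logits and the original tokens,
    evaluated only at positions that were masked in the corrupted input."""
    was_masked = corrupted_ids == mask_id        # (batch, seq_len) boolean
    per_token = F.cross_entropy(
        logits.transpose(1, 2),                  # (batch, vocab_size, seq_len)
        original_ids,                            # (batch, seq_len)
        reduction='none',
    )
    return (per_token * was_masked).sum() / was_masked.sum().clamp(min=1)
```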
Citation
Please cite our work using the BibTeX below:
BibTeX:
@misc{sahoo2024simple,
title={Simple and Effective Masked Diffusion Language Models},
author={Subham Sekhar Sahoo and Marianne Arriola and Yair Schiff and Aaron Gokaslan and Edgar Marroquin and Justin T Chiu and Alexander Rush and Volodymyr Kuleshov},
year={2024},
eprint={2406.07524},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
APA:
Sahoo, S. S., Arriola, M., Schiff, Y., Gokaslan, A., Marroquin, E., Chiu, J. T., Rush, A., & Kuleshov, V. (2024). Simple and Effective Masked Diffusion Language Models (Version arXiv:2406.07524v1). arXiv. https://doi.org/10.48550/arXiv.2406.07524
Model Card Contact
Subham Sekhar Sahoo (ssahoo@cs.cornell.edu)
License
This project is licensed under the Apache-2.0 license.