# DUO: A Text Generation Model

DUO is a pre-trained model for masked language modeling, offering high-quality text generation capabilities.
## Quick Start
To use the pre-trained model for masked language modeling, use the following snippet:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('gpt2')
model = AutoModelForMaskedLM.from_pretrained('s-sahoo/duo-distilled')
```
For a hands-on example, check out this [Colab notebook](https://colab.research.google.com/drive/1Sf7R-dqdR6gq-H8nyZ9E3ZkyvqMTqcwq?usp=sharing).
For more information and implementation details, visit our GitHub repository: [DUO](https://github.com/s-sahoo/duo)
## Features

- Text Generation: Ideal for masked language modeling tasks.
- Context Length: The model has a context length of 1024 tokens.
- Model Size: Similar in size to GPT-2 Medium, with approximately 130 million non-embedding parameters.
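Inputs longer than the 1024-token context length must be truncated or split before being passed to the model. A minimal sketch of chunking a token-id sequence to fit the context window (`chunk_token_ids` is a hypothetical helper for illustration, not part of the DUO codebase):

```python
def chunk_token_ids(token_ids, context_length=1024):
    # Split a long token-id sequence into consecutive chunks,
    # each no longer than the model's context length.
    return [token_ids[i:i + context_length]
            for i in range(0, len(token_ids), context_length)]

ids = list(range(2500))  # stand-in for real token ids from a tokenizer
chunks = chunk_token_ids(ids)
print([len(c) for c in chunks])  # → [1024, 1024, 452]
```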
## Documentation

### Model Details

The model has a context length of 1024 and is similar in size to GPT-2 Medium, with approximately 130 million non-embedding parameters. It was trained for 1M steps on the OpenWebText corpus.

For more details, please see our paper: The Diffusion Duality.

Project page: https://s-sahoo.com/duo
## License

This project is licensed under the Apache 2.0 license.
## Citation

Please cite our work using the BibTeX below:

```bibtex
@inproceedings{
sahoo2025the,
title={The Diffusion Duality},
author={Subham Sekhar Sahoo and Justin Deschenaux and Aaron Gokaslan and Guanghan Wang and Justin T Chiu and Volodymyr Kuleshov},
booktitle={ICLR 2025 Workshop on Deep Generative Model in Machine Learning: Theory, Principle and Efficacy},
year={2025},
url={https://openreview.net/forum?id=CB0Ub2yXjC}
}
```
## Model Card Contact
Subham Sekhar Sahoo (ssahoo@cs.cornell.edu)