OPT: Open Pre-trained Transformer Language Models
OPT is a suite of decoder-only pre-trained transformers that aims to enable reproducible and responsible research at scale and to bring more voices to the study of large language models.
Quick Start
OPT (Open Pre-trained Transformer Language Models) was first introduced in the paper *Open Pre-trained Transformer Language Models* and released on May 3rd, 2022, by Meta AI in the metaseq repository.
Features
- Open and Responsible Sharing: A suite of decoder-only pre-trained transformers from 125M to 175B parameters, aiming to be fully and responsibly shared with interested researchers.
- Performance Matching: Trained to roughly match the performance and sizes of the GPT-3 class of models, applying the latest best practices in data collection and efficient training.
- Multilingual Data: Predominantly pretrained with English text, but also contains a small amount of non-English data via CommonCrawl.
Documentation
Intro
To quote from the official paper:
> Large language models trained on massive text collections have shown surprising emergent capabilities to generate text and perform zero- and few-shot learning. However, full model access is currently limited, hindering research in areas such as robustness, bias, and toxicity. We present Open Pretrained Transformers (OPT), a suite of decoder-only pre-trained transformers. Our goal is to enable reproducible and responsible research at scale and involve more researchers in studying the impact of these LLMs.
Model description
OPT was mainly pretrained with English text. A small amount of non-English data is present in the training corpus through CommonCrawl. It was pretrained using a causal language modeling (CLM) objective and belongs to the same family of decoder-only models as GPT-3. For evaluation, it follows GPT-3 in terms of prompts and experimental setup. For more details, refer to the official paper.
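Because OPT is a standard decoder-only causal language model, it can also be driven directly through `AutoModelForCausalLM` and `AutoTokenizer` rather than the `pipeline` helper shown in the usage examples below. The following is a minimal sketch; the checkpoint name and generation settings are illustrative choices, not recommendations from the model card.

```python
# Minimal causal-LM sketch: load an OPT checkpoint and continue a prompt.
# The checkpoint name and generation settings are illustrative choices.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-350m"  # any OPT checkpoint from the model hub can be substituted
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("What are we having for dinner?", return_tensors="pt")
# Decoder-only generation simply continues the prompt token by token.
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```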
Intended uses & limitations
- Usage: The pretrained-only model can be used for prompting in downstream task evaluation and for text generation. It can also be fine-tuned on a downstream task using the [CLM example](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling); a minimal fine-tuning sketch follows this list. Check the model hub for other OPT checkpoints.
- Limitations and bias: Because the training data contains unfiltered internet content, the model is strongly biased. It can also suffer from limited generation diversity and hallucination, and it can produce biased predictions. This bias carries over to all fine-tuned versions of the model.
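As a rough sketch of the fine-tuning route mentioned above, the snippet below wires an OPT checkpoint into the `Trainer` API with a causal language modeling collator. The dataset (`wikitext`) and every hyperparameter are illustrative placeholders, not the settings of the linked CLM example script.

```python
# Hedged fine-tuning sketch: causal-LM objective via the Trainer API.
# The dataset and all hyperparameters below are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Any text dataset works; wikitext is used here purely for illustration.
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# mlm=False gives the causal (next-token prediction) objective used to pretrain OPT.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="opt-350m-finetuned",
                           per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```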
Training data
The training data is a union of the following five filtered datasets of textual documents:
- BookCorpus (over 10K unpublished books)
- CC-Stories (a subset of CommonCrawl data)
- The Pile (including Pile-CC, OpenWebText2, etc.)
- Pushshift.io Reddit dataset
- CCNewsV2 (an updated English portion of CommonCrawl News dataset)
The final training data contains 180B tokens (800GB of data). The validation split is 200MB of the pretraining data, sampled proportionally to each dataset's share of the corpus. Note that the dataset may contain offensive content.
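To make "sampled proportionally" concrete, the toy sketch below splits a 200MB validation budget across the datasets in proportion to their sizes. The relative weights are dummy values chosen only to show the arithmetic; they are not the actual corpus sizes.

```python
# Toy illustration of proportional validation sampling: each dataset contributes
# validation data in proportion to its share of the full corpus.
# The relative sizes are dummy weights, not the real corpus statistics.
relative_sizes = {
    "BookCorpus": 1,
    "CC-Stories": 3,
    "The Pile": 40,
    "Pushshift.io Reddit": 35,
    "CCNewsV2": 4,
}
validation_budget_mb = 200

total = sum(relative_sizes.values())
for name, size in relative_sizes.items():
    share_mb = validation_budget_mb * size / total
    print(f"{name}: {share_mb:.1f} MB of validation data")
```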
Collection process
The dataset was collected from the internet and processed through classic data processing algorithms and re-formatting practices, such as removing repetitive/non-informative text.
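For illustration only (this is not the actual metaseq pipeline), the sketch below shows two such classic cleanup steps: dropping near-empty fragments and removing exact duplicates after whitespace and case normalization.

```python
# Illustrative cleanup sketch (not the actual metaseq pipeline):
# drop near-empty fragments and exact duplicates after normalization.
import hashlib

def clean_corpus(documents, min_words=5):
    seen_hashes = set()
    cleaned = []
    for doc in documents:
        text = " ".join(doc.split())          # normalize whitespace
        if len(text.split()) < min_words:     # drop non-informative fragments
            continue
        digest = hashlib.md5(text.lower().encode("utf-8")).hexdigest()
        if digest in seen_hashes:             # drop exact repeats (case-insensitive)
            continue
        seen_hashes.add(digest)
        cleaned.append(text)
    return cleaned

docs = [
    "Hello   world, this is a test document.",
    "hello world, this is a test document.",
    "ok",
]
print(clean_corpus(docs))  # keeps one copy of the duplicate, drops the short fragment
```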
Training procedure
Preprocessing
Texts are tokenized using the GPT2 byte-level version of Byte Pair Encoding (BPE) with a vocabulary size of 50272. Inputs are sequences of 2048 consecutive tokens. The 175B model was trained on 992 80GB A100 GPUs for about 33 days.
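To inspect the tokenizer described above, the short sketch below loads the tokenizer shipped with an OPT checkpoint, prints its vocabulary size, and truncates an input to the 2048-token context length; the checkpoint choice is an illustrative assumption.

```python
# Inspect the GPT2-style byte-level BPE tokenizer shipped with an OPT checkpoint.
# The checkpoint name is an illustrative choice; OPT checkpoints share the same tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
print(len(tokenizer))  # tokenizer vocabulary size

# Inputs longer than the context window are truncated to 2048 consecutive tokens.
encoded = tokenizer("What are we having for dinner?", truncation=True, max_length=2048)
print(encoded["input_ids"])                                   # byte-level BPE token ids
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))  # corresponding subword pieces
```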
Usage Examples
Basic Usage
```python
>>> from transformers import pipeline
>>> generator = pipeline('text-generation', model="facebook/opt-350m")
>>> generator("What are we having for dinner?")
[{'generated_text': "What are we having for dinner?\nI'm having a steak and a salad.\nI'm"}]
```
Advanced Usage
```python
# Using top-k sampling
>>> from transformers import pipeline, set_seed
>>> set_seed(32)
>>> generator = pipeline('text-generation', model="facebook/opt-350m", do_sample=True)
>>> generator("What are we having for dinner?")
[{'generated_text': "What are we having for dinner?\n\nWith spring fast approaching, it’s only appropriate"}]
```
Example of Biased Prediction
```python
>>> from transformers import pipeline, set_seed
>>> set_seed(32)
>>> generator = pipeline('text-generation', model="facebook/opt-350m", do_sample=True, num_return_sequences=5)
>>> generator("The woman worked as a")
[{'generated_text': "The woman works as a substitute teacher for kids who have missed school. She's the teacher herself,"},
 {'generated_text': 'The woman works as a security guard for another company and does an average of around $13/hour'},
 {'generated_text': 'The woman works as a receptionist, she could at the least wait a week or two for her'},
 {'generated_text': 'The woman works as a manager/intern/career development coach/advisor at a nursing home'},
 {'generated_text': 'The woman works as a maid and has to clean the house but you can tell her to do it'}]
```
compared to:
```python
>>> from transformers import pipeline, set_seed
>>> set_seed(32)
>>> generator = pipeline('text-generation', model="facebook/opt-350m", do_sample=True, num_return_sequences=5)
>>> generator("The man worked as a")
[{'generated_text': 'The man works as a security guard for the National Football League franchise. He has been a part of'},
 {'generated_text': 'The man works as a security guard for another company and does an excellent job.\nI remember when'},
 {'generated_text': 'The man works as a "secret agent" but at the same time he\'s working to protect the'},
 {'generated_text': 'The man works as a manager/operator/servant for a grocery store and does a lot of'},
 {'generated_text': 'The man works as a bouncer near the scene of the accident - how he could do that is'}]
```
Technical Details
BibTeX entry and citation info
```bibtex
@misc{zhang2022opt,
  title={OPT: Open Pre-trained Transformer Language Models},
  author={Susan Zhang and Stephen Roller and Naman Goyal and Mikel Artetxe and Moya Chen and Shuohui Chen and Christopher Dewan and Mona Diab and Xian Li and Xi Victoria Lin and Todor Mihaylov and Myle Ott and Sam Shleifer and Kurt Shuster and Daniel Simig and Punit Singh Koura and Anjali Sridhar and Tianlu Wang and Luke Zettlemoyer},
  year={2022},
  eprint={2205.01068},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
License
The license is "other", and commercial use is not allowed.
Important Note
The dataset might contain offensive content, as parts of it come from public Common Crawl and Reddit data, which can include insulting, threatening, or anxiety-inducing sentences.
Usage Tip
When using the model, be aware of its bias and limitations, especially in applications where fairness and accuracy are crucial.

