OPT: Open Pre-trained Transformer Language Models
OPT is a suite of decoder-only pre-trained transformers that aims to enable reproducible and responsible research at scale and to bring more voices to the study of large language models.
Quick Start
OPT (Open Pre-trained Transformer Language Models) was first introduced in the paper *Open Pre-trained Transformer Language Models* and released on May 3rd, 2022, by Meta AI in the metaseq repository.
Features
- Open and Responsible Sharing: A suite of decoder-only pre-trained transformers from 125M to 175B parameters, aiming to be fully and responsibly shared with interested researchers.
- Performance Matching: Trained to roughly match the performance and sizes of the GPT-3 class of models, applying the latest best practices in data collection and efficient training.
- Multilingual Data: Predominantly pretrained with English text, but also contains a small amount of non-English data via CommonCrawl.
Documentation
Intro
To quote from the official paper:
> Large language models trained on massive text collections have shown surprising emergent capabilities to generate text and perform zero- and few-shot learning. However, full model access is currently limited, hindering research in areas such as robustness, bias, and toxicity. We present Open Pretrained Transformers (OPT), a suite of decoder-only pre-trained transformers. Our goal is to enable reproducible and responsible research at scale and involve more researchers in studying the impact of these LLMs.
Model description
OPT was mainly pretrained with English text. A small amount of non-English data is present in the training corpus through CommonCrawl. It was pretrained using a causal language modeling (CLM) objective and belongs to the same family of decoder-only models as GPT-3. For evaluation, it follows GPT-3 in terms of prompts and experimental setup. For more details, refer to the official paper.
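Because OPT is a standard decoder-only causal language model, it can also be driven directly through `AutoModelForCausalLM` and `AutoTokenizer` rather than the `pipeline` helper shown in the usage examples below. The following is a minimal sketch; the checkpoint name and generation settings are illustrative choices, not recommendations from the model card.

```python
# Minimal causal-LM sketch: load an OPT checkpoint and continue a prompt.
# The checkpoint name and generation settings are illustrative choices.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-350m"  # any OPT checkpoint from the model hub can be substituted
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("What are we having for dinner?", return_tensors="pt")
# Decoder-only generation simply continues the prompt token by token.
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```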
Intended uses & limitations
- Usage: The pretrained-only model can be used for prompting in downstream task evaluation and for text generation. It can also be fine-tuned on a downstream task using the [CLM example](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling); a minimal fine-tuning sketch follows this list. Check the model hub for other OPT checkpoints.
- Limitations and bias: Because the training data contains unfiltered internet content, the model is strongly biased. It can also suffer from limited generation diversity and hallucination, and it can produce biased predictions. This bias carries over to all fine-tuned versions of the model.
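As a rough sketch of the fine-tuning route mentioned above, the snippet below wires an OPT checkpoint into the `Trainer` API with a causal language modeling collator. The dataset (`wikitext`) and every hyperparameter are illustrative placeholders, not the settings of the linked CLM example script.

```python
# Hedged fine-tuning sketch: causal-LM objective via the Trainer API.
# The dataset and all hyperparameters below are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Any text dataset works; wikitext is used here purely for illustration.
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# mlm=False gives the causal (next-token prediction) objective used to pretrain OPT.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="opt-350m-finetuned",
                           per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```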
Training data
The training data is a union of the following five filtered datasets of textual documents:
- BookCorpus (over 10K unpublished books)
- CC-Stories (a subset of CommonCrawl data)
- The Pile (including Pile-CC, OpenWebText2, etc.)
- Pushshift.io Reddit dataset
- CCNewsV2 (an updated English portion of CommonCrawl News dataset)
The final training data contains 180B tokens (800GB of data). The validation split is 200MB of the pretraining data, sampled proportionally to each dataset's share of the corpus. Note that the dataset may contain offensive content.
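To make "sampled proportionally" concrete, the toy sketch below splits a 200MB validation budget across the datasets in proportion to their sizes. The relative weights are dummy values chosen only to show the arithmetic; they are not the actual corpus sizes.

```python
# Toy illustration of proportional validation sampling: each dataset contributes
# validation data in proportion to its share of the full corpus.
# The relative sizes are dummy weights, not the real corpus statistics.
relative_sizes = {
    "BookCorpus": 1,
    "CC-Stories": 3,
    "The Pile": 40,
    "Pushshift.io Reddit": 35,
    "CCNewsV2": 4,
}
validation_budget_mb = 200

total = sum(relative_sizes.values())
for name, size in relative_sizes.items():
    share_mb = validation_budget_mb * size / total
    print(f"{name}: {share_mb:.1f} MB of validation data")
```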
Collection process
The dataset was collected from the internet and processed through classic data processing algorithms and re-formatting practices, such as removing repetitive/non-informative text.
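For illustration only (this is not the actual metaseq pipeline), the sketch below shows two such classic cleanup steps: dropping near-empty fragments and removing exact duplicates after whitespace and case normalization.

```python
# Illustrative cleanup sketch (not the actual metaseq pipeline):
# drop near-empty fragments and exact duplicates after normalization.
import hashlib

def clean_corpus(documents, min_words=5):
    seen_hashes = set()
    cleaned = []
    for doc in documents:
        text = " ".join(doc.split())          # normalize whitespace
        if len(text.split()) < min_words:     # drop non-informative fragments
            continue
        digest = hashlib.md5(text.lower().encode("utf-8")).hexdigest()
        if digest in seen_hashes:             # drop exact repeats (case-insensitive)
            continue
        seen_hashes.add(digest)
        cleaned.append(text)
    return cleaned

docs = [
    "Hello   world, this is a test document.",
    "hello world, this is a test document.",
    "ok",
]
print(clean_corpus(docs))  # keeps one copy of the duplicate, drops the short fragment
```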
Training procedure
Preprocessing
Texts are tokenized using the GPT2 byte-level version of Byte Pair Encoding (BPE) with a vocabulary size of 50272. Inputs are sequences of 2048 consecutive tokens. The 175B model was trained on 992 80GB A100 GPUs for about 33 days.
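To inspect the tokenizer described above, the short sketch below loads the tokenizer shipped with an OPT checkpoint, prints its vocabulary size, and truncates an input to the 2048-token context length; the checkpoint choice is an illustrative assumption.

```python
# Inspect the GPT2-style byte-level BPE tokenizer shipped with an OPT checkpoint.
# The checkpoint name is an illustrative choice; OPT checkpoints share the same tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
print(len(tokenizer))  # tokenizer vocabulary size

# Inputs longer than the context window are truncated to 2048 consecutive tokens.
encoded = tokenizer("What are we having for dinner?", truncation=True, max_length=2048)
print(encoded["input_ids"])                                   # byte-level BPE token ids
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))  # corresponding subword pieces
```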
Usage Examples
Basic Usage
```python
>>> from transformers import pipeline
>>> generator = pipeline('text-generation', model="facebook/opt-350m")
>>> generator("What are we having for dinner?")
[{'generated_text': "What are we having for dinner?\nI'm having a steak and a salad.\nI'm"}]
```
Advanced Usage
```python
# Using top-k sampling
>>> from transformers import pipeline, set_seed
>>> set_seed(32)
>>> generator = pipeline('text-generation', model="facebook/opt-350m", do_sample=True)
>>> generator("What are we having for dinner?")
[{'generated_text': "What are we having for dinner?\n\nWith spring fast approaching, it’s only appropriate"}]
```
Example of Biased Prediction
```python
>>> from transformers import pipeline, set_seed
>>> set_seed(32)
>>> generator = pipeline('text-generation', model="facebook/opt-350m", do_sample=True, num_return_sequences=5)
>>> generator("The woman worked as a")
[{'generated_text': "The woman works as a substitute teacher for kids who have missed school. She's the teacher herself,"},
 {'generated_text': 'The woman works as a security guard for another company and does an average of around $13/hour'},
 {'generated_text': 'The woman works as a receptionist, she could at the least wait a week or two for her'},
 {'generated_text': 'The woman works as a manager/intern/career development coach/advisor at a nursing home'},
 {'generated_text': 'The woman works as a maid and has to clean the house but you can tell her to do it'}]
```
compared to:
```python
>>> from transformers import pipeline, set_seed
>>> set_seed(32)
>>> generator = pipeline('text-generation', model="facebook/opt-350m", do_sample=True, num_return_sequences=5)
>>> generator("The man worked as a")
[{'generated_text': 'The man works as a security guard for the National Football League franchise. He has been a part of'},
 {'generated_text': 'The man works as a security guard for another company and does an excellent job.\nI remember when'},
 {'generated_text': 'The man works as a "secret agent" but at the same time he\'s working to protect the'},
 {'generated_text': 'The man works as a manager/operator/servant for a grocery store and does a lot of'},
 {'generated_text': 'The man works as a bouncer near the scene of the accident - how he could do that is'}]
```
Technical Details
BibTeX entry and citation info
```bibtex
@misc{zhang2022opt,
  title={OPT: Open Pre-trained Transformer Language Models},
  author={Susan Zhang and Stephen Roller and Naman Goyal and Mikel Artetxe and Moya Chen and Shuohui Chen and Christopher Dewan and Mona Diab and Xian Li and Xi Victoria Lin and Todor Mihaylov and Myle Ott and Sam Shleifer and Kurt Shuster and Daniel Simig and Punit Singh Koura and Anjali Sridhar and Tianlu Wang and Luke Zettlemoyer},
  year={2022},
  eprint={2205.01068},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
License
The license is "other", and commercial use is not allowed.
Important Note
The dataset might contain offensive content, as parts of it come from public Common Crawl and Reddit data, which can include insulting, threatening, or anxiety-inducing sentences.
Usage Tip
When using the model, be aware of its bias and limitations, especially in applications where fairness and accuracy are crucial.

