MPT-7B-StoryWriter-65k+
MPT-7B-StoryWriter-65k+ is a model designed for reading and writing fictional stories with very long context lengths, addressing the need for text generation over extended narratives for storytellers and content creators.
🚀 Quick Start
MPT-7B-StoryWriter-65k+ was created by fine-tuning MPT-7B with a 65k-token context length on a filtered fiction subset of the books3 dataset. At inference time, thanks to ALiBi, it can handle even more than 65k tokens: in our blog post, we demonstrated generations as long as 84k tokens on a single node of 8 A100-80GB GPUs.
- License: Apache 2.0
- Trained by: MosaicML, following a modified decoder-only transformer architecture.
✨ Features
- Super Long Context: Capable of handling up to 65k tokens during training and can extrapolate beyond that at inference.
- Efficient Training Features: Incorporates features like FlashAttention (Dao et al. 2022), ALiBi, and QK LayerNorm.
📦 Installation
This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method, because it uses a custom model architecture that is not yet part of the `transformers` package.
```python
import transformers

model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b-storywriter',
    trust_remote_code=True
)
```
💻 Usage Examples
Basic Usage
```python
import torch
import transformers
from transformers import AutoTokenizer, pipeline

# Load the model (custom architecture, so trust_remote_code=True is required)
model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b-storywriter',
    trust_remote_code=True
)

# MPT-7B models use the EleutherAI/gpt-neox-20b tokenizer
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# Wrap model and tokenizer in a text-generation pipeline on the first GPU
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

# Run generation under bfloat16 autocast for speed and lower memory use
with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Here is a recipe for vegan banana bread:\n',
             max_new_tokens=100,
             do_sample=True,
             use_cache=True))
```
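Because the model is aimed at long-form fiction, the same pipeline can also be used to continue a story. The sketch below is illustrative: the prompt text and the 500-token budget are arbitrary choices, not settings from the original card, and it reuses `pipe` and `torch` from the example above.

```python
# Illustrative story continuation, reusing `pipe` and `torch` from the Basic Usage example
story_prompt = (
    "The lighthouse keeper had not spoken to another soul in three years, "
    "so when the knock came at midnight, he assumed it was the wind.\n"
)

with torch.autocast('cuda', dtype=torch.bfloat16):
    out = pipe(story_prompt,
               max_new_tokens=500,  # arbitrary budget; raise it for longer continuations
               do_sample=True,
               use_cache=True)

print(out[0]['generated_text'])
```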
Advanced Usage
```python
import torch
import transformers

name = 'mosaicml/mpt-7b-storywriter'

# Load the config first so the attention implementation and init device can be overridden
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # use the optimized triton implementation of FlashAttention
config.init_device = 'cuda:0'               # initialize weights directly on the GPU

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # load weights in bfloat16 to reduce memory use
    trust_remote_code=True
)
```
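Below is a minimal generation sketch using the model loaded above; the prompt and sampling settings are placeholders, and a CUDA GPU with the triton dependencies installed is assumed.

```python
import torch
from transformers import AutoTokenizer

# Same tokenizer as in the Basic Usage example
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

prompt = "Chapter One\n"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors='pt').to('cuda:0')

with torch.no_grad():
    output_ids = model.generate(**inputs,
                                max_new_tokens=200,
                                do_sample=True,
                                use_cache=True)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```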
Increasing the maximum sequence length
Although StoryWriter was fine-tuned with a sequence length of 65536, ALiBi allows the maximum sequence length to be increased further at fine-tuning and/or inference time. For example:
```python
import transformers

name = 'mosaicml/mpt-7b-storywriter'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 83968  # (input + output) tokens can now be up to 83968

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True
)
```
📚 Documentation
🔧 Technical Details
Model Architecture
The architecture is a modification of a standard decoder-only transformer. Notable changes include FlashAttention (Dao et al. 2022), ALiBi (Attention with Linear Biases) in place of standard positional embeddings, and QK LayerNorm.
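To illustrate why ALiBi lets the model handle sequences longer than it saw in training, the sketch below builds the head-specific linear attention biases. It is a minimal illustration of the general ALiBi technique (assuming a power-of-two head count), not code taken from this repository.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Build the (n_heads, seq_len, seq_len) additive attention bias used by ALiBi.

    Each head gets a fixed slope from a geometric sequence, and the bias grows
    linearly with query-key distance. Because no positional embeddings are learned,
    longer sequences at inference simply extend the same linear penalty.
    """
    # Head slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8) (geometric sequence for power-of-two head counts)
    slopes = torch.tensor([2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)])

    # Signed distance between key position j and query position i (negative for past tokens)
    positions = torch.arange(seq_len)
    distance = positions[None, :] - positions[:, None]   # (seq_len, seq_len)

    # Bias = slope * distance, added to the attention scores before the softmax
    return slopes[:, None, None] * distance[None, :, :]  # (n_heads, seq_len, seq_len)

# Example: biases for the 32 heads used by MPT-7B over a short sequence
print(alibi_bias(n_heads=32, seq_len=8).shape)  # torch.Size([32, 8, 8])
```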
Hyperparameters
| Hyperparameter | Value |
| --- | --- |
| n_parameters | 6.7B |
| n_layers | 32 |
| n_heads | 32 |
| d_model | 4096 |
| vocab size | 50432 |
| sequence length | 65536 |
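As a quick sanity check, several of these values can be read off the loaded config. The attribute names below mirror the table and the custom MPT config (`max_seq_len` is already used in the examples above); the others are assumptions, so the sketch falls back gracefully if a name does not exist.

```python
import transformers

config = transformers.AutoConfig.from_pretrained(
    'mosaicml/mpt-7b-storywriter', trust_remote_code=True
)

# Attribute names assumed to match the custom MPT config shipped with the checkpoint
for attr in ('d_model', 'n_heads', 'n_layers', 'vocab_size', 'max_seq_len'):
    print(attr, getattr(config, attr, 'not present in this config'))
```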
Pre-Training
- Data: The training data was tokenized using the [EleutherAI/gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) tokenizer. For more details on the pretraining process, see [MPT-7B](https://huggingface.co/mosaicml/mpt-7b). A short token-counting sketch using this tokenizer follows this list.
- Training Configuration: Trained on 8 A100-80GB GPUs for about 2 days on the MosaicML Platform, using sharded data parallelism with FSDP and the LION optimizer.
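Because the context budget is measured in gpt-neox-20b tokens, it can be helpful to count tokens before feeding a long manuscript to the model. This is a small convenience sketch, not part of the original card; the sample text is a placeholder.

```python
from transformers import AutoTokenizer

# Same tokenizer used for MPT-7B pretraining and in the usage examples above
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

def count_tokens(text: str) -> int:
    """Return the number of gpt-neox-20b tokens in `text`."""
    return len(tokenizer(text)["input_ids"])

manuscript = "It was the best of times, it was the worst of times..."  # placeholder text
n = count_tokens(manuscript)
print(f"{n} tokens; fits in the 65536-token fine-tuning context: {n <= 65536}")
```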
📄 License
The model is licensed under Apache 2.0.
Community-Created Resources
These resources were not created by MosaicML, but they might be useful. The links do not imply an endorsement of the creators or their content.
Example Epilogue
The full text of The Great Gatsby (67873 tokens) was fed to the model, followed by the text "EPILOGUE":
TheRealFitzgerald233:
...
So we beat on, boats against the current, borne back ceaselessly into the past
EPILOGUE
MPT-7B-StoryWriter:
I have lived so long in the shadow of this dream that I may never see the bright sunlight of reality. I may look up some morning and find that I am six feet under, and that Gatsby, who gave me the dream and everything else, will have vanished behind me.
And so it was. I had hardly crossed the threshold of the old world when my eye caught the headlines of a newspaper, and I knew at once that my dream was over. I knew that Gatsby was dead.
And I knew that day that I didn't love her.
That night I took the last train for West Egg, and when I reached home I went directly to my attic and brought down all Gatsby's letters.
I have sorted them by date and placed them in the order in which they were written. They are very few in number.
They are dated between April of nineteen eighteen and January of nineteen nineteen, and the last one I have here is the one that told me he was dead.
There is something in that last letter that frightens me. It is the sense of a finality, of a completeness. I never heard of him after that, and I was glad.
But a few months ago I read somewhere that he had been seen in New York. He had become a great man.
And I knew that he had not changed at all.
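For readers who want to reproduce this kind of experiment, the sketch below shows one way to feed a full book followed by an "EPILOGUE" marker to the pipeline from the Basic Usage section. The file path and generation settings are placeholders, and because the prompt alone exceeds 65536 tokens, the model must be loaded with an increased `config.max_seq_len` as shown earlier.

```python
import torch

# Reuses `pipe` and `tokenizer` from the Basic Usage example.
# The path below is a placeholder for a local plain-text copy of the book.
with open("great_gatsby.txt", "r", encoding="utf-8") as f:
    book_text = f.read()

prompt = book_text + "\n\nEPILOGUE\n"
print(f"Prompt length: {len(tokenizer(prompt)['input_ids'])} tokens")

with torch.autocast('cuda', dtype=torch.bfloat16):
    out = pipe(prompt,
               max_new_tokens=1024,  # placeholder budget for the generated epilogue
               do_sample=True,
               use_cache=True)

# Print only the newly generated text after the original prompt
print(out[0]['generated_text'][len(prompt):])
```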
Limitations and Biases
The following language is modified from [EleutherAI's GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b).
MPT-7B-StoryWriter can produce factually incorrect output and should not be relied on for factually accurate information. It was trained on various public datasets. Despite efforts to clean the pretraining data, it may generate lewd, biased, or offensive outputs.
Acknowledgements
This model was finetuned by Alex Trott and the MosaicML NLP team.
MosaicML Platform
If you're interested in training and deploying your own MPT or LLMs on the MosaicML Platform, [sign up here](https://forms.mosaicml.com/demo?utm_source=huggingface&utm_medium=referral&utm_campaign=mpt-7b).
Disclaimer
The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
Citation
Please cite this model using the following format:
```bibtex
@online{MosaicML2023Introducing,
    author  = {MosaicML NLP Team},
    title   = {Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs},
    year    = {2023},
    url     = {www.mosaicml.com/blog/mpt-7b},
    note    = {Accessed: 2023-03-28}, % change this date
    urldate = {2023-03-28} % change this date
}
```