MPT-7B-StoryWriter-65k+
MPT-7B-StoryWriter-65k+ is a model designed for reading and writing fictional stories with very long context lengths, addressing the need for text generation over extended narratives for storytellers and content creators.
🚀 Quick Start
MPT-7B-StoryWriter-65k+ was created by fine-tuning MPT-7B with a 65k-token context length on a filtered fiction subset of the books3 dataset. At inference time, thanks to ALiBi, it can handle even more than 65k tokens: in our blog post, we demonstrated generations as long as 84k tokens on a single node of 8 A100-80GB GPUs.
- License: Apache 2.0
- Trained by: MosaicML, following a modified decoder-only transformer architecture.
✨ Features
- Super Long Context: Capable of handling up to 65k tokens during training and can extrapolate beyond that at inference.
- Efficient Training Features: Incorporates features like FlashAttention (Dao et al. 2022), ALiBi, and QK LayerNorm.
📦 Installation
This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method, because it uses a custom model architecture that is not yet part of the `transformers` package.
```python
import transformers

model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b-storywriter',
    trust_remote_code=True
)
```
💻 Usage Examples
Basic Usage
```python
import torch
import transformers
from transformers import AutoTokenizer, pipeline

# Load the model (custom architecture, so trust_remote_code=True is required)
model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b-storywriter',
    trust_remote_code=True
)

# MPT-7B models use the EleutherAI/gpt-neox-20b tokenizer
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# Wrap model and tokenizer in a text-generation pipeline on the first GPU
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

# Run generation under bfloat16 autocast for speed and lower memory use
with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Here is a recipe for vegan banana bread:\n',
             max_new_tokens=100,
             do_sample=True,
             use_cache=True))
```
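Because the model is aimed at long-form fiction, the same pipeline can also be used to continue a story. The sketch below is illustrative: the prompt text and the 500-token budget are arbitrary choices, not settings from the original card, and it reuses `pipe` and `torch` from the example above.

```python
# Illustrative story continuation, reusing `pipe` and `torch` from the Basic Usage example
story_prompt = (
    "The lighthouse keeper had not spoken to another soul in three years, "
    "so when the knock came at midnight, he assumed it was the wind.\n"
)

with torch.autocast('cuda', dtype=torch.bfloat16):
    out = pipe(story_prompt,
               max_new_tokens=500,  # arbitrary budget; raise it for longer continuations
               do_sample=True,
               use_cache=True)

print(out[0]['generated_text'])
```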
Advanced Usage
```python
import torch
import transformers

name = 'mosaicml/mpt-7b-storywriter'

# Load the config first so the attention implementation and init device can be overridden
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # use the optimized triton implementation of FlashAttention
config.init_device = 'cuda:0'               # initialize weights directly on the GPU

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # load weights in bfloat16 to reduce memory use
    trust_remote_code=True
)
```
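Below is a minimal generation sketch using the model loaded above; the prompt and sampling settings are placeholders, and a CUDA GPU with the triton dependencies installed is assumed.

```python
import torch
from transformers import AutoTokenizer

# Same tokenizer as in the Basic Usage example
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

prompt = "Chapter One\n"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors='pt').to('cuda:0')

with torch.no_grad():
    output_ids = model.generate(**inputs,
                                max_new_tokens=200,
                                do_sample=True,
                                use_cache=True)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```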
Increasing the maximum sequence length
Although StoryWriter was fine-tuned with a sequence length of 65536, ALiBi allows the maximum sequence length to be increased further at fine-tuning and/or inference time. For example:
```python
import transformers

name = 'mosaicml/mpt-7b-storywriter'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 83968  # (input + output) tokens can now be up to 83968

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True
)
```
📚 Documentation
🔧 Technical Details
Model Architecture
The architecture is a modification of a standard decoder-only transformer. Notable changes include FlashAttention (Dao et al. 2022), ALiBi (Attention with Linear Biases) in place of standard positional embeddings, and QK LayerNorm.
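To illustrate why ALiBi lets the model handle sequences longer than it saw in training, the sketch below builds the head-specific linear attention biases. It is a minimal illustration of the general ALiBi technique (assuming a power-of-two head count), not code taken from this repository.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Build the (n_heads, seq_len, seq_len) additive attention bias used by ALiBi.

    Each head gets a fixed slope from a geometric sequence, and the bias grows
    linearly with query-key distance. Because no positional embeddings are learned,
    longer sequences at inference simply extend the same linear penalty.
    """
    # Head slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8) (geometric sequence for power-of-two head counts)
    slopes = torch.tensor([2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)])

    # Signed distance between key position j and query position i (negative for past tokens)
    positions = torch.arange(seq_len)
    distance = positions[None, :] - positions[:, None]   # (seq_len, seq_len)

    # Bias = slope * distance, added to the attention scores before the softmax
    return slopes[:, None, None] * distance[None, :, :]  # (n_heads, seq_len, seq_len)

# Example: biases for the 32 heads used by MPT-7B over a short sequence
print(alibi_bias(n_heads=32, seq_len=8).shape)  # torch.Size([32, 8, 8])
```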
Hyperparameters
| Hyperparameter | Value |
| --- | --- |
| n_parameters | 6.7B |
| n_layers | 32 |
| n_heads | 32 |
| d_model | 4096 |
| vocab size | 50432 |
| sequence length | 65536 |
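As a quick sanity check, several of these values can be read off the loaded config. The attribute names below mirror the table and the custom MPT config (`max_seq_len` is already used in the examples above); the others are assumptions, so the sketch falls back gracefully if a name does not exist.

```python
import transformers

config = transformers.AutoConfig.from_pretrained(
    'mosaicml/mpt-7b-storywriter', trust_remote_code=True
)

# Attribute names assumed to match the custom MPT config shipped with the checkpoint
for attr in ('d_model', 'n_heads', 'n_layers', 'vocab_size', 'max_seq_len'):
    print(attr, getattr(config, attr, 'not present in this config'))
```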
Pre-Training
- Data: The training data was tokenized using the [EleutherAI/gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) tokenizer. For more details on the pretraining process, see [MPT-7B](https://huggingface.co/mosaicml/mpt-7b). A short token-counting sketch using this tokenizer follows this list.
- Training Configuration: Trained on 8 A100-80GB GPUs for about 2 days on the MosaicML Platform, using sharded data parallelism with FSDP and the LION optimizer.
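Because the context budget is measured in gpt-neox-20b tokens, it can be helpful to count tokens before feeding a long manuscript to the model. This is a small convenience sketch, not part of the original card; the sample text is a placeholder.

```python
from transformers import AutoTokenizer

# Same tokenizer used for MPT-7B pretraining and in the usage examples above
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

def count_tokens(text: str) -> int:
    """Return the number of gpt-neox-20b tokens in `text`."""
    return len(tokenizer(text)["input_ids"])

manuscript = "It was the best of times, it was the worst of times..."  # placeholder text
n = count_tokens(manuscript)
print(f"{n} tokens; fits in the 65536-token fine-tuning context: {n <= 65536}")
```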
📄 License
The model is licensed under Apache 2.0.
Community-Created Resources
These resources were not created by MosaicML, but they might be useful. The links do not imply an endorsement of the creators or their content.
Example Epilogue
The full text of The Great Gatsby (67873 tokens) was fed to the model, followed by the text "EPILOGUE":
TheRealFitzgerald233:
...
So we beat on, boats against the current, borne back ceaselessly into the past
EPILOGUE
MPT-7B-StoryWriter:
I have lived so long in the shadow of this dream that I may never see the bright sunlight of reality. I may look up some morning and find that I am six feet under, and that Gatsby, who gave me the dream and everything else, will have vanished behind me.
And so it was. I had hardly crossed the threshold of the old world when my eye caught the headlines of a newspaper, and I knew at once that my dream was over. I knew that Gatsby was dead.
And I knew that day that I didn't love her.
That night I took the last train for West Egg, and when I reached home I went directly to my attic and brought down all Gatsby's letters.
I have sorted them by date and placed them in the order in which they were written. They are very few in number.
They are dated between April of nineteen eighteen and January of nineteen nineteen, and the last one I have here is the one that told me he was dead.
There is something in that last letter that frightens me. It is the sense of a finality, of a completeness. I never heard of him after that, and I was glad.
But a few months ago I read somewhere that he had been seen in New York. He had become a great man.
And I knew that he had not changed at all.
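For readers who want to reproduce this kind of experiment, the sketch below shows one way to feed a full book followed by an "EPILOGUE" marker to the pipeline from the Basic Usage section. The file path and generation settings are placeholders, and because the prompt alone exceeds 65536 tokens, the model must be loaded with an increased `config.max_seq_len` as shown earlier.

```python
import torch

# Reuses `pipe` and `tokenizer` from the Basic Usage example.
# The path below is a placeholder for a local plain-text copy of the book.
with open("great_gatsby.txt", "r", encoding="utf-8") as f:
    book_text = f.read()

prompt = book_text + "\n\nEPILOGUE\n"
print(f"Prompt length: {len(tokenizer(prompt)['input_ids'])} tokens")

with torch.autocast('cuda', dtype=torch.bfloat16):
    out = pipe(prompt,
               max_new_tokens=1024,  # placeholder budget for the generated epilogue
               do_sample=True,
               use_cache=True)

# Print only the newly generated text after the original prompt
print(out[0]['generated_text'][len(prompt):])
```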
Limitations and Biases
The following language is modified from [EleutherAI's GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b).
MPT-7B-StoryWriter can produce factually incorrect output and should not be relied on for factually accurate information. It was trained on various public datasets. Despite efforts to clean the pretraining data, it may generate lewd, biased, or offensive outputs.
Acknowledgements
This model was finetuned by Alex Trott and the MosaicML NLP team.
MosaicML Platform
If you're interested in training and deploying your own MPT or LLMs on the MosaicML Platform, [sign up here](https://forms.mosaicml.com/demo?utm_source=huggingface&utm_medium=referral&utm_campaign=mpt-7b).
Disclaimer
The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
Citation
Please cite this model using the following format:
```bibtex
@online{MosaicML2023Introducing,
    author  = {MosaicML NLP Team},
    title   = {Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs},
    year    = {2023},
    url     = {www.mosaicml.com/blog/mpt-7b},
    note    = {Accessed: 2023-03-28}, % change this date
    urldate = {2023-03-28} % change this date
}
```