MPT-7B-Instruct-8k
MPT-7B-Instruct-8k is a model designed for long-form instruction following, particularly question-answering and summarization of longer documents. Its 8k-token context window makes it well suited to extended text tasks.
Quick Start
MPT-7B-Instruct-8k was built by finetuning MPT-7B-8k on a mix of instruction datasets, including dolly_hhrlhf, competition_math, duorc, and qasper (see the full data mix below).
Features
- Long-Form Instruction Following: Ideal for question-answering and summarization of longer documents.
- Based on Multiple Datasets: Trained on a diverse set of datasets for better generalization.
- Modified Architecture: Follows a modified decoder-only transformer architecture.
Installation
This model is best used with the MosaicML llm-foundry repository for training and finetuning.
Usage Examples
Basic Usage
import transformers

# trust_remote_code=True is required because MPT uses a custom model class
# that is not part of the Hugging Face transformers package.
model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b-instruct-8k',
    trust_remote_code=True
)
Advanced Usage
To use the optimized Triton implementation of FlashAttention, load the model on a GPU (cuda:0) with bfloat16 precision:
import torch
import transformers

name = 'mosaicml/mpt-7b-instruct-8k'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # use the Triton FlashAttention kernel
config.init_device = 'cuda:0'  # initialize weights directly on GPU for faster loading

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # load weights in bfloat16
    trust_remote_code=True
)
Because the model uses ALiBi rather than positional embeddings, the maximum sequence length can be increased beyond the 8k finetuning length:
import transformers

name = 'mosaicml/mpt-7b-instruct-8k'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384  # input + output tokens can now be up to 16384

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True
)
To use the model in a text-generation pipeline (reusing the model object loaded above):
import torch
from transformers import AutoTokenizer, pipeline

# Use the tokenizer from the base MPT-7B-8k model.
tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-7b-8k')

# Run a standalone generate call
with torch.autocast('cuda', dtype=torch.bfloat16):
    inputs = tokenizer('Here is a recipe for vegan banana bread:\n', return_tensors="pt").to('cuda')
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

# Or use the Hugging Face pipeline API
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')
with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Here is a recipe for vegan banana bread:\n',
             max_new_tokens=100,
             do_sample=True,
             use_cache=True))
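Since the finetuning mix includes dolly_hhrlhf, prompts are typically wrapped in a dolly-style instruction template before generation. The snippet below is a minimal sketch of that formatting; format_prompt is an illustrative helper rather than part of the released code, and the exact template should be verified against the upstream model card.
INSTRUCTION_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n{instruction}\n### Response:\n"
)

def format_prompt(instruction: str) -> str:
    # Illustrative helper (assumed, not part of the released code): wrap a raw
    # instruction in the dolly-style template above.
    return INSTRUCTION_TEMPLATE.format(instruction=instruction)

prompt = format_prompt("Summarize the key points of the attached meeting notes.")
# 'prompt' can then be passed to pipe(...) or tokenized for model.generate(...).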
Documentation
Technical Details
Model Architecture
The architecture is a modification of a standard decoder-only transformer. It uses FlashAttention and ALiBi (Attention with Linear Biases), and it does not use positional embeddings or biases.
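To make the ALiBi mechanism concrete, here is a minimal, self-contained sketch of per-head linear attention biases (following the ALiBi paper's slope schedule for a power-of-two head count); it is an illustration, not the model's internal implementation.
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Geometric sequence of per-head slopes: 2^(-8/n), 2^(-16/n), ...
    slopes = torch.tensor([2 ** (-8 * (i + 1) / n_heads) for i in range(n_heads)])
    # Relative key-query distances; attention to distant past tokens is penalized.
    distance = torch.arange(seq_len)[None, :] - torch.arange(seq_len)[:, None]
    # Shape (n_heads, seq_len, seq_len); added to attention logits before softmax.
    return slopes[:, None, None] * distance[None, :, :]

bias = alibi_bias(n_heads=32, seq_len=8)
print(bias.shape)  # torch.Size([32, 8, 8])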
Hyperparameters
| Property | Details |
| --- | --- |
| n_parameters | 6.7B |
| n_layers | 32 |
| n_heads | 32 |
| d_model | 4096 |
| vocab size | 50432 |
| sequence length | 2048 |
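These values can also be read off the loaded configuration. The attribute names below (n_layers, n_heads, d_model, vocab_size, max_seq_len) are assumed to follow the custom MPT config loaded with trust_remote_code=True; verify them against the loaded object.
import transformers

config = transformers.AutoConfig.from_pretrained(
    'mosaicml/mpt-7b-instruct-8k', trust_remote_code=True
)
# Assumed attribute names on the custom MPT config.
print(config.n_layers, config.n_heads, config.d_model, config.vocab_size, config.max_seq_len)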
Data Mix
| Data Source | Number of Tokens in Source | Proportion |
| --- | --- | --- |
| competition_math | 1.6 M | 3.66% |
| cot_gsm8k | 3.36 M | 7.67% |
| dialogsum | 0.1 M | 0.23% |
| dolly_hhrlhf | 5.89 M | 13.43% |
| duorc | 7.8 M | 17.80% |
| qasper | 8.72 M | 19.90% |
| quality | 11.29 M | 25.78% |
| scrolls/summ_screen_fd | 4.97 M | 11.33% |
| spider | 0.089 M | 0.20% |
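Each proportion is simply that source's token count divided by the total of roughly 43.8 M tokens; the quick check below reproduces the listed percentages up to rounding.
token_counts_millions = {
    'competition_math': 1.6, 'cot_gsm8k': 3.36, 'dialogsum': 0.1,
    'dolly_hhrlhf': 5.89, 'duorc': 7.8, 'qasper': 8.72,
    'quality': 11.29, 'scrolls/summ_screen_fd': 4.97, 'spider': 0.089,
}
total = sum(token_counts_millions.values())  # about 43.8 M tokens
for source, tokens in token_counts_millions.items():
    print(f'{source}: {100 * tokens / total:.2f}%')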
Training Configuration
This model was trained on 8 A100 80GB GPUs for about 6.3 hours using the MosaicML Platform. It was trained with sharded data parallelism using FSDP and the AdamW optimizer.
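The exact optimizer hyperparameters are not reproduced here. Purely as an illustration of the AdamW setup named above, the sketch below attaches AdamW to a stand-in module; the learning rate and weight decay are placeholders, not the values used for this model, and the FSDP wrapping handled by llm-foundry is omitted.
import torch

model = torch.nn.Linear(8, 8)  # stand-in module; the real run trained the full MPT model under FSDP
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)  # placeholder values

loss = model(torch.randn(4, 8)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()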
License
The model is licensed under Apache 2.0.
Important Note
MPT-7B-Instruct-8k can produce factually incorrect output, and should not be relied on to produce factually accurate information. It was trained on various public datasets. While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
Usage Tip
The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
Acknowledgements
This model was finetuned by the MosaicML NLP team.
MosaicML Platform
If you're interested in training and deploying your own MPT or LLMs on the MosaicML Platform, sign up here.
Citation
Please cite this model using the following format:
@online{MosaicML2023Introducing,
    author  = {MosaicML NLP Team},
    title   = {Introducing MPT-30B: Raising the bar for open-source foundation models},
    year    = {2023},
    url     = {www.mosaicml.com/blog/mpt-30b},
    note    = {Accessed: 2023-06-22},
    urldate = {2023-06-22}
}