🚀 Sharded version of Mistral-7B-Instruct-v0.1
This is a sharded version of Mistral-7B-Instruct-v0.1, designed for use when your CPU memory is limited. It enables you to leverage the power of this model even with constrained resources.
🚀 Quick Start
The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is an instruction fine-tuned version of the Mistral-7B-v0.1 generative text model. It was fine-tuned using a variety of publicly available conversation datasets.
For comprehensive details about this model, please refer to our release blog post.
✨ Features
- Sharded for Limited Memory: Allows usage when CPU memory is restricted; see the loading sketch after this list.
- Instruction Fine-Tuned: Based on Mistral-7B-v0.1 and fine-tuned for better instruction following.
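Because the checkpoint is split into small shards, it can be loaded without materialising the full set of weights in CPU RAM at once. The snippet below is a minimal sketch of such a low-memory load; it assumes accelerate is installed (needed for device_map="auto") and uses the base model id from the usage example, which you would replace with the id of this sharded repository.

from transformers import AutoModelForCausalLM, AutoTokenizer

# low_cpu_mem_usage loads the shards one by one instead of building the whole
# state dict in RAM; device_map="auto" lets accelerate place weights on the
# available devices (GPU, CPU, or disk offload).
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",  # replace with the id of this sharded repository
    low_cpu_mem_usage=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")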
💻 Usage Examples
Basic Usage
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to run generation on

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

# Build the prompt by hand in the instruction format described below;
# the parentheses join the pieces into a single string.
text = (
    "<s>[INST] What is your favourite condiment? [/INST]"
    "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
    "[INST] Do you have mayonnaise recipes? [/INST]"
)

# The special tokens (<s>, </s>) are already written into the prompt,
# so the tokenizer must not add them again.
encodeds = tokenizer(text, return_tensors="pt", add_special_tokens=False)

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(**model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
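As an alternative to writing the [INST] tags by hand, the prompt can be built from the tokenizer's chat template. This is a sketch reusing the model, tokenizer, and device from the block above, and it assumes your installed transformers version is recent enough to expose apply_chat_template for this tokenizer.

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]

# apply_chat_template inserts the [INST]/[/INST] tags and special tokens for us.
model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
print(tokenizer.batch_decode(generated_ids)[0])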
🔧 Technical Details
Instruction format
To take advantage of instruction fine-tuning, your prompt should be enclosed by [INST] and [/INST] tokens. The very first instruction should start with a begin-of-sentence id, while subsequent instructions should not. The assistant's generation will be terminated by the end-of-sentence token id.
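As an illustration of these rules, the hypothetical helper below (not part of any official API) assembles a multi-turn prompt: the begin-of-sentence token <s> appears only once at the very start, each user turn is wrapped in [INST] ... [/INST], and each completed assistant reply is closed with </s>.

def build_prompt(turns):
    # turns: list of (user_message, assistant_reply) pairs; the last reply may be
    # None when the model is expected to generate it.
    prompt = "<s>"  # begin-of-sentence id, only before the first instruction
    for user_msg, assistant_reply in turns:
        prompt += f"[INST] {user_msg} [/INST]"
        if assistant_reply is not None:
            prompt += f"{assistant_reply}</s> "  # end-of-sentence id closes each reply
    return prompt

print(build_prompt([
    ("What is your favourite condiment?", "Lemon juice, ideally freshly squeezed."),
    ("Do you have mayonnaise recipes?", None),
]))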
Model Architecture
This instruction model is built upon Mistral-7B-v0.1, a transformer model with the following architectural features:
- Grouped-Query Attention: Shares key/value heads across groups of query heads for faster inference.
- Sliding-Window Attention: Restricts each token's attention to a fixed-size window, enabling efficient processing of long sequences.
- Byte-fallback BPE tokenizer: Falls back to raw bytes for characters outside the vocabulary, so no input text is ever out of vocabulary.
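These architectural choices are visible in the configuration shipped with the checkpoint. The sketch below assumes the standard transformers config fields for Mistral models (num_attention_heads, num_key_value_heads, sliding_window); fewer key/value heads than query heads is what makes the attention grouped-query.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

print(config.num_attention_heads)  # 32 query heads
print(config.num_key_value_heads)  # 8 key/value heads -> grouped-query attention
print(config.sliding_window)       # 4096-token sliding-window attention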
📄 License
This project is licensed under the Apache-2.0 license.
The Mistral AI Team
Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.