Mixtral-8x7B-Instruct-v0.1-HF Open-Source Large Model - Outperforms Llama 2 70B in Performance and Generates Superb Content

Mixtral 8x7B Instruct V0.1 HF

Developed by LoneStriker

Mixtral-8x7B is a pre-trained generative sparse mixture of experts large language model that outperforms Llama 2 70B on most benchmarks.

Large Language Model

Transformers

Supports Multiple LanguagesOpen Source License:Apache-2.0 #Sparse Mixture of Experts #Multilingual Instruction Following #High-Precision Generation

Downloads 45

Release Time : 12/11/2023

Model Overview

Mixtral-8x7B is a high-performance large language model supporting multilingual instruction following and text generation tasks.

Model Features

Sparse Mixture of Experts Architecture

Utilizes a sparse mixture of 8 expert models, delivering high-quality output while maintaining efficiency

Multilingual Support

Natively supports multiple languages including French, Italian, German, Spanish, and English

High Performance

Outperforms Llama 2 70B model on most benchmarks

Instruction Optimization

Specially optimized for instruction following, suitable for dialogue and task completion scenarios

Model Capabilities

Multilingual text generation

Instruction understanding and execution

Dialogue systems

Content creation

Use Cases

Dialogue Systems

Intelligent Assistant

Build multilingual intelligent assistants that understand and execute user instructions

Capable of generating coherent responses that follow instructions

Content Creation

Multilingual Content Generation

Generate marketing copy, articles, and other content in various languages

Produces fluent, contextually appropriate text

🚀 Hugging Face Transformers Conversion of Mixtral-8x7B-Instruct

The Mixtral-8x7B is a pretrained generative Sparse Mixture of Experts large language model. It outperforms Llama 2 70B on most tested benchmarks, offering high - performance language processing capabilities.

🚀 Quick Start

The Mixtral-8x7B Large Language Model (LLM) is a powerful pretrained generative Sparse Mixture of Experts. To understand its full potential, please read our release blog post.

✨ Features

High - performance: Outperforms Llama 2 70B on most benchmarks.
Instruction - following: Can follow specific instruction formats for better output.

⚠️ Warning

This repo contains weights that are compatible with vLLM serving of the model as well as Hugging Face transformers library. It is based on the original Mixtral torrent release, but the file format and parameter names are different. Please note that the model cannot (yet) be instantiated with HF.

📚 Documentation

Instruction format

This format must be strictly respected, otherwise the model will generate sub - optimal outputs.

The template used to build a prompt for the Instruct model is defined as follows:

<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow - up instruction [/INST]

Note that <s> and </s> are special tokens for beginning of string (BOS) and end of string (EOS) while [INST] and [/INST] are regular strings.

As reference, here is the pseudo - code used to tokenize instructions during fine - tuning:

def tokenize(text):
    return tok.encode(text, add_special_tokens=False)

[BOS_ID] + 
tokenize("[INST]") + tokenize(USER_MESSAGE_1) + tokenize("[/INST]") +
tokenize(BOT_MESSAGE_1) + [EOS_ID] +
…
tokenize("[INST]") + tokenize(USER_MESSAGE_N) + tokenize("[/INST]") +
tokenize(BOT_MESSAGE_N) + [EOS_ID]

In the pseudo - code above, note that the tokenize method should not add a BOS or EOS token automatically, but should add a prefix space.

💻 Usage Examples

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(model_id)

text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Advanced Usage

In half - precision

Note float16 precision only works on GPU devices

Click to expand

+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to(0)

text = "Hello my name is"
+ inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Lower precision using (8 - bit & 4 - bit) using `bitsandbytes`

Click to expand

+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

+ model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)

text = "Hello my name is"
+ inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Load the model with Flash Attention 2

Click to expand

+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

+ model = AutoModelForCausalLM.from_pretrained(model_id, use_flash_attention_2=True)

text = "Hello my name is"
+ inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

🔧 Technical Details

The Mixtral-8x7B Instruct model is a quick demonstration that the base model can be easily fine - tuned to achieve compelling performance. However, it does not have any moderation mechanisms. The Mistral AI Team is looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.

📄 License

The model is released under the apache - 2.0 license.

👥 The Mistral AI Team

Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Louis Ternon, Lucile Saulnier, Marie - Anne Lachaux, Pierre Stock, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご

Mixtral 8x7B Instruct V0.1 HF

Model Overview

Model Features

Model Capabilities

Use Cases

🚀 Hugging Face Transformers Conversion of Mixtral-8x7B-Instruct

🚀 Quick Start

✨ Features

⚠️ Warning

📚 Documentation

Instruction format

💻 Usage Examples

Basic Usage

Advanced Usage

In half - precision

Lower precision using (8 - bit & 4 - bit) using bitsandbytes

Load the model with Flash Attention 2

🔧 Technical Details

📄 License

👥 The Mistral AI Team

Lower precision using (8 - bit & 4 - bit) using `bitsandbytes`