Mixtral-8x7B-v0.1: Open-Source Sparse Mixture of Experts Model That Outperforms Llama 2 70B on Most Benchmarks

Mixtral 8x7B V0.1

Developed by mistralai

Mixtral-8x7B is a pre-trained generative sparse mixture of experts model that outperforms Llama 2 70B on most benchmarks.

Large Language Model

Transformers

Supports Multiple Languages

Open Source License: Apache-2.0

#Sparse Mixture of Experts #Multilingual Generation #High-performance LLM

Downloads 42.78k

Release Time: 12/1/2023

Model Overview

This is a large-scale multilingual language model that adopts a mixture of experts architecture, suitable for text generation tasks.

Model Features

Mixture of Experts Architecture

Uses a sparse mixture of experts design, in which only a subset of the experts is active for each token, to improve efficiency

Multilingual Support

Supports five languages: French, Italian, German, Spanish, and English

High Performance

Outperforms the Llama 2 70B model on most benchmarks

Model Capabilities

Multilingual Text Generation

Long Text Processing

Context Understanding

Use Cases

Text Generation

Content Creation

Automatically generates articles, stories, and other creative content

Dialogue Systems

Builds intelligent chatbots

Language Processing

Multilingual Translation

Supports translation tasks between multiple languages

🚀 Mixtral-8x7B Model Card

The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. Mixtral-8x7B outperforms Llama 2 70B on most benchmarks we tested.

For full details of this model, please read our release blog post.
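To make the "sparse mixture of experts" idea more concrete, here is a minimal toy sketch of top-k expert routing in plain Python. It is not the model's actual implementation. The choice of eight experts with two active per token is an assumption based on the model's name and public descriptions rather than anything stated on this page, and the names `top_k_route` and `sparse_moe` are hypothetical helpers introduced only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i],
                 reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def sparse_moe(x, experts, router_logits, k=2):
    """Weighted sum of the outputs of the selected experts only.

    The unselected experts are never called, which is where the
    compute savings of a sparse design come from.
    """
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Toy setup (assumption): 8 scalar "experts", each scaling its input.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
# Hypothetical router scores for a single token.
router_logits = [0.1, 2.0, -1.0, 0.5, 1.0, 0.0, -0.5, 3.0]

print(top_k_route(router_logits))
print(sparse_moe(1.0, experts, router_logits))
```

In a real model the experts are feed-forward networks and the router scores come from a learned linear layer, but the selection and weighting step follows the same pattern.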

🚀 Quick Start

💻 Usage Examples

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(model_id)

text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Advanced Usage

By default, transformers loads the model in full precision. To reduce the memory required to run the model, you can use the optimizations available in the Hugging Face ecosystem.
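As a rough guide to why precision matters, the sketch below estimates the memory needed for the weights alone at different precisions. The total parameter count of about 46.7 billion is an assumption taken from public descriptions of the model, not from this page, and `weight_memory_gib` is a hypothetical helper. Activations, the KV cache, and framework overhead are not included, so real usage will be higher.

```python
# Assumed total parameter count (not stated on this page).
TOTAL_PARAMS = 46.7e9

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "8-bit": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gib(n_params, bytes_per_param):
    """Approximate memory for the weights only, in GiB."""
    return n_params * bytes_per_param / 2**30


for name, nbytes in BYTES_PER_PARAM.items():
    gib = weight_memory_gib(TOTAL_PARAMS, nbytes)
    print(f"{name:>8}: ~{gib:.0f} GiB for weights")
```

Halving the bytes per parameter halves the weight footprint, which is the motivation for the half-precision and `bitsandbytes` options.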

In half-precision

Note: float16 precision only works on GPU devices.


import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the weights in float16 and move the model to GPU 0
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to(0)

text = "Hello my name is"
# The inputs must live on the same device as the model
inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Lower precision (8-bit & 4-bit) using `bitsandbytes`


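from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quantize the weights to 4-bit at load time (requires the bitsandbytes package and a GPU);
# use BitsAndBytesConfig(load_in_8bit=True) for 8-bit instead
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)

text = "Hello my name is"
# The quantized model is placed on the GPU, so the inputs go there too
inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))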

Load the model with Flash Attention 2


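import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Flash Attention 2 requires the flash-attn package, a GPU, and half precision
# (float16 or bfloat16), so the model is loaded in float16 and moved to GPU 0
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to(0)

text = "Hello my name is"
# The inputs must live on the same device as the model
inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))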

⚠️ Important Note

This repo contains weights that are compatible with vLLM serving of the model as well as Hugging Face transformers library. It is based on the original Mixtral torrent release, but the file format and parameter names are different. Please note that the model cannot (yet) be instantiated with HF.

📄 Notice

Mixtral-8x7B is a pretrained base model and therefore does not have any moderation mechanisms.

👥 The Mistral AI Team

Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Louis Ternon, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.

📄 License

Apache-2.0

🌐 Language Support

French
Italian
German
Spanish
English


