# phixtral-2x2_8
phixtral-2x2_8 is the first Mixture of Experts (MoE) created with two microsoft/phi-2 models. Inspired by the mistralai/Mixtral-8x7B-v0.1 architecture, it outperforms each individual expert.
You can try it out using this Space.
## Quick Start
You can quickly start using phixtral-2x2_8 through the provided Space or follow the usage examples below.
## Features
- Mixture of Experts Architecture: Built with two microsoft/phi-2 models, it performs better than each individual expert (see the illustrative routing sketch after this list).
- Evaluated Performance: Demonstrates strong results across multiple evaluation benchmarks (see Evaluation below).
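The routing idea behind a Mixture of Experts can be pictured with a small top-k gating layer. The sketch below is illustrative only: the class name `SimpleMoE`, its layer sizes, and the dense loop over experts are assumptions made for readability, not phixtral's actual implementation; only the `num_local_experts` and `num_experts_per_tok` parameter names come from this README.

```python
# Illustrative only: a generic top-k expert-routing layer, NOT phixtral's real code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    def __init__(self, hidden_size, num_local_experts=2, num_experts_per_tok=2):
        super().__init__()
        self.num_experts_per_tok = num_experts_per_tok
        # Router: scores every token against every expert
        self.gate = nn.Linear(hidden_size, num_local_experts, bias=False)
        # Each expert is a small feed-forward block (the real experts are phi-2 MLPs)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.GELU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_local_experts)
        )

    def forward(self, x):
        # x: (batch, seq_len, hidden_size)
        scores = self.gate(x)                                # (batch, seq_len, num_local_experts)
        weights, selected = torch.topk(scores, self.num_experts_per_tok, dim=-1)
        weights = F.softmax(weights, dim=-1)                 # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Dense (inefficient but readable) combination of the selected experts' outputs
        for e, expert in enumerate(self.experts):
            expert_out = expert(x)
            for k in range(self.num_experts_per_tok):
                mask = (selected[..., k] == e).unsqueeze(-1) # tokens that routed slot k to expert e
                out = out + mask * weights[..., k:k + 1] * expert_out
        return out
```

With two experts and `num_experts_per_tok=2`, every token simply receives a softmax-weighted blend of both experts, which matches the default configuration described later in this README.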
## Installation
The installation process can be completed by running the following command in the provided Colab notebook:
```bash
!pip install -q --upgrade transformers einops accelerate bitsandbytes
```
## Usage Examples
### Basic Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "phixtral-2x2_8"
instruction = '''
    def print_prime(n):
        """
        Print all primes between 1 and n
        """
'''

torch.set_default_device("cuda")

# Load the model in 4-bit precision and its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    f"mlabonne/{model_name}",
    torch_dtype="auto",
    load_in_4bit=True,
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    f"mlabonne/{model_name}",
    trust_remote_code=True
)

# Tokenize the instruction
inputs = tokenizer(
    instruction,
    return_tensors="pt",
    return_attention_mask=False
)

# Generate and decode the completion
outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)
```
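The example above uses the `load_in_4bit` shortcut. On recent transformers releases the same 4-bit loading is typically expressed through `BitsAndBytesConfig`; the snippet below is a minimal sketch of that equivalent call, assuming the `bitsandbytes` package from the installation step is present.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Explicit 4-bit quantization config instead of the load_in_4bit flag
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mlabonne/phixtral-2x2_8",
    quantization_config=bnb_config,
    trust_remote_code=True,
)
```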
### Advanced Usage
Inspired by mistralai/Mixtral-8x7B-v0.1, you can specify the `num_experts_per_tok` and `num_local_experts` parameters in the `config.json` file (both set to 2 by default). This configuration is automatically loaded in `configuration.py`.
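These values can also be overridden at load time instead of editing `config.json`. The snippet below is a sketch using the standard transformers `AutoConfig` path; setting `num_experts_per_tok` to 1 (routing each token to a single expert) is an assumed example of what you might try, while `num_local_experts` is fixed by the merged checkpoint.

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load the remote configuration and tweak the routing before loading weights
config = AutoConfig.from_pretrained("mlabonne/phixtral-2x2_8", trust_remote_code=True)
config.num_experts_per_tok = 1  # assumed override: route each token to a single expert

model = AutoModelForCausalLM.from_pretrained(
    "mlabonne/phixtral-2x2_8",
    config=config,
    torch_dtype="auto",
    trust_remote_code=True,
)
```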
## Evaluation
The evaluation was performed using LLM AutoEval on the Nous suite.
Check YALL - Yet Another LLM Leaderboard to compare it with other models.
## Configuration
The model has been made with a custom version of the mergekit library (mixtral branch) and the following configuration:
```yaml
base_model: cognitivecomputations/dolphin-2_6-phi-2
gate_mode: cheap_embed
experts:
  - source_model: cognitivecomputations/dolphin-2_6-phi-2
    positive_prompts: [""]
  - source_model: lxuechen/phi-2-dpo
    positive_prompts: [""]
```
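To reproduce a merge like this one, the YAML above is the kind of file that mergekit's MoE tooling consumes. The command below is only a sketch: it assumes the `mergekit-moe` entry point from the mixtral branch is installed, and the config and output paths are placeholders.

```bash
# Assumed invocation; adjust the config and output paths to your setup
mergekit-moe config.yaml ./phixtral-2x2_8
```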
## Acknowledgments
A special thanks to vince62s for the inference code and the dynamic configuration of the number of experts. He was very patient and helped me to debug everything.
Thanks to Charles Goddard for the mergekit library and the implementation of the MoE for clowns.
Thanks to ehartford and lxuechen for their fine-tuned phi-2 models.
## License
This project is licensed under the MIT license. For more details, please refer to the LICENSE file.