# phixtral-2x2_8
phixtral-2x2_8 is the first Mixture of Experts (MoE) created with two microsoft/phi-2 models. Inspired by the mistralai/Mixtral-8x7B-v0.1 architecture, it outperforms each individual expert.
You can try it out using this Space.
## Quick Start
You can quickly start using phixtral-2x2_8 through the provided Space or follow the usage examples below.
## Features
- Mixture of Experts Architecture: Built with two microsoft/phi-2 models, it performs better than each individual expert (see the illustrative routing sketch after this list).
- Evaluated Performance: Demonstrates strong results across multiple evaluation benchmarks (see Evaluation below).
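The routing idea behind a Mixture of Experts can be pictured with a small top-k gating layer. The sketch below is illustrative only: the class name `SimpleMoE`, its layer sizes, and the dense loop over experts are assumptions made for readability, not phixtral's actual implementation; only the `num_local_experts` and `num_experts_per_tok` parameter names come from this README.

```python
# Illustrative only: a generic top-k expert-routing layer, NOT phixtral's real code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    def __init__(self, hidden_size, num_local_experts=2, num_experts_per_tok=2):
        super().__init__()
        self.num_experts_per_tok = num_experts_per_tok
        # Router: scores every token against every expert
        self.gate = nn.Linear(hidden_size, num_local_experts, bias=False)
        # Each expert is a small feed-forward block (the real experts are phi-2 MLPs)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.GELU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_local_experts)
        )

    def forward(self, x):
        # x: (batch, seq_len, hidden_size)
        scores = self.gate(x)                                # (batch, seq_len, num_local_experts)
        weights, selected = torch.topk(scores, self.num_experts_per_tok, dim=-1)
        weights = F.softmax(weights, dim=-1)                 # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Dense (inefficient but readable) combination of the selected experts' outputs
        for e, expert in enumerate(self.experts):
            expert_out = expert(x)
            for k in range(self.num_experts_per_tok):
                mask = (selected[..., k] == e).unsqueeze(-1) # tokens that routed slot k to expert e
                out = out + mask * weights[..., k:k + 1] * expert_out
        return out
```

With two experts and `num_experts_per_tok=2`, every token simply receives a softmax-weighted blend of both experts, which matches the default configuration described later in this README.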
## Installation
The installation process can be completed by running the following command in the provided Colab notebook:
```bash
!pip install -q --upgrade transformers einops accelerate bitsandbytes
```
## Usage Examples
### Basic Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "phixtral-2x2_8"
instruction = '''
    def print_prime(n):
        """
        Print all primes between 1 and n
        """
'''

torch.set_default_device("cuda")

# Load the model in 4-bit precision and its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    f"mlabonne/{model_name}",
    torch_dtype="auto",
    load_in_4bit=True,
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    f"mlabonne/{model_name}",
    trust_remote_code=True
)

# Tokenize the instruction
inputs = tokenizer(
    instruction,
    return_tensors="pt",
    return_attention_mask=False
)

# Generate and decode the completion
outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)
```
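The example above uses the `load_in_4bit` shortcut. On recent transformers releases the same 4-bit loading is typically expressed through `BitsAndBytesConfig`; the snippet below is a minimal sketch of that equivalent call, assuming the `bitsandbytes` package from the installation step is present.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Explicit 4-bit quantization config instead of the load_in_4bit flag
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mlabonne/phixtral-2x2_8",
    quantization_config=bnb_config,
    trust_remote_code=True,
)
```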
### Advanced Usage
Inspired by mistralai/Mixtral-8x7B-v0.1, you can specify the `num_experts_per_tok` and `num_local_experts` parameters in the `config.json` file (both set to 2 by default). This configuration is automatically loaded in `configuration.py`.
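These values can also be overridden at load time instead of editing `config.json`. The snippet below is a sketch using the standard transformers `AutoConfig` path; setting `num_experts_per_tok` to 1 (routing each token to a single expert) is an assumed example of what you might try, while `num_local_experts` is fixed by the merged checkpoint.

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load the remote configuration and tweak the routing before loading weights
config = AutoConfig.from_pretrained("mlabonne/phixtral-2x2_8", trust_remote_code=True)
config.num_experts_per_tok = 1  # assumed override: route each token to a single expert

model = AutoModelForCausalLM.from_pretrained(
    "mlabonne/phixtral-2x2_8",
    config=config,
    torch_dtype="auto",
    trust_remote_code=True,
)
```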
## Evaluation
The evaluation was performed using LLM AutoEval on the Nous suite.
Check YALL - Yet Another LLM Leaderboard to compare it with other models.
## Configuration
The model has been made with a custom version of the mergekit library (mixtral branch) and the following configuration:
```yaml
base_model: cognitivecomputations/dolphin-2_6-phi-2
gate_mode: cheap_embed
experts:
  - source_model: cognitivecomputations/dolphin-2_6-phi-2
    positive_prompts: [""]
  - source_model: lxuechen/phi-2-dpo
    positive_prompts: [""]
```
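To reproduce a merge like this one, the YAML above is the kind of file that mergekit's MoE tooling consumes. The command below is only a sketch: it assumes the `mergekit-moe` entry point from the mixtral branch is installed, and the config and output paths are placeholders.

```bash
# Assumed invocation; adjust the config and output paths to your setup
mergekit-moe config.yaml ./phixtral-2x2_8
```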
## Acknowledgments
A special thanks to vince62s for the inference code and the dynamic configuration of the number of experts. He was very patient and helped me to debug everything.
Thanks to Charles Goddard for the mergekit library and the implementation of the MoE for clowns.
Thanks to ehartford and lxuechen for their fine-tuned phi-2 models.
## License
This project is licensed under the MIT license. For more details, please refer to the LICENSE file.