🚀 Beyonder-4x7B-v2
Beyonder-4x7B-v2 is a Mixture of Experts (MoE) model made with mergekit (mixtral branch), combining multiple base models to offer strong text-generation capabilities. It uses the following base models:
- openchat/openchat-3.5-1210
- beowolx/CodeNinja-1.0-OpenChat-7B
- maywell/PiVoT-0.1-Starling-LM-RP
- WizardLM/WizardMath-7B-V1.1

The recommended context length is 8k.
✨ Features
- Mixture of Experts Architecture: Combines the strengths of multiple base models for enhanced performance (see the routing sketch after this list).
- Quantized Models Available: Thanks to TheBloke, various quantized versions are accessible for different usage scenarios.
- Competitive Performance: Performs competitively with larger models on the Open LLM Leaderboard and shows significant improvement over individual experts.
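
To make the routing idea concrete, here is a minimal, hypothetical sketch of top-k expert routing in a single MoE layer. All names, dimensions, and the expert definition are illustrative assumptions; this shows the general technique, not this model's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k routing layer (hypothetical, simplified)."""

    def __init__(self, dim=64, num_experts=4, k=2):
        super().__init__()
        self.k = k
        # Router: scores each token against every expert
        self.gate = nn.Linear(dim, num_experts)
        # Experts: small feed-forward networks
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, dim)
        # Keep the k highest-scoring experts per token, normalize their weights
        weights, idx = self.gate(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```

Only the k selected experts run for each token, which is how a MoE can store many parameters while using just a fraction of them per forward pass.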
📦 Installation
No specific installation steps are provided in the original README. If you want to use the model, you can refer to the usage example below.
💻 Usage Examples
Basic Usage
Here's a notebook-style example that runs this model in 4-bit precision using a free T4 GPU on Google Colab.
```python
!pip install -qU transformers bitsandbytes accelerate
```

```python
from transformers import AutoTokenizer
import transformers
import torch

model = "mlabonne/Beyonder-4x7B-v2"
tokenizer = AutoTokenizer.from_pretrained(model)

# Build a text-generation pipeline that loads the model in 4-bit precision
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

# Format the conversation with the model's chat template before generating
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
Output:
```
A Mixture of Experts (ME) is a machine learning technique that combines multiple expert models to make predictions or decisions. Each expert model is specialized in a different aspect of the problem, and their outputs are combined to produce a more accurate and robust solution. This approach allows the model to leverage the strengths of individual experts and compensate for their weaknesses, improving overall performance.
```
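
Passing `load_in_4bit` through `model_kwargs` is a shorthand. If you prefer to load the model and tokenizer directly, a more explicit variant (a sketch assuming a recent transformers and bitsandbytes) uses `BitsAndBytesConfig`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mlabonne/Beyonder-4x7B-v2"

# Explicit 4-bit quantization settings instead of the load_in_4bit shorthand
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # match the pipeline's fp16 compute
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU automatically
)
```

The pipeline call from the example above works the same way with this model object (pass `model=model, tokenizer=tokenizer` to `transformers.pipeline`).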
🏆 Evaluation
Beyonder-4x7B-v2 is competitive with Mixtral-8x7B-Instruct-v0.1 on the Open LLM Leaderboard, while only having 4 experts instead of 8.

It also displays a significant improvement over the individual experts.

It also performs very well on the Nous benchmark suite. It's almost as good as the best Yi-34B fine-tune, a much bigger model: Beyonder has 24.2B parameters in total, but only two of its four experts are active per token at inference (so roughly 12B parameters used), versus 34B for the Yi model.
AGIEval
| Task | Version | Metric | Value | ± | Stderr |
|------|---------|--------|-------|---|--------|
| agieval_aqua_rat | 0 | acc | 23.62 | ± | 2.67 |
| | | acc_norm | 23.62 | ± | 2.67 |
| agieval_logiqa_en | 0 | acc | 41.47 | ± | 1.93 |
| | | acc_norm | 43.01 | ± | 1.94 |
| agieval_lsat_ar | 0 | acc | 23.04 | ± | 2.78 |
| | | acc_norm | 23.48 | ± | 2.80 |
| agieval_lsat_lr | 0 | acc | 51.57 | ± | 2.22 |
| | | acc_norm | 52.94 | ± | 2.21 |
| agieval_lsat_rc | 0 | acc | 64.31 | ± | 2.93 |
| | | acc_norm | 64.68 | ± | 2.92 |
| agieval_sat_en | 0 | acc | 79.13 | ± | 2.84 |
| | | acc_norm | 79.13 | ± | 2.84 |
| agieval_sat_en_without_passage | 0 | acc | 43.20 | ± | 3.46 |
| | | acc_norm | 43.20 | ± | 3.46 |
| agieval_sat_math | 0 | acc | 34.55 | ± | 3.21 |
| | | acc_norm | 32.27 | ± | 3.16 |
GPT4All
| Task | Version | Metric | Value | ± | Stderr |
|------|---------|--------|-------|---|--------|
| arc_challenge | 0 | acc | 61.86 | ± | 1.42 |
| | | acc_norm | 64.51 | ± | 1.40 |
| arc_easy | 0 | acc | 85.06 | ± | 0.73 |
| | | acc_norm | 82.45 | ± | 0.78 |
| boolq | 1 | acc | 88.35 | ± | 0.56 |
| hellaswag | 0 | acc | 68.04 | ± | 0.47 |
| | | acc_norm | 85.12 | ± | 0.36 |
| openbookqa | 0 | acc | 37.80 | ± | 2.17 |
| | | acc_norm | 48.60 | ± | 2.24 |
| piqa | 0 | acc | 83.08 | ± | 0.87 |
| | | acc_norm | 83.95 | ± | 0.86 |
| winogrande | 0 | acc | 78.69 | ± | 1.15 |
TruthfulQA
| Task | Version | Metric | Value | ± | Stderr |
|------|---------|--------|-------|---|--------|
| truthfulqa_mc | 1 | mc1 | 44.55 | ± | 1.74 |
| | | mc2 | 60.86 | ± | 1.57 |
Bigbench
| Task | Version | Metric | Value | ± | Stderr |
|------|---------|--------|-------|---|--------|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 58.95 | ± | 3.58 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 66.40 | ± | 2.46 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 48.84 | ± | 3.12 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 22.56 | ± | 2.21 |
| | | exact_str_match | 13.37 | ± | 1.80 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 30.40 | ± | 2.06 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 20.57 | ± | 1.53 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 52.00 | ± | 2.89 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 44.40 | ± | 2.22 |
| bigbench_navigate | 0 | multiple_choice_grade | 52.10 | ± | 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 69.75 | ± | 1.03 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 55.36 | ± | 2.35 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 23.65 | ± | 1.35 |
| bigbench_snarks | 0 | multiple_choice_grade | 77.35 | ± | 3.12 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 73.02 | ± | 1.41 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 46.80 | ± | 1.58 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 22.08 | ± | 1.17 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 19.03 | ± | 0.94 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 52.00 | ± | 2.89 |
🧩 Configuration
```yaml
base_model: mlabonne/Marcoro14-7B-slerp
experts:
  - source_model: openchat/openchat-3.5-1210
    positive_prompts:
      - "chat"
      - "assistant"
      - "tell me"
      - "explain"
  - source_model: beowolx/CodeNinja-1.0-OpenChat-7B
    positive_prompts:
      - "code"
      - "python"
      - "javascript"
      - "programming"
      - "algorithm"
  - source_model: maywell/PiVoT-0.1-Starling-LM-RP
    positive_prompts:
      - "storywriting"
      - "write"
      - "scene"
      - "story"
      - "character"
  - source_model: WizardLM/WizardMath-7B-V1.1
    positive_prompts:
      - "reason"
      - "math"
      - "mathematics"
      - "solve"
      - "count"
```
⚡ Quantized models
Thanks to TheBloke for the quantized models:
- GGUF: https://huggingface.co/TheBloke/Beyonder-4x7B-v2-GGUF
- AWQ: https://huggingface.co/TheBloke/Beyonder-4x7B-v2-AWQ
- GPTQ: https://huggingface.co/TheBloke/Beyonder-4x7B-v2-GPTQ
- EXL2: https://huggingface.co/bartowski/Beyonder-4x7B-v2-exl2
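
For the GGUF files, one lightweight option is llama-cpp-python. The sketch below is an assumption, not an official recipe: the quant file name follows TheBloke's usual naming, and the OpenChat-style prompt format is inferred from the base models above.

```python
# Hedged sketch: running a GGUF quant locally with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="beyonder-4x7b-v2.Q4_K_M.gguf",  # assumed file name; download from the GGUF repo above
    n_ctx=8192,  # matches the recommended 8k context length
)
out = llm(
    "GPT4 Correct User: Give me one tip for writing clean Python.<|end_of_turn|>GPT4 Correct Assistant:",
    max_tokens=128,
)
print(out["choices"][0]["text"])
```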
📄 License
This model is under the microsoft-research-license.