🚀 Beyonder-4x7B-v2
Beyonder-4x7B-v2 is a Mixture of Experts (MoE) model made with mergekit (mixtral branch). It combines the strengths of several base models and performs strongly across text-generation tasks, delivering accurate and efficient generation.
🚀 Quick Start
You can run this model in 4-bit precision on Google Colab with the following code:
```python
!pip install -qU transformers bitsandbytes accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "mlabonne/Beyonder-4x7B-v2"
tokenizer = AutoTokenizer.from_pretrained(model)

# Load the model in 4-bit precision via bitsandbytes
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

# Format the conversation with the model's chat template, then generate
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
Output:
A Mixture of Experts (ME) is a machine learning technique that combines multiple expert models to make predictions or decisions. Each expert model is specialized in a different aspect of the problem, and their outputs are combined to produce a more accurate and robust solution. This approach allows the model to leverage the strengths of individual experts and compensate for their weaknesses, improving overall performance.
A notebook is also available for running this model in 4-bit precision on a free T4 GPU in Google Colab.
✨ Key Features
- Mixture-of-Experts architecture: combines the strengths of several base models for strong text-generation performance (see the routing sketch after this list).
- Versatile: suited to many text-generation scenarios, such as question answering, code generation, and story writing.
- Efficient inference: the recommended context length is 8k, balancing output quality with inference efficiency.
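To make the architecture bullet concrete, the sketch below shows the Mixtral-style top-2 routing that 4-expert MoE models of this kind use: a small router scores all experts per token and only the two highest-scoring experts run. This is a minimal illustration with assumed shapes and a hypothetical `Top2MoE` class, not Beyonder's actual implementation.

```python
# Minimal sketch of Mixtral-style top-2 MoE routing (illustrative only;
# hidden/ff sizes and the Top2MoE class are assumptions, not Beyonder's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, hidden: int, n_experts: int = 4, ff: int = 4096):
        super().__init__()
        self.router = nn.Linear(hidden, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, ff), nn.SiLU(), nn.Linear(ff, hidden))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, hidden)
        logits = self.router(x)                   # score every expert per token
        weights, idx = logits.topk(2, dim=-1)     # keep only the 2 best experts
        weights = F.softmax(weights, dim=-1)      # renormalize over that pair
        out = torch.zeros_like(x)
        for k in range(2):                        # run just the selected experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

print(Top2MoE(hidden=1024)(torch.randn(8, 1024)).shape)  # torch.Size([8, 1024])
```

This is why a 4x7B model can have ~24B total parameters yet only ~12B active per token: attention weights are shared, and just two expert feed-forward blocks run per layer.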
📦 Quantized Models
Thanks to TheBloke (GGUF, AWQ, GPTQ) and bartowski (EXL2) for the quantized models (a usage sketch follows the list):
- GGUF:https://huggingface.co/TheBloke/Beyonder-4x7B-v2-GGUF
- AWQ:https://huggingface.co/TheBloke/Beyonder-4x7B-v2-AWQ
- GPTQ:https://huggingface.co/TheBloke/Beyonder-4x7B-v2-GPTQ
- EXL2:https://huggingface.co/bartowski/Beyonder-4x7B-v2-exl2
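If you want to try one of the GGUF quants locally, the snippet below is a hedged sketch using llama-cpp-python. The exact `filename` is an assumption based on TheBloke's usual naming scheme; check the repo's file list before running.

```python
# Hedged sketch: run a GGUF quant with llama-cpp-python
# (pip install llama-cpp-python huggingface_hub).
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

path = hf_hub_download(
    repo_id="TheBloke/Beyonder-4x7B-v2-GGUF",
    filename="beyonder-4x7b-v2.Q4_K_M.gguf",  # assumed filename; verify on the Hub
)
llm = Llama(model_path=path, n_ctx=8192)  # 8k context, per the recommendation above
out = llm(
    "Explain what a Mixture of Experts is in less than 100 words.",
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```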
🏆 Evaluation
Comparison with Mixtral-8x7B-Instruct-v0.1
Beyonder-4x7B-v2 is competitive with Mixtral-8x7B-Instruct-v0.1 on the Open LLM Leaderboard, while using only 4 experts instead of 8.
Comparison with the individual experts
It is also a significant improvement over each of its individual expert models.
Nous benchmark suite
It also performs very well on the Nous benchmark suite, almost on par with the best Yi-34B fine-tune, which is a much larger model: Beyonder-4x7B-v2 has 24.2B parameters with only two experts selected at inference time (so roughly 12B active), versus 34B parameters for the Yi-34B model.
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Beyonder-4x7B-v2 | 45.29 | 75.95 | 60.86 | 46.4 | 57.13 |
| NeuralHermes-2.5-Mistral-7B | 43.67 | 73.24 | 55.37 | 41.76 | 53.51 |
| OpenHermes-2.5-Mistral-7B | 42.75 | 72.99 | 52.99 | 40.94 | 52.42 |
| Nous-Hermes-2-SOLAR-10.7B | 47.79 | 74.69 | 55.92 | 44.84 | 55.81 |
| Nous-Hermes-2-Yi-34B | 50.27 | 76.00 | 60.34 | 46.69 | 58.33 |
Detailed results per task
AGIEval
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 23.62 | ± | 2.67 |
| | | acc_norm | 23.62 | ± | 2.67 |
| agieval_logiqa_en | 0 | acc | 41.47 | ± | 1.93 |
| | | acc_norm | 43.01 | ± | 1.94 |
| agieval_lsat_ar | 0 | acc | 23.04 | ± | 2.78 |
| | | acc_norm | 23.48 | ± | 2.80 |
| agieval_lsat_lr | 0 | acc | 51.57 | ± | 2.22 |
| | | acc_norm | 52.94 | ± | 2.21 |
| agieval_lsat_rc | 0 | acc | 64.31 | ± | 2.93 |
| | | acc_norm | 64.68 | ± | 2.92 |
| agieval_sat_en | 0 | acc | 79.13 | ± | 2.84 |
| | | acc_norm | 79.13 | ± | 2.84 |
| agieval_sat_en_without_passage | 0 | acc | 43.20 | ± | 3.46 |
| | | acc_norm | 43.20 | ± | 3.46 |
| agieval_sat_math | 0 | acc | 34.55 | ± | 3.21 |
| | | acc_norm | 32.27 | ± | 3.16 |
GPT4All
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| arc_challenge | 0 | acc | 61.86 | ± | 1.42 |
| | | acc_norm | 64.51 | ± | 1.40 |
| arc_easy | 0 | acc | 85.06 | ± | 0.73 |
| | | acc_norm | 82.45 | ± | 0.78 |
| boolq | 1 | acc | 88.35 | ± | 0.56 |
| hellaswag | 0 | acc | 68.04 | ± | 0.47 |
| | | acc_norm | 85.12 | ± | 0.36 |
| openbookqa | 0 | acc | 37.80 | ± | 2.17 |
| | | acc_norm | 48.60 | ± | 2.24 |
| piqa | 0 | acc | 83.08 | ± | 0.87 |
| | | acc_norm | 83.95 | ± | 0.86 |
| winogrande | 0 | acc | 78.69 | ± | 1.15 |
TruthfulQA
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| truthfulqa_mc | 1 | mc1 | 44.55 | ± | 1.74 |
| | | mc2 | 60.86 | ± | 1.57 |
Bigbench
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 58.95 | ± | 3.58 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 66.40 | ± | 2.46 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 48.84 | ± | 3.12 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 22.56 | ± | 2.21 |
| | | exact_str_match | 13.37 | ± | 1.80 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 30.40 | ± | 2.06 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 20.57 | ± | 1.53 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 52.00 | ± | 2.89 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 44.40 | ± | 2.22 |
| bigbench_navigate | 0 | multiple_choice_grade | 52.10 | ± | 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 69.75 | ± | 1.03 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 55.36 | ± | 2.35 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 23.65 | ± | 1.35 |
| bigbench_snarks | 0 | multiple_choice_grade | 77.35 | ± | 3.12 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 73.02 | ± | 1.41 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 46.80 | ± | 1.58 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 22.08 | ± | 1.17 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 19.03 | ± | 0.94 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 52.00 | ± | 2.89 |
🧩 Configuration
```yaml
base_model: mlabonne/Marcoro14-7B-slerp
experts:
  - source_model: openchat/openchat-3.5-1210
    positive_prompts:
      - "chat"
      - "assistant"
      - "tell me"
      - "explain"
  - source_model: beowolx/CodeNinja-1.0-OpenChat-7B
    positive_prompts:
      - "code"
      - "python"
      - "javascript"
      - "programming"
      - "algorithm"
  - source_model: maywell/PiVoT-0.1-Starling-LM-RP
    positive_prompts:
      - "storywriting"
      - "write"
      - "scene"
      - "story"
      - "character"
  - source_model: WizardLM/WizardMath-7B-V1.1
    positive_prompts:
      - "reason"
      - "math"
      - "mathematics"
      - "solve"
      - "count"
```
📄 License
This model is released under the microsoft-research-license; see https://huggingface.co/WizardLM/WizardMath-7B-V1.1/resolve/main/LICENSE for details.



