🚀 Beyonder-4x7B-v2
Beyonder-4x7B-v2 is a Mixture of Experts (MoE) model made with mergekit (mixtral branch). It combines the strengths of several base models and performs strongly across text-generation tasks, delivering accurate and efficient generation.
🚀 Quick Start
You can run this model in 4-bit precision on Google Colab with the following code:
```python
!pip install -qU transformers bitsandbytes accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "mlabonne/Beyonder-4x7B-v2"
tokenizer = AutoTokenizer.from_pretrained(model)

# Load the model in 4-bit precision via bitsandbytes
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

# Format the conversation with the model's chat template, then generate
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
Output:
A Mixture of Experts (ME) is a machine learning technique that combines multiple expert models to make predictions or decisions. Each expert model is specialized in a different aspect of the problem, and their outputs are combined to produce a more accurate and robust solution. This approach allows the model to leverage the strengths of individual experts and compensate for their weaknesses, improving overall performance.
A notebook is also available for running this model in 4-bit precision on a free T4 GPU in Google Colab.
✨ Key Features
- Mixture-of-Experts architecture: combines the strengths of several base models for strong text-generation performance (see the routing sketch after this list).
- Versatile: suited to many text-generation scenarios, such as question answering, code generation, and story writing.
- Efficient inference: the recommended context length is 8k, balancing output quality with inference efficiency.
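To make the architecture bullet concrete, the sketch below shows the Mixtral-style top-2 routing that 4-expert MoE models of this kind use: a small router scores all experts per token and only the two highest-scoring experts run. This is a minimal illustration with assumed shapes and a hypothetical `Top2MoE` class, not Beyonder's actual implementation.

```python
# Minimal sketch of Mixtral-style top-2 MoE routing (illustrative only;
# hidden/ff sizes and the Top2MoE class are assumptions, not Beyonder's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, hidden: int, n_experts: int = 4, ff: int = 4096):
        super().__init__()
        self.router = nn.Linear(hidden, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, ff), nn.SiLU(), nn.Linear(ff, hidden))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, hidden)
        logits = self.router(x)                   # score every expert per token
        weights, idx = logits.topk(2, dim=-1)     # keep only the 2 best experts
        weights = F.softmax(weights, dim=-1)      # renormalize over that pair
        out = torch.zeros_like(x)
        for k in range(2):                        # run just the selected experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

print(Top2MoE(hidden=1024)(torch.randn(8, 1024)).shape)  # torch.Size([8, 1024])
```

This is why a 4x7B model can have ~24B total parameters yet only ~12B active per token: attention weights are shared, and just two expert feed-forward blocks run per layer.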
📦 Quantized Models
Thanks to TheBloke (GGUF, AWQ, GPTQ) and bartowski (EXL2) for the quantized models (a usage sketch follows the list):
- GGUF:https://huggingface.co/TheBloke/Beyonder-4x7B-v2-GGUF
- AWQ:https://huggingface.co/TheBloke/Beyonder-4x7B-v2-AWQ
- GPTQ:https://huggingface.co/TheBloke/Beyonder-4x7B-v2-GPTQ
- EXL2:https://huggingface.co/bartowski/Beyonder-4x7B-v2-exl2
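If you want to try one of the GGUF quants locally, the snippet below is a hedged sketch using llama-cpp-python. The exact `filename` is an assumption based on TheBloke's usual naming scheme; check the repo's file list before running.

```python
# Hedged sketch: run a GGUF quant with llama-cpp-python
# (pip install llama-cpp-python huggingface_hub).
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

path = hf_hub_download(
    repo_id="TheBloke/Beyonder-4x7B-v2-GGUF",
    filename="beyonder-4x7b-v2.Q4_K_M.gguf",  # assumed filename; verify on the Hub
)
llm = Llama(model_path=path, n_ctx=8192)  # 8k context, per the recommendation above
out = llm(
    "Explain what a Mixture of Experts is in less than 100 words.",
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```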
🏆 Evaluation
Comparison with Mixtral-8x7B-Instruct-v0.1
Beyonder-4x7B-v2 is competitive with Mixtral-8x7B-Instruct-v0.1 on the Open LLM Leaderboard, while using only 4 experts instead of 8.
Comparison with the individual experts
It is also a significant improvement over each of its individual expert models.
Nous benchmark suite
It also performs very well on the Nous benchmark suite, almost on par with the best Yi-34B fine-tune, which is a much larger model: Beyonder-4x7B-v2 has 24.2B parameters with only two experts selected at inference time (so roughly 12B active), versus 34B parameters for the Yi-34B model.
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Beyonder-4x7B-v2 | 45.29 | 75.95 | 60.86 | 46.4 | 57.13 |
| NeuralHermes-2.5-Mistral-7B | 43.67 | 73.24 | 55.37 | 41.76 | 53.51 |
| OpenHermes-2.5-Mistral-7B | 42.75 | 72.99 | 52.99 | 40.94 | 52.42 |
| Nous-Hermes-2-SOLAR-10.7B | 47.79 | 74.69 | 55.92 | 44.84 | 55.81 |
| Nous-Hermes-2-Yi-34B | 50.27 | 76.00 | 60.34 | 46.69 | 58.33 |
Detailed results per task
AGIEval
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 23.62 | ± | 2.67 |
| | | acc_norm | 23.62 | ± | 2.67 |
| agieval_logiqa_en | 0 | acc | 41.47 | ± | 1.93 |
| | | acc_norm | 43.01 | ± | 1.94 |
| agieval_lsat_ar | 0 | acc | 23.04 | ± | 2.78 |
| | | acc_norm | 23.48 | ± | 2.80 |
| agieval_lsat_lr | 0 | acc | 51.57 | ± | 2.22 |
| | | acc_norm | 52.94 | ± | 2.21 |
| agieval_lsat_rc | 0 | acc | 64.31 | ± | 2.93 |
| | | acc_norm | 64.68 | ± | 2.92 |
| agieval_sat_en | 0 | acc | 79.13 | ± | 2.84 |
| | | acc_norm | 79.13 | ± | 2.84 |
| agieval_sat_en_without_passage | 0 | acc | 43.20 | ± | 3.46 |
| | | acc_norm | 43.20 | ± | 3.46 |
| agieval_sat_math | 0 | acc | 34.55 | ± | 3.21 |
| | | acc_norm | 32.27 | ± | 3.16 |
GPT4All
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| arc_challenge | 0 | acc | 61.86 | ± | 1.42 |
| | | acc_norm | 64.51 | ± | 1.40 |
| arc_easy | 0 | acc | 85.06 | ± | 0.73 |
| | | acc_norm | 82.45 | ± | 0.78 |
| boolq | 1 | acc | 88.35 | ± | 0.56 |
| hellaswag | 0 | acc | 68.04 | ± | 0.47 |
| | | acc_norm | 85.12 | ± | 0.36 |
| openbookqa | 0 | acc | 37.80 | ± | 2.17 |
| | | acc_norm | 48.60 | ± | 2.24 |
| piqa | 0 | acc | 83.08 | ± | 0.87 |
| | | acc_norm | 83.95 | ± | 0.86 |
| winogrande | 0 | acc | 78.69 | ± | 1.15 |
TruthfulQA
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| truthfulqa_mc | 1 | mc1 | 44.55 | ± | 1.74 |
| | | mc2 | 60.86 | ± | 1.57 |
Bigbench
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 58.95 | ± | 3.58 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 66.40 | ± | 2.46 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 48.84 | ± | 3.12 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 22.56 | ± | 2.21 |
| | | exact_str_match | 13.37 | ± | 1.80 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 30.40 | ± | 2.06 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 20.57 | ± | 1.53 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 52.00 | ± | 2.89 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 44.40 | ± | 2.22 |
| bigbench_navigate | 0 | multiple_choice_grade | 52.10 | ± | 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 69.75 | ± | 1.03 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 55.36 | ± | 2.35 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 23.65 | ± | 1.35 |
| bigbench_snarks | 0 | multiple_choice_grade | 77.35 | ± | 3.12 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 73.02 | ± | 1.41 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 46.80 | ± | 1.58 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 22.08 | ± | 1.17 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 19.03 | ± | 0.94 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 52.00 | ± | 2.89 |
🧩 Configuration
```yaml
base_model: mlabonne/Marcoro14-7B-slerp
experts:
  - source_model: openchat/openchat-3.5-1210
    positive_prompts:
      - "chat"
      - "assistant"
      - "tell me"
      - "explain"
  - source_model: beowolx/CodeNinja-1.0-OpenChat-7B
    positive_prompts:
      - "code"
      - "python"
      - "javascript"
      - "programming"
      - "algorithm"
  - source_model: maywell/PiVoT-0.1-Starling-LM-RP
    positive_prompts:
      - "storywriting"
      - "write"
      - "scene"
      - "story"
      - "character"
  - source_model: WizardLM/WizardMath-7B-V1.1
    positive_prompts:
      - "reason"
      - "math"
      - "mathematics"
      - "solve"
      - "count"
```
📄 License
This model is released under the microsoft-research-license; see https://huggingface.co/WizardLM/WizardMath-7B-V1.1/resolve/main/LICENSE for details.



