🚀 Beyonder-4x7B-v2
Beyonder-4x7B-v2 is a Mixture of Experts (MoE) model created with mergekit (mixtral branch). It combines the strengths of four base models and delivers accurate, efficient text generation across a range of tasks.
🚀 Quick Start
You can run this model in 4-bit precision on Google Colab with the following code:
```python
!pip install -qU transformers bitsandbytes accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "mlabonne/Beyonder-4x7B-v2"
tokenizer = AutoTokenizer.from_pretrained(model)

# Text-generation pipeline loaded in 4-bit (bitsandbytes) with float16 compute
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

# Format the chat with the model's chat template, then sample a completion
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
Output:
A Mixture of Experts (ME) is a machine learning technique that combines multiple expert models to make predictions or decisions. Each expert model is specialized in a different aspect of the problem, and their outputs are combined to produce a more accurate and robust solution. This approach allows the model to leverage the strengths of individual experts and compensate for their weaknesses, improving overall performance.
A notebook is also available for running this model in 4-bit precision on a free T4 GPU in Google Colab.
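If your transformers version rejects `load_in_4bit` inside `model_kwargs`, an explicit `BitsAndBytesConfig` is equivalent. A minimal sketch, assuming recent `transformers` and `bitsandbytes` releases:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Same 4-bit setup as above, expressed as an explicit quantization config.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "mlabonne/Beyonder-4x7B-v2",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mlabonne/Beyonder-4x7B-v2")
```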
✨ Key Features
- Mixture of Experts architecture: combines the strengths of several base models for strong text-generation performance.
- Broad applicability: suited to many text-generation scenarios such as question answering, code generation, and story writing.
- Efficient inference: the recommended context length is 8k, balancing output quality and inference efficiency (see the sketch after this list).
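To stay inside that window on long inputs, the prompt can be capped at tokenization time. A minimal sketch; the 256-token generation headroom is an assumption, not a model requirement:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mlabonne/Beyonder-4x7B-v2")

MAX_CTX = 8192   # recommended context length
HEADROOM = 256   # tokens reserved for generation (assumed value)

long_prompt = "some very long document ..."  # placeholder input
ids = tokenizer(long_prompt, truncation=True, max_length=MAX_CTX - HEADROOM)["input_ids"]
print(len(ids))  # never exceeds 8192 - 256 = 7936
```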
📦 Quantized Models
Thanks to TheBloke (and bartowski for EXL2) for the quantized versions:
- GGUF: https://huggingface.co/TheBloke/Beyonder-4x7B-v2-GGUF
- AWQ: https://huggingface.co/TheBloke/Beyonder-4x7B-v2-AWQ
- GPTQ: https://huggingface.co/TheBloke/Beyonder-4x7B-v2-GPTQ
- EXL2: https://huggingface.co/bartowski/Beyonder-4x7B-v2-exl2
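The GGUF files, for example, can run locally through llama.cpp via the llama-cpp-python bindings. A minimal sketch; the exact filename is an assumption, so pick one from the GGUF repo's file list:

```python
from llama_cpp import Llama

# Filename is hypothetical; choose a quantization level from TheBloke/Beyonder-4x7B-v2-GGUF.
llm = Llama(model_path="beyonder-4x7b-v2.Q4_K_M.gguf", n_ctx=8192)

out = llm("Explain what a Mixture of Experts is in less than 100 words.", max_tokens=256)
print(out["choices"][0]["text"])
```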
🏆 Evaluation
Comparison with Mixtral-8x7B-Instruct-v0.1
Beyonder-4x7B-v2 is competitive with Mixtral-8x7B-Instruct-v0.1 on the Open LLM Leaderboard, while using only 4 experts instead of 8.
Comparison with individual experts
It also shows a significant improvement over each of its individual expert models.
Nous benchmark suite
On the Nous benchmark suite it performs strongly against comparable models and is almost on par with the best Yi-34B fine-tune, which is a much larger model: Beyonder-4x7B-v2 has 24.2B parameters with only two experts selected per token at inference (roughly 12B active), versus 34B parameters for the Yi-34B model.
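Those parameter counts can be sanity-checked from Mistral-7B's published dimensions, since the four experts share everything except the per-layer MLPs. A back-of-the-envelope sketch (router gates ignored as negligible):

```python
# Mistral-7B dimensions from its config.json: hidden size, FFN size, layer count.
hidden, ffn, layers = 4096, 14336, 32
base_total = 7.24e9                     # full Mistral-7B parameter count
mlp_total = 3 * hidden * ffn * layers   # gate/up/down projections: ~5.6B per expert copy

moe_total = base_total + 3 * mlp_total  # three extra MLP copies -> ~24.2B total
active = base_total + 1 * mlp_total     # top-2 routing runs 2 of 4 MLP copies -> ~12.9B active
print(f"total ≈ {moe_total / 1e9:.1f}B, active ≈ {active / 1e9:.1f}B")
```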
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| Beyonder-4x7B-v2 | 45.29 | 75.95 | 60.86 | 46.4 | 57.13 |
| NeuralHermes-2.5-Mistral-7B | 43.67 | 73.24 | 55.37 | 41.76 | 53.51 |
| OpenHermes-2.5-Mistral-7B | 42.75 | 72.99 | 52.99 | 40.94 | 52.42 |
| Nous-Hermes-2-SOLAR-10.7B | 47.79 | 74.69 | 55.92 | 44.84 | 55.81 |
| Nous-Hermes-2-Yi-34B | 50.27 | 76.00 | 60.34 | 46.69 | 58.33 |
Detailed per-task results
AGIEval
| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| agieval_aqua_rat | 0 | acc | 23.62 | ± | 2.67 |
| | | acc_norm | 23.62 | ± | 2.67 |
| agieval_logiqa_en | 0 | acc | 41.47 | ± | 1.93 |
| | | acc_norm | 43.01 | ± | 1.94 |
| agieval_lsat_ar | 0 | acc | 23.04 | ± | 2.78 |
| | | acc_norm | 23.48 | ± | 2.80 |
| agieval_lsat_lr | 0 | acc | 51.57 | ± | 2.22 |
| | | acc_norm | 52.94 | ± | 2.21 |
| agieval_lsat_rc | 0 | acc | 64.31 | ± | 2.93 |
| | | acc_norm | 64.68 | ± | 2.92 |
| agieval_sat_en | 0 | acc | 79.13 | ± | 2.84 |
| | | acc_norm | 79.13 | ± | 2.84 |
| agieval_sat_en_without_passage | 0 | acc | 43.20 | ± | 3.46 |
| | | acc_norm | 43.20 | ± | 3.46 |
| agieval_sat_math | 0 | acc | 34.55 | ± | 3.21 |
| | | acc_norm | 32.27 | ± | 3.16 |
GPT4All
| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| arc_challenge | 0 | acc | 61.86 | ± | 1.42 |
| | | acc_norm | 64.51 | ± | 1.40 |
| arc_easy | 0 | acc | 85.06 | ± | 0.73 |
| | | acc_norm | 82.45 | ± | 0.78 |
| boolq | 1 | acc | 88.35 | ± | 0.56 |
| hellaswag | 0 | acc | 68.04 | ± | 0.47 |
| | | acc_norm | 85.12 | ± | 0.36 |
| openbookqa | 0 | acc | 37.80 | ± | 2.17 |
| | | acc_norm | 48.60 | ± | 2.24 |
| piqa | 0 | acc | 83.08 | ± | 0.87 |
| | | acc_norm | 83.95 | ± | 0.86 |
| winogrande | 0 | acc | 78.69 | ± | 1.15 |
TruthfulQA
| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| truthfulqa_mc | 1 | mc1 | 44.55 | ± | 1.74 |
| | | mc2 | 60.86 | ± | 1.57 |
Bigbench
| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 58.95 | ± | 3.58 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 66.40 | ± | 2.46 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 48.84 | ± | 3.12 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 22.56 | ± | 2.21 |
| | | exact_str_match | 13.37 | ± | 1.80 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 30.40 | ± | 2.06 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 20.57 | ± | 1.53 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 52.00 | ± | 2.89 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 44.40 | ± | 2.22 |
| bigbench_navigate | 0 | multiple_choice_grade | 52.10 | ± | 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 69.75 | ± | 1.03 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 55.36 | ± | 2.35 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 23.65 | ± | 1.35 |
| bigbench_snarks | 0 | multiple_choice_grade | 77.35 | ± | 3.12 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 73.02 | ± | 1.41 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 46.80 | ± | 1.58 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 22.08 | ± | 1.17 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 19.03 | ± | 0.94 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 52.00 | ± | 2.89 |
🧩 Configuration
```yaml
base_model: mlabonne/Marcoro14-7B-slerp
experts:
  - source_model: openchat/openchat-3.5-1210
    positive_prompts:
      - "chat"
      - "assistant"
      - "tell me"
      - "explain"
  - source_model: beowolx/CodeNinja-1.0-OpenChat-7B
    positive_prompts:
      - "code"
      - "python"
      - "javascript"
      - "programming"
      - "algorithm"
  - source_model: maywell/PiVoT-0.1-Starling-LM-RP
    positive_prompts:
      - "storywriting"
      - "write"
      - "scene"
      - "story"
      - "character"
  - source_model: WizardLM/WizardMath-7B-V1.1
    positive_prompts:
      - "reason"
      - "math"
      - "mathematics"
      - "solve"
      - "count"
```
📄 License
This model is released under the microsoft-research-license; see https://huggingface.co/WizardLM/WizardMath-7B-V1.1/resolve/main/LICENSE for details.



