🚀 CosmicBun-8B
CosmicBun-8B is a model produced by merging pre-trained language models. It combines the strengths of several strong models, performs well on text-generation tasks, and provides users with more accurate and richer text output.
📄 License
This project is released under the MIT License.
✨ Key Features
- Multi-model fusion: combines the strengths of several pre-trained language models, including cognitivecomputations/dolphin-2.9-llama3-8b, Weyaxi/Einstein-v6.1-Llama3-8B, and Locutusque/llama-3-neural-chat-v1-8b.
- Advanced merge method: merged with the DARE TIES method, using Locutusque/llama-3-neural-chat-v1-8b as the base model.
- Solid multi-task performance: achieves good results on several text-generation benchmarks, such as the AI2 Reasoning Challenge, HellaSwag, and MMLU.
🔧 Technical Details
Merge Method
This model was merged with the DARE TIES method, using Locutusque/llama-3-neural-chat-v1-8b as the base model.
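To make the method concrete, the sketch below illustrates the two ideas behind DARE TIES on toy task vectors (the per-parameter deltas between a fine-tuned model and the base): DARE randomly drops a fraction of each delta and rescales the survivors by 1/density, and TIES resolves sign conflicts by keeping only contributions that agree with the weighted-majority sign. This is an illustrative toy implementation, not mergekit's actual code.

```python
import random

def dare(delta, density, rng):
    """DARE: randomly Drop entries of a task vector And REscale
    the survivors by 1/density so the expected value is preserved."""
    return [d / density if rng.random() < density else 0.0 for d in delta]

def ties_merge(deltas, weights):
    """TIES-style sign election: at each position, find the sign of the
    weighted sum, keep only contributions matching that sign, and average them."""
    merged = []
    for pos in range(len(deltas[0])):
        vals = [w * d[pos] for d, w in zip(deltas, weights)]
        sign = 1.0 if sum(vals) >= 0 else -1.0
        kept = [v for v in vals if v * sign > 0]
        merged.append(sum(kept) / len(kept) if kept else 0.0)
    return merged

# Two toy task vectors with a sign conflict at position 1
sparse = [dare(d, 0.5, random.Random(0)) for d in ([1.0, -1.0], [0.5, 2.0])]
print(ties_merge([[1.0, -1.0], [0.5, 2.0]], [0.5, 0.5]))
```

In the real merge, the per-slice `density` and `weight` values in the configuration below play exactly these roles for each 4-layer block.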
Merged Models
The following models were included in the merge:
- cognitivecomputations/dolphin-2.9-llama3-8b
- Weyaxi/Einstein-v6.1-Llama3-8B
Configuration
The following YAML configuration was used to produce this model:
```yaml
base_model: Locutusque/llama-3-neural-chat-v1-8b
dtype: bfloat16
merge_method: dare_ties
parameters:
  int8_mask: 1.0
  normalize: 0.0
slices:
- sources:
  - layer_range: [0, 4]
    model: cognitivecomputations/dolphin-2.9-llama3-8b
    parameters:
      density: 1.0
      weight: 0.6
  - layer_range: [0, 4]
    model: Weyaxi/Einstein-v6.1-Llama3-8B
    parameters:
      density: 0.6
      weight: 0.5
  - layer_range: [0, 4]
    model: Locutusque/llama-3-neural-chat-v1-8b
    parameters:
      density: 1.0
      weight: 0.5
- sources:
  - layer_range: [4, 8]
    model: cognitivecomputations/dolphin-2.9-llama3-8b
    parameters:
      density: 0.8
      weight: 0.1
  - layer_range: [4, 8]
    model: Weyaxi/Einstein-v6.1-Llama3-8B
    parameters:
      density: 1.0
      weight: 0.2
  - layer_range: [4, 8]
    model: Locutusque/llama-3-neural-chat-v1-8b
    parameters:
      density: 1.0
      weight: 0.7
- sources:
  - layer_range: [8, 12]
    model: cognitivecomputations/dolphin-2.9-llama3-8b
    parameters:
      density: 0.7
      weight: 0.1
  - layer_range: [8, 12]
    model: Weyaxi/Einstein-v6.1-Llama3-8B
    parameters:
      density: 0.7
      weight: 0.2
  - layer_range: [8, 12]
    model: Locutusque/llama-3-neural-chat-v1-8b
    parameters:
      density: 0.7
      weight: 0.6
- sources:
  - layer_range: [12, 16]
    model: cognitivecomputations/dolphin-2.9-llama3-8b
    parameters:
      density: 0.9
      weight: 0.2
  - layer_range: [12, 16]
    model: Weyaxi/Einstein-v6.1-Llama3-8B
    parameters:
      density: 0.6
      weight: 0.6
  - layer_range: [12, 16]
    model: Locutusque/llama-3-neural-chat-v1-8b
    parameters:
      density: 0.7
      weight: 0.3
- sources:
  - layer_range: [16, 20]
    model: cognitivecomputations/dolphin-2.9-llama3-8b
    parameters:
      density: 1.0
      weight: 0.2
  - layer_range: [16, 20]
    model: Weyaxi/Einstein-v6.1-Llama3-8B
    parameters:
      density: 1.0
      weight: 0.2
  - layer_range: [16, 20]
    model: Locutusque/llama-3-neural-chat-v1-8b
    parameters:
      density: 0.9
      weight: 0.4
- sources:
  - layer_range: [20, 24]
    model: cognitivecomputations/dolphin-2.9-llama3-8b
    parameters:
      density: 0.7
      weight: 0.2
  - layer_range: [20, 24]
    model: Weyaxi/Einstein-v6.1-Llama3-8B
    parameters:
      density: 0.9
      weight: 0.3
  - layer_range: [20, 24]
    model: Locutusque/llama-3-neural-chat-v1-8b
    parameters:
      density: 1.0
      weight: 0.4
- sources:
  - layer_range: [24, 28]
    model: cognitivecomputations/dolphin-2.9-llama3-8b
    parameters:
      density: 1.0
      weight: 0.4
  - layer_range: [24, 28]
    model: Weyaxi/Einstein-v6.1-Llama3-8B
    parameters:
      density: 0.8
      weight: 0.2
  - layer_range: [24, 28]
    model: Locutusque/llama-3-neural-chat-v1-8b
    parameters:
      density: 0.9
      weight: 0.4
- sources:
  - layer_range: [28, 32]
    model: cognitivecomputations/dolphin-2.9-llama3-8b
    parameters:
      density: 1.0
      weight: 0.3
  - layer_range: [28, 32]
    model: Weyaxi/Einstein-v6.1-Llama3-8B
    parameters:
      density: 0.9
      weight: 0.2
  - layer_range: [28, 32]
    model: Locutusque/llama-3-neural-chat-v1-8b
    parameters:
      density: 1.0
      weight: 0.3
```
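A structural property of this configuration is that its eight 4-layer slices tile all 32 transformer layers of a Llama-3-8B model contiguously, with no gaps or overlaps. The small check below (a sketch written for this card, not part of mergekit) verifies that property:

```python
# layer_range values taken from the merge configuration above
slices = [(0, 4), (4, 8), (8, 12), (12, 16), (16, 20), (20, 24), (24, 28), (28, 32)]

def covers(slices, n_layers):
    """Return True if the slice ranges tile [0, n_layers) contiguously,
    i.e. each slice starts exactly where the previous one ended."""
    expected = 0
    for start, end in slices:
        if start != expected or end <= start:
            return False
        expected = end
    return expected == n_layers

print(covers(slices, 32))  # Llama-3-8B has 32 transformer layers
```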
📚 Documentation
Detailed results can be found here.
| Metric | Value |
|--------|-------|
| Average | 68.81 |
| AI2 Reasoning Challenge (25-shot) | 61.86 |
| HellaSwag (10-shot) | 84.29 |
| MMLU (5-shot) | 65.53 |
| TruthfulQA (0-shot) | 54.08 |
| Winogrande (5-shot) | 78.85 |
| GSM8k (5-shot) | 68.23 |
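The reported average is the unweighted mean of the six benchmark scores, which can be verified directly:

```python
# Benchmark scores from the table above
scores = {
    "AI2 Reasoning Challenge (25-shot)": 61.86,
    "HellaSwag (10-shot)": 84.29,
    "MMLU (5-shot)": 65.53,
    "TruthfulQA (0-shot)": 54.08,
    "Winogrande (5-shot)": 78.85,
    "GSM8k (5-shot)": 68.23,
}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 68.81, matching the reported average
```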