🚀 CosmicBun-8B
CosmicBun-8B is a merge of several pre-trained language models. It combines the strengths of its source models and performs well on text-generation benchmarks.
📄 License
This project is released under the MIT License.
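✨ Key Features
- Multi-model fusion: combines the strengths of several pre-trained language models, namely cognitivecomputations/dolphin-2.9-llama3-8b, Weyaxi/Einstein-v6.1-Llama3-8B, and Locutusque/llama-3-neural-chat-v1-8b.
- Advanced merge method: merged with the DARE TIES method, using Locutusque/llama-3-neural-chat-v1-8b as the base model.
- Solid multi-task performance: scores well on several text-generation benchmarks, including AI2 Reasoning Challenge, HellaSwag, and MMLU.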
🔧 Technical Details
Merge Method
This model was merged with the DARE TIES method, using Locutusque/llama-3-neural-chat-v1-8b as the base model.
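To make the idea behind DARE TIES more concrete, here is a minimal sketch on plain Python lists. It is an illustration under stated assumptions, not the code used to build this model: the function names `dare_prune` and `dare_ties_merge` are hypothetical, tensors are represented as flat lists of floats, and the sign election uses the sign of the summed weighted deltas. DARE randomly drops entries of each task vector (fine-tuned minus base) and rescales the survivors by `1 / density`; the TIES step then keeps, per parameter, only the deltas that agree with the elected sign.

```python
import random


def dare_prune(delta, density, rng):
    """DARE step: keep each entry with probability `density`, rescale kept entries by 1/density."""
    if density >= 1.0:
        return list(delta)
    return [d / density if rng.random() < density else 0.0 for d in delta]


def dare_ties_merge(base, finetuned, weights, densities, normalize=False, seed=0):
    """Merge several fine-tuned parameter lists into `base` (hypothetical helper)."""
    rng = random.Random(seed)
    weighted_deltas = []
    for model, weight, density in zip(finetuned, weights, densities):
        delta = [m - b for m, b in zip(model, base)]  # task vector
        pruned = dare_prune(delta, density, rng)
        weighted_deltas.append([weight * d for d in pruned])

    merged = []
    for i, b in enumerate(base):
        column = [d[i] for d in weighted_deltas]
        # TIES-style sign election: follow the sign of the summed weighted deltas.
        sign = 1.0 if sum(column) >= 0 else -1.0
        agreeing = [(c, w) for c, w in zip(column, weights) if c * sign > 0]
        update = sum(c for c, _ in agreeing)
        if normalize and agreeing:
            update /= sum(w for _, w in agreeing)
        merged.append(b + update)
    return merged


# Toy example with density 1.0, so nothing is dropped and the result is deterministic.
base = [0.0, 0.0, 0.0]
model_a = [1.0, -1.0, 2.0]
model_b = [1.0, 2.0, -1.0]
merged = dare_ties_merge(base, [model_a, model_b], weights=[0.5, 0.5], densities=[1.0, 1.0])
print(merged)
```

In the toy example the two models disagree in sign on the second and third parameters, so only the delta that matches the elected sign contributes there.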
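Models Merged
The following models were included in the merge, on top of the base model:
- cognitivecomputations/dolphin-2.9-llama3-8b
- Weyaxi/Einstein-v6.1-Llama3-8B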
Configuration
The following YAML configuration was used to produce this model:
base_model: Locutusque/llama-3-neural-chat-v1-8b
dtype: bfloat16
merge_method: dare_ties
parameters:
  int8_mask: 1.0
  normalize: 0.0
slices:
- sources:
  - layer_range: [0, 4]
    model: cognitivecomputations/dolphin-2.9-llama3-8b
    parameters:
      density: 1.0
      weight: 0.6
  - layer_range: [0, 4]
    model: Weyaxi/Einstein-v6.1-Llama3-8B
    parameters:
      density: 0.6
      weight: 0.5
  - layer_range: [0, 4]
    model: Locutusque/llama-3-neural-chat-v1-8b
    parameters:
      density: 1.0
      weight: 0.5
- sources:
  - layer_range: [4, 8]
    model: cognitivecomputations/dolphin-2.9-llama3-8b
    parameters:
      density: 0.8
      weight: 0.1
  - layer_range: [4, 8]
    model: Weyaxi/Einstein-v6.1-Llama3-8B
    parameters:
      density: 1.0
      weight: 0.2
  - layer_range: [4, 8]
    model: Locutusque/llama-3-neural-chat-v1-8b
    parameters:
      density: 1.0
      weight: 0.7
- sources:
  - layer_range: [8, 12]
    model: cognitivecomputations/dolphin-2.9-llama3-8b
    parameters:
      density: 0.7
      weight: 0.1
  - layer_range: [8, 12]
    model: Weyaxi/Einstein-v6.1-Llama3-8B
    parameters:
      density: 0.7
      weight: 0.2
  - layer_range: [8, 12]
    model: Locutusque/llama-3-neural-chat-v1-8b
    parameters:
      density: 0.7
      weight: 0.6
- sources:
  - layer_range: [12, 16]
    model: cognitivecomputations/dolphin-2.9-llama3-8b
    parameters:
      density: 0.9
      weight: 0.2
  - layer_range: [12, 16]
    model: Weyaxi/Einstein-v6.1-Llama3-8B
    parameters:
      density: 0.6
      weight: 0.6
  - layer_range: [12, 16]
    model: Locutusque/llama-3-neural-chat-v1-8b
    parameters:
      density: 0.7
      weight: 0.3
- sources:
  - layer_range: [16, 20]
    model: cognitivecomputations/dolphin-2.9-llama3-8b
    parameters:
      density: 1.0
      weight: 0.2
  - layer_range: [16, 20]
    model: Weyaxi/Einstein-v6.1-Llama3-8B
    parameters:
      density: 1.0
      weight: 0.2
  - layer_range: [16, 20]
    model: Locutusque/llama-3-neural-chat-v1-8b
    parameters:
      density: 0.9
      weight: 0.4
- sources:
  - layer_range: [20, 24]
    model: cognitivecomputations/dolphin-2.9-llama3-8b
    parameters:
      density: 0.7
      weight: 0.2
  - layer_range: [20, 24]
    model: Weyaxi/Einstein-v6.1-Llama3-8B
    parameters:
      density: 0.9
      weight: 0.3
  - layer_range: [20, 24]
    model: Locutusque/llama-3-neural-chat-v1-8b
    parameters:
      density: 1.0
      weight: 0.4
- sources:
  - layer_range: [24, 28]
    model: cognitivecomputations/dolphin-2.9-llama3-8b
    parameters:
      density: 1.0
      weight: 0.4
  - layer_range: [24, 28]
    model: Weyaxi/Einstein-v6.1-Llama3-8B
    parameters:
      density: 0.8
      weight: 0.2
  - layer_range: [24, 28]
    model: Locutusque/llama-3-neural-chat-v1-8b
    parameters:
      density: 0.9
      weight: 0.4
- sources:
  - layer_range: [28, 32]
    model: cognitivecomputations/dolphin-2.9-llama3-8b
    parameters:
      density: 1.0
      weight: 0.3
  - layer_range: [28, 32]
    model: Weyaxi/Einstein-v6.1-Llama3-8B
    parameters:
      density: 0.9
      weight: 0.2
  - layer_range: [28, 32]
    model: Locutusque/llama-3-neural-chat-v1-8b
    parameters:
      density: 1.0
      weight: 0.3
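The configuration splits the 32 layers into eight blocks of four, each with its own weight per source model. The short sketch below tabulates the weight totals per block. The weights are copied from the configuration; the name `SLICE_WEIGHTS` and the helper `weight_sums` are hypothetical. It assumes that `normalize: 0.0` acts as "false", so the weights are used as written rather than rescaled to sum to 1; that is an assumption about the merge tool, not something this card states.

```python
# Per-slice weights copied from the configuration above.
# Tuple order: dolphin-2.9-llama3-8b, Einstein-v6.1-Llama3-8B, llama-3-neural-chat-v1-8b
SLICE_WEIGHTS = {
    (0, 4): (0.6, 0.5, 0.5),
    (4, 8): (0.1, 0.2, 0.7),
    (8, 12): (0.1, 0.2, 0.6),
    (12, 16): (0.2, 0.6, 0.3),
    (16, 20): (0.2, 0.2, 0.4),
    (20, 24): (0.2, 0.3, 0.4),
    (24, 28): (0.4, 0.2, 0.4),
    (28, 32): (0.3, 0.2, 0.3),
}


def weight_sums(slice_weights):
    """Return the total weight per layer range, rounded to hide float noise."""
    return {layers: round(sum(ws), 2) for layers, ws in slice_weights.items()}


sums = weight_sums(SLICE_WEIGHTS)
for (start, end), total in sums.items():
    print(f"layers {start:>2}-{end:<2} weight sum = {total}")
```

The totals differ from block to block, which is why the normalization setting matters when reading this configuration.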
📚 Detailed Documentation
Detailed results can be found here.

| Metric | Value |
|---|---|
| Avg. | 68.81 |
| AI2 Reasoning Challenge (25-shot) | 61.86 |
| HellaSwag (10-shot) | 84.29 |
| MMLU (5-shot) | 65.53 |
| TruthfulQA (0-shot) | 54.08 |
| Winogrande (5-shot) | 78.85 |
| GSM8k (5-shot) | 68.23 |
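As a quick sanity check, the reported average can be reproduced as the unweighted mean of the six benchmark scores. The dictionary name `scores` is just an illustrative choice; the values are copied from the table.

```python
# Benchmark scores copied from the results table.
scores = {
    "AI2 Reasoning Challenge (25-shot)": 61.86,
    "HellaSwag (10-shot)": 84.29,
    "MMLU (5-shot)": 65.53,
    "TruthfulQA (0-shot)": 54.08,
    "Winogrande (5-shot)": 78.85,
    "GSM8k (5-shot)": 68.23,
}

# Unweighted mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")
```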